How to Search XML Using XQuery from Java

Use XQuery for more powerful XML search than is possible with XPath

“I guess there are never enough books.” ― John Steinbeck, A John Steinbeck Encyclopedia

1. Introduction

XPath offers an easy way to search an XML document using Java. It is much more convenient than crawling through the document tree with the DOM API. XQuery builds on XPath and provides an SQL-like language for querying XML documents. It can also update and modify the XML document, something XPath cannot do. In this article, we present a beginner’s introduction to XQuery and how to use it from Java.


How to Install and Use Oracle XQuery Processor for Java

Use XQuery to Query, Update and Modify XML Documents

“Trees that are slow to grow bear the best fruit.” ― Molière

1. Introduction

As a Java Programmer, if you work in any way with XML, you should learn about XQuery and how to use it.

XQuery is a query language for XML. It is similar to XPath in that it uses the same or similar constructs to identify specific parts of an XML document.


How to Pretty Print XML from Java?

Use XSL Transformation to Pretty Print and XPath to Remove Indentation

“Time is a drug. Too much of it kills you.” ― Terry Pratchett, Small Gods

1. Introduction

XML is easiest to understand when it is properly indented to show the element hierarchy. However, the extra white space increases file size when the XML is transported, so it makes sense to strip all extraneous information from the document, including ignorable white space. This article will show you how to indent and pretty print an XML document.


Load XML into MySQL Using Java

Load XML into MySQL by using the Java DOM parser.

“Never memorize something that you can look up.”
― Albert Einstein

1. Introduction

XML can represent hierarchical structure through its parent-child relationships. This enables applications to store structured data in XML for export. Importing this XML data into a database is a bit involved, as we shall see in this article. You need to write code to manage the database connection. In addition, you need to parse the XML and isolate the data that needs to be imported.


How to Modify XML File in Java

1. Introduction

Let us learn how to modify an XML file to remove unwanted information.

One method to remove XML nodes is to use the XML DOM API to search the XML structure and remove unwanted nodes. While this sounds easy, using the DOM API is quite hard for anything more than trivial searches, as this article demonstrates.

An easier method to navigate and remove unwanted nodes is to use XPath. Even complex search and removal is quite easy, as we shall see.

See this article for details on parsing an XML file to obtain the XML Document.
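For completeness, here is a minimal sketch of one way to obtain the Document used in the rest of this article. The class name ParseSketch and the helper parse() are illustrative names, not taken from the referenced article, and the sample XML is a trimmed-down stand-in for the real data set.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseSketch {
    // Hypothetical helper: parse XML text into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input, shaped like the airport data used below.
        Document document = parse(
            "<airports><airport><shortcode>MOB</shortcode></airport></airports>");
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```

To read from a file instead, DocumentBuilder.parse() also accepts a java.io.File.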

2. Using removeChild() to remove Nodes

Once a particular node is identified for removal, it can be removed quite easily by invoking removeChild() on the parent Node.

static private void removeNode(Node node)
{
  Node parent = node.getParentNode();
  if ( parent != null ) parent.removeChild(node);
}

2.1 Saving the Modified XML Document

After the required modifications are done, the XML Document can be saved by using a Transformer.

Initialize the Transformer as shown:

Transformer tform = TransformerFactory.newInstance().newTransformer();
tform.setOutputProperty(OutputKeys.INDENT, "yes");
tform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

Then write the modified XML document out through the transformer instance.

tform.transform(new DOMSource(document), new StreamResult(System.out));
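The pieces of this section can be put together roughly as follows. This is a sketch under a few assumptions: the class name RemoveAndSaveSketch and the helpers toXml() and removeShortcode() are hypothetical, the input is a made-up one-airport fragment, and the output goes to a StringWriter rather than System.out so that the result can be inspected.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RemoveAndSaveSketch {
    // Same idea as removeNode() above: detach a node from its parent.
    static void removeNode(Node node) {
        Node parent = node.getParentNode();
        if (parent != null) parent.removeChild(node);
    }

    // Hypothetical helper: serialize the document to a String.
    static String toXml(Document document) throws Exception {
        Transformer tform = TransformerFactory.newInstance().newTransformer();
        tform.setOutputProperty(OutputKeys.INDENT, "yes");
        tform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        tform.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    // Hypothetical helper: parse, drop the first <shortcode> element, serialize.
    static String removeShortcode(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node shortcode = document.getElementsByTagName("shortcode").item(0);
        if (shortcode != null) removeNode(shortcode);
        return toXml(document);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<airport><name>Mobile Regional</name>"
                   + "<shortcode>MOB</shortcode></airport>";
        System.out.println(removeShortcode(xml));
    }
}
```

To write to a file instead, pass a StreamResult constructed from a java.io.File.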

3. Searching the XML Document

The XML data set we are using is the TSA airport and checkpoint data available here. We would like to search this data set for the airport in Mobile, AL (identified as <shortcode>MOB</shortcode> in the data set). The following method checks whether a given node matches the query, that is, whether it has a <shortcode> child whose text is MOB.

static private boolean testNode(Node node)
{
    NodeList nlist = node.getChildNodes();
    for (int i = 0 ; i < nlist.getLength() ; i++) {
        Node n = nlist.item(i);
        // getNodeName() is used because getLocalName() returns null
        // when the parser is not namespace-aware (the default).
        if ( "shortcode".equals(n.getNodeName()) ) {
            return "MOB".equals(n.getTextContent());
        }
    }
    return false;
}

Collect the nodes to be removed by searching from the document root.

List<Node> nodes = new ArrayList<>();
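// getDocumentElement() returns the root element even when a comment
// or DOCTYPE precedes it, which getFirstChild() does not guarantee.
NodeList nlist = document.getDocumentElement().getChildNodes();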
for (int i = 0 ; i < nlist.getLength() ; i++) {
    Node node = nlist.item(i);
    if ( testNode(node) ) nodes.add(node);
}
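Collecting the matches first and removing them afterwards matters because getChildNodes() returns a live NodeList: removing a child while iterating would shift the remaining items and skip nodes. A self-contained sketch of the whole DOM-only approach might look like this. The class name DomSearchSketch and the helpers parse() and removeMatching() are hypothetical, the two-airport input is made up for illustration, and the sketch uses getNodeName() and getDocumentElement() so that it works with a default, non-namespace-aware parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomSearchSketch {
    // True if the node has a <shortcode> child whose text is "MOB".
    static boolean testNode(Node node) {
        NodeList nlist = node.getChildNodes();
        for (int i = 0; i < nlist.getLength(); i++) {
            Node n = nlist.item(i);
            if ("shortcode".equals(n.getNodeName())) {
                return "MOB".equals(n.getTextContent());
            }
        }
        return false;
    }

    // Hypothetical helper: collect matching children of the root first,
    // then remove them. Returns the number of nodes removed.
    static int removeMatching(Document document) {
        List<Node> nodes = new ArrayList<>();
        NodeList nlist = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < nlist.getLength(); i++) {
            Node node = nlist.item(i);
            if (testNode(node)) nodes.add(node);
        }
        for (Node node : nodes) {
            node.getParentNode().removeChild(node);
        }
        return nodes.size();
    }

    // Hypothetical helper: parse XML text into a Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data with two airports.
        Document document = parse("<airports>"
            + "<airport><shortcode>MOB</shortcode></airport>"
            + "<airport><shortcode>JFK</shortcode></airport>"
            + "</airports>");
        int removed = removeMatching(document);
        System.out.println("removed: " + removed);
    }
}
```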

As you can see from the implementation of testNode(), complex XML search is hard using just the DOM API.

4. Using XPath to Find and Remove Nodes

XPath can be used to easily query for nodes within an XML document.

An initial setup process is required for using XPath to search.

XPathFactory xfact = XPathFactory.newInstance();
XPath xpath = xfact.newXPath();

Here is a method to query for nodes and remove them from the document.

static private void queryRemoveNodes(String xpathStr)
    throws XPathExpressionException
{
    Object res = xpath.evaluate(xpathStr, document, XPathConstants.NODESET);
    NodeList nlist = (NodeList)res;
    for (int i = 0 ; i < nlist.getLength() ; i++) {
        Node node = nlist.item(i);
        Node parent = node.getParentNode();
        if ( parent != null ) parent.removeChild(node);
    }
}

The previous example, removing the airport for Mobile, AL, can now be written as:

queryRemoveNodes("/airports/airport[shortcode = 'MOB']");

The removed node is:

<airport>
  <name>Mobile Regional</name>
  <shortcode>MOB</shortcode>
  <city>Mobile</city>
  <state>AL</state>
  <latitude>30.6813</latitude>
  <longitude>-88.2443</longitude>
  <utc>-6</utc>
  <dst>True</dst>
  <precheck>true</precheck>
  <checkpoints>
    <checkpoint>
      <id>1</id>
      <longname>MOB-A</longname>
      <shortname>MOB-A</shortname>
    </checkpoint>
  </checkpoints>
</airport>

Furthermore, to remove just the <checkpoints> element from the above node, use the following:

queryRemoveNodes("/airports/airport[shortcode = 'MOB']/checkpoints");

A single expression can also remove every node that matches a condition, for example all airports with a latitude below 20:

queryRemoveNodes("/airports/airport[latitude < 20]");
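The XPath calls in this section can be tried out with a self-contained sketch like the one below. It is written under some assumptions: the class name XPathRemoveSketch and the helper parse() are hypothetical, the document is passed in as a parameter instead of being held in a field, the method returns the number of removed nodes for convenience, and the two-airport input is made-up sample data rather than the real TSA data set.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathRemoveSketch {
    // Remove every node matching the XPath expression; return how many matched.
    static int queryRemoveNodes(Document document, String xpathStr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nlist = (NodeList) xpath.evaluate(
                xpathStr, document, XPathConstants.NODESET);
        int count = nlist.getLength();
        for (int i = 0; i < count; i++) {
            Node node = nlist.item(i);
            Node parent = node.getParentNode();
            if (parent != null) parent.removeChild(node);
        }
        return count;
    }

    // Hypothetical helper: parse XML text into a Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data: one airport above latitude 20, one below.
        Document document = parse("<airports>"
            + "<airport><shortcode>MOB</shortcode><latitude>30.6813</latitude>"
            + "<checkpoints><checkpoint><id>1</id></checkpoint></checkpoints></airport>"
            + "<airport><shortcode>XXX</shortcode><latitude>18.4</latitude></airport>"
            + "</airports>");
        System.out.println(queryRemoveNodes(document,
                "/airports/airport[shortcode = 'MOB']/checkpoints"));
        System.out.println(queryRemoveNodes(document,
                "/airports/airport[latitude < 20]"));
    }
}
```

Using single quotes for string literals inside the XPath expression avoids having to escape double quotes in the Java string.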

Summary

We looked at two ways of removing nodes from an XML document. The direct method is to search for nodes using the DOM API and remove them. An easier way is to use XPath, which can find and remove nodes matching even complex queries.