How to Modify XML File in Java

1. Introduction

Let us learn how to modify an XML file to remove unwanted information.

One method to remove XML nodes is to use the XML DOM Api to search the XML structure and remove unwanted nodes. While this sounds easy, using the DOM Api is quite hard especially for anything more than trivial searches as this article demonstrates.

An easier method to navigate and remove unwanted Nodes is to use XPath. Even complex search and removal is quite easy as we shall see.

See this article for details on parsing an XML file to obtain the XML Document.

2. Using removeChild() to remove Nodes

Once a particular node is identified for removal, it can be removed quite easily by invoking removeChild() on the parent Node.

static private void removeNode(Node node)
{
  Node parent = node.getParentNode();
  if ( parent != null ) parent.removeChild(node);
}

2.1 Saving the Modified XML Document

After the required modifications are done, the XML Document can be saved by using a Transformer.

Initialize the Transformer as shown:

tform = TransformerFactory.newInstance().newTransformer();
tform.setOutputProperty(OutputKeys.INDENT, "yes");
tform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

Save the modified XML document quite easily using the transformer instance.

tform.transform(new DOMSource(document), new StreamResult(System.out));

3. Searching the XML Document

The XML data set we are using is the TSA airport and checkpoint data available here. We would like to search this data set for the airport in Mobile, AL (identified as <shortcode>MOB</shortcode> in the data set). The following code checks each node whether it matches the query.

static private boolean testNode(Node node)
{
    NodeList nlist = node.getChildNodes();
    for (int i = 0 ; i < nlist.getLength() ; i++) {
	Node n = nlist.item(i);
	String name = n.getLocalName();
	if ( name != null && name.equals("shortcode") ) {
	    return n.getTextContent().equals("MOB");
	}
    }
    return false;
}

Collect the nodes to be removed by searching from the document root.

List<Node> nodes = new ArrayList<>();
NodeList nlist = document.getFirstChild().getChildNodes();
for (int i = 0 ; i < nlist.getLength() ; i++) {
    Node node = nlist.item(i);
    if ( testNode(node) ) nodes.add(node);
}

As you can see from the implementation of testNode(), complex XML search is hard using just the DOM API.

4. Using XPath to Find and Remove Nodes

XPath can be used to easily query for nodes within an XML document.

An initial setup process is required for using XPath to search.

XPathFactory xfact = XPathFactory.newInstance();
XPath xpath = xfact.newXPath();

Here is a method to query for nodes and remove them from the document.

static private void queryRemoveNodes(String xpathStr)
{
Object res = xpath.evaluate(xpathStr, document, PathConstants.NODESET);
NodeList nlist = (NodeList)res;
for (int i = 0 ; i < nlist.getLength() ; i++) {
    Node node = nlist.item(i);
    Node parent = node.getParentNode();
    if ( parent != null ) parent.removeChild(node);
}
}

The previous example to remove the airport for Mobile, AL is written as:

queryRemoveNode("/airports/airport[shortcode = 'MOB']");

The removed node is:

<airport>
  <name>Mobile Regional</name>
  <shortcode>MOB</shortcode>
  <city>Mobile</city>
  <state>AL</state>
  <latitude>30.6813</latitude>
  <longitude>-88.2443</longitude>
  <utc>-6</utc>
  <dst>True</dst>
  <precheck>true</precheck>
  <checkpoints>
    <checkpoint>
      <id>1</id>
      <longname>MOB-A</longname>
      <shortname>MOB-A</shortname>
    </checkpoint>
  </checkpoints>
</airport>

Furthermore, to remove just the <checkpoints> element from the above node, use the following:

queryRemoveNodes("/airports/airport[shortcode = "MOB"]/checkpoints");

Easily remove a bunch of nodes matching an expression.

queryRemoveNodes("/airports/airport[latitude < 20]");

Summary

There are two ways of removing nodes from an XML document. The direct method is to search for nodes using the DOM Api and remove them. An easier way is to use XPath to query and remove the nodes matching even complex queries.