How to Pretty Print XML from Java?

Use XSL Transformation to Pretty Print and XPath to Remove Indentation

“Time is a drug. Too much of it kills you.” ― Terry Pratchett, Small Gods

1. Introduction

XML is easiest to understand when it is indented to show the element hierarchy. However, that extra whitespace increases file size when the XML is transported, so it can make sense to strip extraneous content, including ignorable whitespace, before sending a document. This article shows how to indent (pretty print) an XML document from Java, and how to remove that indentation again.

2. Parsing XML

Parse the XML using the DOM API and obtain the Document object as follows:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(xmlFile));
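The snippet above reads from a file. For quick experiments or unit tests it can be handier to parse XML held in a String. The following is a minimal sketch, wrapped in a hypothetical XmlParsing class with a parse(String) method (both names are illustrative, not part of any library). It hands the parser an InputSource backed by a StringReader and rethrows the checked parsing exceptions as an unchecked one:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses XML held in a String instead of a File.
final class XmlParsing {
    private XmlParsing() {}

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // InputSource lets the parser read from any Reader, here an in-memory string.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

For example, XmlParsing.parse("<airports><airport/></airports>").getDocumentElement().getNodeName() returns "airports".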

3. Printing XML

Once you have the Document, pretty print it by running it through an identity Transformer (one created without a stylesheet) that is configured to indent its output:

Transformer tform = TransformerFactory.newInstance().newTransformer();
tform.setOutputProperty(OutputKeys.INDENT, "yes");
// Indentation width: a Xalan-specific property honored by the JDK's built-in transformer
tform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
tform.transform(new DOMSource(document), new StreamResult(System.out));

And that should do the trick. Here is the XML before the transformation:

<?xml version="1.0" encoding="utf-8"?><airports><airport><name>Huntsville International</name><shortcode>HSV</shortcode><city>Huntsville</city><state>AL</state><latitude>34.6486</latitude><longitude>-86.7751</longitude><utc>-6</utc><dst>True</dst><precheck>true</precheck><checkpoints><checkpoint><id>1</id><longname>HSV-A</longname><shortname>HSV-A</shortname></checkpoint></checkpoints></airport><airport><name>Mobile Regional</name><shortcode>MOB</shortcode><city>Mobile</city><state>AL</state><latitude>30.6813</latitude><longitude>-88.2443</longitude><utc>-6</utc><dst>True</dst><precheck>true</precheck><checkpoints><checkpoint><id>1</id>...

After the transformation, we have:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<airports>
  <airport>
    <name>Huntsville International</name>
    <shortcode>HSV</shortcode>
    <city>Huntsville</city>
    <state>AL</state>
    <latitude>34.6486</latitude>
    <longitude>-86.7751</longitude>
    <utc>-6</utc>
    <dst>True</dst>
    <precheck>true</precheck>
    <checkpoints>
      <checkpoint>
        <id>1</id>
        <longname>HSV-A</longname>
        <shortname>HSV-A</shortname>
      </checkpoint>
    </checkpoints>
  </airport>
  <airport>
    <name>Mobile Regional</name>
    <shortcode>MOB</shortcode>
    <city>Mobile</city>
    <state>AL</state>
    <latitude>30.6813</latitude>
    <longitude>-88.2443</longitude>
    <utc>-6</utc>
    <dst>True</dst>
    <precheck>true</precheck>
    <checkpoints>
      <checkpoint>
        <id>1</id>
...

Makes a lot of difference in understanding the structure, doesn’t it?
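The code in this section writes straight to System.out. If you want the pretty printed text as a String instead (for logging or tests, say), one option is to send the result to a StringWriter. A minimal sketch, wrapped in a hypothetical XmlPrettyPrinter class; note that the indent-amount key is specific to the Xalan-derived transformer bundled with the JDK and may be ignored by other XSLT processors:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: pretty prints an XML string and returns the result as a String.
final class XmlPrettyPrinter {
    private XmlPrettyPrinter() {}

    static String prettyPrint(String xml, int indentAmount) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Transformer tform = TransformerFactory.newInstance().newTransformer();
            tform.setOutputProperty(OutputKeys.INDENT, "yes");
            // Vendor-specific key understood by the JDK's bundled (Xalan-derived) transformer.
            tform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                                    Integer.toString(indentAmount));

            // Collect the output in memory instead of printing it.
            StringWriter out = new StringWriter();
            tform.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | TransformerException e) {
            throw new IllegalArgumentException("Could not pretty print XML", e);
        }
    }
}
```

For example, prettyPrint("<a><b>x</b></a>", 2) puts <b>x</b> on its own line, indented by two spaces.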

4. Removing Indentation from XML

Adding indentation to compact XML is easy, but is there a way to remove unwanted indentation from XML? There is, although it is less straightforward: it requires XPath to locate the whitespace-only text nodes that should be removed.

4.1. Initialize XPath

Create an XPath object as follows:

XPathFactory xfact = XPathFactory.newInstance();
XPath xpath = xfact.newXPath();
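Before removing anything, it can help to check how many whitespace-only text nodes a document actually contains. The sketch below, in a hypothetical WhitespaceCounter class, uses an XPath object with the same predicate as the next step, but wraps it in the XPath count() function and asks for XPathConstants.NUMBER, which makes evaluate() return a Double:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: counts the whitespace-only text nodes in an XML string.
final class WhitespaceCounter {
    private WhitespaceCounter() {}

    static int countWhitespaceTextNodes(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // With XPathConstants.NUMBER the result comes back as a java.lang.Double.
            Double count = (Double) xpath.evaluate(
                    "count(//text()[normalize-space(.) = ''])",
                    document, XPathConstants.NUMBER);
            return count.intValue();
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }
}
```

For "<a>\n  <b>x</b>\n</a>" this returns 2 (the text before and after <b>), while the text node "x" is not counted.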

4.2. Find and Remove Empty Text Nodes

Select the text nodes that contain nothing but whitespace, since these can safely be ignored:

NodeList empty =
    (NodeList)xpath.evaluate("//text()[normalize-space(.) = '']",
                             document, XPathConstants.NODESET);

And remove the text nodes:

for (int i = 0; i < empty.getLength(); i++) {
    Node node = empty.item(i);
    node.getParentNode().removeChild(node);
}

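Finally, export the document as compact XML. The tform Transformer from section 3 still has indentation enabled and would re-indent the output, so switch indentation off before writing:

tform.setOutputProperty(OutputKeys.INDENT, "no");
tform.transform(new DOMSource(document), new StreamResult(System.out));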
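Putting the steps of this section together, here is a minimal self-contained sketch that returns the compact XML as a String. The class and method names (XmlCompactor, compact) are hypothetical. It serializes with a freshly created Transformer, whose indentation is off by default:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: removes whitespace-only text nodes and serializes without indentation.
final class XmlCompactor {
    private XmlCompactor() {}

    static String compact(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList empty = (NodeList) xpath.evaluate(
                    "//text()[normalize-space(.) = '']",
                    document, XPathConstants.NODESET);
            // Detach each whitespace-only text node from its parent element.
            for (int i = 0; i < empty.getLength(); i++) {
                Node node = empty.item(i);
                node.getParentNode().removeChild(node);
            }

            // A new Transformer does not indent unless told to, so the output stays compact.
            Transformer tform = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            tform.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException | TransformerException e) {
            throw new IllegalArgumentException("Could not compact XML", e);
        }
    }
}
```

For example, compact("<a>\n  <b>x</b>\n  <c>y</c>\n</a>") produces an XML declaration followed by <a><b>x</b><c>y</c></a>.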

Let us try it on this sample XML:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
...

On applying the above transformation, we get:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications
      with XML.</description></book><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price>
...

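Note that whitespace that is not ignorable is kept: the line break and spaces inside the description element belong to a text node that also contains other characters, so the XPath predicate does not match it.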
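The two techniques can also be combined to re-indent XML that is already indented. A Transformer with INDENT enabled keeps any whitespace-only text nodes already in the document, and on newer JDKs (reportedly Java 9 and later) this can leave extra blank lines in the output. A sketch, in a hypothetical XmlReindenter class, strips those nodes first and then indents the compact tree:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: re-indents XML that may already contain indentation.
final class XmlReindenter {
    private XmlReindenter() {}

    static String reindent(String xml, int indentAmount) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Step 1: drop the existing whitespace-only text nodes (the old indentation).
            NodeList empty = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//text()[normalize-space(.) = '']", document, XPathConstants.NODESET);
            for (int i = 0; i < empty.getLength(); i++) {
                Node node = empty.item(i);
                node.getParentNode().removeChild(node);
            }

            // Step 2: indent the now-compact tree from scratch.
            Transformer tform = TransformerFactory.newInstance().newTransformer();
            tform.setOutputProperty(OutputKeys.INDENT, "yes");
            tform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                                    Integer.toString(indentAmount));
            StringWriter out = new StringWriter();
            tform.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException | TransformerException e) {
            throw new IllegalArgumentException("Could not re-indent XML", e);
        }
    }
}
```

Stripping first means the transformer only sees element structure, so the resulting indentation depends on indentAmount alone and not on whatever spacing the input happened to have.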

5. Conclusion

Pretty printing XML is useful when you need to understand a document's structure, and with a Transformer it takes only a few lines. Removing ignorable whitespace is also useful in some situations, for instance to reduce the size of a document before it is transported. It is a bit more involved, since XPath is needed to find the whitespace-only text nodes, but it is not complicated.
