How to Parse XML in Java using a DOM Parser

1. Introduction

Let us learn how to parse XML in Java using a DOM parser.

DOM stands for Document Object Model. Parsing an XML file with a DOM parser means obtaining a DOM object from the XML data. The DOM object can then be queried for various XML artifacts such as elements, attributes, and text nodes.

2. XML Parsing in Java

Java provides three different methods for parsing XML.

  1. DOM Parsing: Parsing XML into a DOM means obtaining a tree of XML nodes which can then be queried for the required information. The entire XML tree is loaded into memory and used for queries and updates.
  2. SAX Parsing: Simple API for XML (SAX) is an event-driven style of parsing in which the parser fires events on a registered event handler as it encounters XML elements, attributes, and text. The event handler responds to the events and extracts whatever information is needed.
  3. StAX: Streaming API for XML is a pull-type parser in which the application "pulls" information from the XML as needed and acts on it. This is in contrast to SAX parsing, which can be viewed as a push-type parser.
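
The push versus pull distinction can be easier to see side by side. The following is a minimal sketch, not a recommended design: it counts elements in a small made-up XML string, first with a SAX handler and then with a StAX reader. The class name, method names, and sample data are assumptions introduced for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PushVersusPull {
    // Hypothetical sample data used only for this illustration.
    static final String XML = "<books><book id=\"1\"/><book id=\"2\"/></books>";

    // SAX (push): the parser calls startElement() for every element it meets.
    static int countWithSax(String xml, String name) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // StAX (pull): the application asks for the next event when it is ready.
    static int countWithStax(String xml, String name) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(name)) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX count:  " + countWithSax(XML, "book"));
        System.out.println("StAX count: " + countWithStax(XML, "book"));
    }
}
```

Neither variant keeps the document in memory, which is the main trade-off against DOM parsing.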

3. Creating the XML DOM Parser

Before any XML can be parsed, a DOM parser must be created and configured. This is done as follows.

Create a DocumentBuilderFactory and configure it to your requirements as shown. If you need namespace processing, turn on namespace awareness with setNamespaceAware(true).

To perform DTD validation on the XML document, turn on validation using setValidating(true).

Note that setValidating(true) enables DTD validation only. It does not validate the XML against a W3C XML Schema or RELAX NG schema. See below for more.

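// Configure the factory before creating any parser from it.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);   // report namespace URIs and local names
factory.setValidating(true);       // validate against the document's DTD
factory.setXIncludeAware(true);    // process XInclude elements (needs namespace awareness)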
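
To see what namespace awareness changes, here is a small sketch that parses the same document with the setting on and off and reports the namespace URI of the root element. The class name, the sample document, and the urn:example:shop namespace are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceAwareDemo {
    // Hypothetical sample document with a made-up namespace URI.
    static final String XML =
        "<p:order xmlns:p=\"urn:example:shop\"><p:item/></p:order>";

    // Parses XML and returns the namespace URI reported for the root element.
    static String rootNamespace(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        Element root = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)))
            .getDocumentElement();
        return root.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        // With namespace awareness the prefix is resolved to its URI.
        System.out.println("aware:     " + rootNamespace(true));
        // Without it the element carries no namespace information.
        System.out.println("not aware: " + rootNamespace(false));
    }
}
```

The default for a new DocumentBuilderFactory is namespace awareness off, so code that relies on getNamespaceURI() or getLocalName() needs the explicit setNamespaceAware(true) call.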

3.1. XML Schema Validation

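To validate an XML document against a schema written in W3C XML Schema or RELAX NG, create a Schema object from the schema file and associate it with the DocumentBuilderFactory object. When a Schema is set this way, leave setValidating at its default of false, since that flag controls DTD validation. Also note that Java implementations are only required to support W3C XML Schema; RELAX NG typically needs a third-party SchemaFactory implementation.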

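// newInstance() requires the schema language; this constant selects W3C XML Schema.
SchemaFactory sfactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// schemaFilePath is a placeholder for the path to your schema (.xsd) file.
Schema schema = sfactory.newSchema(new File(schemaFilePath));
/* DocumentBuilderFactory */ factory.setSchema(schema);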

Once the factory is configured to your satisfaction, you can create the DOM Parser.

DocumentBuilder parser = factory.newDocumentBuilder();
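
Putting the schema steps together, the sketch below validates two small documents against an inline schema and collects the reported problems through an ErrorHandler. It is one possible arrangement under stated assumptions: the class name, the helper method, and the one-element schema are invented for this example, and the schema is read from a String rather than a file to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class SchemaValidationDemo {
    // Hypothetical schema: a <note> element whose content must be an integer.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"note\" type=\"xs:int\"/>"
      + "</xs:schema>";

    // Parses xml against XSD and returns the validation messages that were reported.
    static List<String> validate(String xml) throws Exception {
        SchemaFactory sfactory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sfactory.newSchema(new StreamSource(new StringReader(XSD)));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // generally needed for schema validation
        factory.setSchema(schema);       // setValidating stays false: that flag is for DTDs

        List<String> problems = new ArrayList<>();
        DocumentBuilder parser = factory.newDocumentBuilder();
        // Without a handler, validation errors are not surfaced to the caller.
        parser.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) { problems.add(e.getMessage()); }
            @Override
            public void error(SAXParseException e) { problems.add(e.getMessage()); }
            @Override
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        parser.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid document problems:   " + validate("<note>42</note>"));
        System.out.println("invalid document problems: " + validate("<note>abc</note>"));
    }
}
```

Validation errors are recoverable by default, so parse() still returns a Document for the invalid input. Whether to collect the errors, as here, or rethrow them from error() is a design choice for the application.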

4. Parsing XML

Parse an XML file to create the DOM object using the DocumentBuilder.

Document document = parser.parse(new File(xmlFilePath));

Need to parse XML in a String? Construct a StringReader and use an InputSource.

String str = ...;
StringReader reader = new StringReader(str);
InputSource in = new InputSource(reader);
Document document = parser.parse(in);
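
As an end-to-end illustration, the following sketch parses XML held in a String and then queries the resulting DOM for elements, attributes, and text. The class name, the parse helper, and the sample library data are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ParseStringDemo {
    // Hypothetical sample data used only for this illustration.
    static final String XML =
        "<library><book id=\"b1\">Dune</book><book id=\"b2\">Emma</book></library>";

    // Parses XML text into a DOM Document using a default-configured parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return parser.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document document = parse(XML);

        // Query the tree: every <book> element, its id attribute, and its text.
        NodeList books = document.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            System.out.println(book.getAttribute("id") + ": " + book.getTextContent());
        }
    }
}
```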

How about parsing XML from a resource on the classpath, such as a file packaged inside a jar? Use getResourceAsStream() to get an InputStream and pass it to the parse() method.

String resPath = "/xml/data.xml";
InputStream in = sample.class.getResourceAsStream(resPath);
if ( in == null )
    throw new Exception("resource not found: " + resPath);
Document document = parser.parse(in);
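
The snippet above leaves the stream open after parsing. One way to handle that, sketched here under assumptions, is a small helper that wraps the lookup in try-with-resources so the stream is closed in every case. The ResourceXml class and its load method are hypothetical names, and FileNotFoundException is just one reasonable choice of exception.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ResourceXml {
    // Loads and parses an XML resource, resolved relative to the given class.
    static Document load(Class<?> owner, String resPath) throws Exception {
        // try-with-resources closes the stream; a null resource is simply skipped.
        try (InputStream in = owner.getResourceAsStream(resPath)) {
            if (in == null) {
                throw new FileNotFoundException("resource not found: " + resPath);
            }
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // "/xml/data.xml" must exist on the classpath for this call to succeed.
            Document document = load(ResourceXml.class, "/xml/data.xml");
            System.out.println("root element: " + document.getDocumentElement().getTagName());
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A leading slash in the resource path makes the lookup absolute from the classpath root; without it, the path is resolved relative to the package of the class.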

5. Conclusion

This article explained how to parse XML using a DOM parser in Java. A DOM parser is a good fit when the XML file is small enough to be loaded completely into memory. If the XML file is too large for that, the streaming APIs described earlier, SAX and StAX, can be used instead.
