Novixys Software Dev Blog | Page 15 of 15 |

Introduction

CDATA stands for Character Data. A CDATA section in XML is used to escape text containing characters which would otherwise be recognized as markup. It can appear anywhere character data can occur.

Markup

A CDATA section is marked up starting with “<![CDATA[” and ending with “]]>“. Any character data (other than “]]>“) can appear within the section without needing to be escaped. For example, angle brackets (<>) and ampersands (&) which indicate XML markup need not be escaped within a CDATA section.

For example, the following is a CDATA section. The angle brackets surrounding “greeting” and “/greeting” need not be escaped. When processing this XML, the parser receives the text “<greeting>Hello, world!</greeting>” as character data and not as markup.

<![CDATA[<greeting>Hello, world!</greeting>]]>

In addition, parameter entity references are recognized within CDATA sections. For example, assume the following parameter entity is defined:

<!ENTITY AnEntity "Sample entity data here">

Within the following CDATA section, the entity reference “%AnEntity” is recognized and the value is replaced within character data passed to the XML processor.

<![CDATA[My value is %AnEntity]]>

Nesting

A CDATA section may not be nested inside another because “]]>” may not appear directly except to end the CDATA section. The following is invalid:

<![CDATA[We need the ending "]]>" here.]]>

Instead the above can be written in two sections as follows:

<![CDATA[We need the ending "]]]]><![CDATA[>" here.]]>

Attribute Value

Within a Document-Type Definition (DTD), an attribute value may be declared to be of type CDATA as follows:

<!ATTLIST img
          src CDATA #REQUIRED>

This declaration states that an img element must have a src attribute whose value type is CDATA.

Where is it used?

CDATA sections are used when larger amounts of verbatim text need to appear within XML documents and processed verbatim. Smaller quantities of such text can be properly encoded to escape the XML characters, but for larger text, it helps to preserve the meaning of the text without having to do so.