1. Introduction
Some characters are treated specially when XML documents are processed. These are the characters used for XML markup; when they appear as part of a document's content rather than as markup, they must be escaped appropriately. The characters and their predefined entities are:
&quot;  "  Double quote
&apos;  '  Single quote
&lt;    <  Left angle bracket
&gt;    >  Right angle bracket
&amp;   &  Ampersand
2. Character Data
All text that is not markup constitutes the character data of the document. Within character data, "&" and "<" must not appear except when used as markup; as content they are written as "&amp;" and "&lt;". However, ">", the double quote and the single quote can appear directly within character data without being encoded.
<?xml version="1.0"?> <valid>"'></valid>
<?xml version="1.0"?> <invalid>&</invalid>
<?xml version="1.0"?> <invalid><</invalid>
However, ">" cannot appear as part of the string "]]>" unless that string marks the end of a CDATA section.
<?xml version="1.0"?> <invalid>]]></invalid>
The following is valid because ">" has been encoded as "&gt;":
<?xml version="1.0"?> <valid>]]&gt;</valid>
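The character data rules above can be checked with a parser. The following is a minimal Python sketch, assuming the standard library's xml.etree.ElementTree and xml.sax.saxutils modules; the helper name is_well_formed and the sample text are hypothetical and not part of the original article.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True

# Raw text containing "<", "&" and "]]>" cannot be used as character data.
raw = 'a < b && c ]]> d "quoted"'
print(is_well_formed(f"<text>{raw}</text>"))          # False

# escape() replaces "&", "<" and ">" and leaves the quote characters alone,
# which is sufficient for character data.
print(is_well_formed(f"<text>{escape(raw)}</text>"))  # True
```

Because escape() always encodes ">", the "]]>" case is covered without any special handling.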
2. Attributes
Within attributes, “>” is valid.
<valid attrib=">"></valid>
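However, the double quote (") is not valid within an attribute value delimited by double quotes; it must be encoded as "&quot;":
<valid attrib="&quot;"></valid>
Similarly, the single quote (') is not valid within an attribute value delimited by single quotes. It must be encoded as "&apos;":
<valid attrib='&apos;'></valid>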
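As an illustration of the quoting rules for attributes, here is a short Python sketch. It assumes the standard library function xml.sax.saxutils.quoteattr, which chooses a delimiter and escapes the value to match it; the sample values are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# quoteattr() returns the value already wrapped in quotes. It prefers
# double quotes, switches to single quotes when the value contains a
# double quote, and falls back to &quot; when the value contains both.
for value in ['>', 'say "hi"', "it's", 'both " and \'']:
    attr = quoteattr(value)
    element = ET.fromstring(f"<valid attrib={attr}></valid>")
    # The parser hands back the original, unescaped value.
    assert element.get("attrib") == value
    print(attr)
```

Note that quoteattr() also encodes "<", ">" and "&", so its output is safe even where the rules above would allow a character to appear literally.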
3. Comments
Comments can appear anywhere in a document outside of other markup. Within comments, none of the 5 special characters needs to be escaped or encoded.
<valid><!-- '"<>& --></valid>
However, the string "--" (two dashes) must not appear within comments.
<invalid><!-- Hello -- there --></invalid>
A consequence of this rule is that a comment must not end with "--->" (three dashes followed by a right angle bracket). The following is invalid:
<invalid><!-- Hello there ---></invalid>
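The two comment rules can be expressed as a small check on the text placed between the comment delimiters. This is a hedged Python sketch; the function names is_valid_comment_text and make_comment are hypothetical and only encode the rules described in this section.

```python
def is_valid_comment_text(text):
    """Check text intended to go between "<!--" and "-->".

    The text must not contain "--", and it must not end with "-",
    because that would produce "--->" once the closing delimiter is added.
    """
    return "--" not in text and not text.endswith("-")

def make_comment(text):
    """Build an XML comment, refusing text that breaks the rules."""
    if not is_valid_comment_text(text):
        raise ValueError("comment text contains '--' or ends with '-'")
    return f"<!--{text}-->"

print(make_comment(" '\"<>& "))
print(is_valid_comment_text(" Hello -- there "))
print(is_valid_comment_text(" Hello there -"))
```

The special characters pass through unchanged because no escaping is performed inside comments.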
4. Processing Instructions
Processing instructions (PIs) are used to pass instructions to the applications that process the XML. None of the 5 special characters needs to be encoded within a processing instruction.
<valid><?execute <>"'& ?></valid>
A further restriction in the case of processing instructions is that the instruction name (the PI target) must not be the string "xml", in any combination of upper and lower case; this name is reserved for standardization of the XML specification itself.
<invalid><?xml hello ?></invalid>
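A sketch of how an application might guard against the reserved name when generating processing instructions is shown below, in Python. The function build_pi is hypothetical. The check for "?>" inside the data is not mentioned in this article; it is included on the assumption that the closing delimiter cannot appear inside the instruction itself.

```python
def build_pi(target, data=""):
    """Build a processing instruction string such as <?execute args?>."""
    # "xml" is reserved in every upper/lower case combination.
    if target.lower() == "xml":
        raise ValueError("the PI target 'xml' is reserved")
    # Assumption: "?>" would end the instruction early, so reject it.
    if "?>" in data:
        raise ValueError("PI data must not contain '?>'")
    return f"<?{target} {data}?>" if data else f"<?{target}?>"

# The five special characters are passed through without encoding.
print(build_pi("execute", "<>\"'& "))
```

Calling build_pi("xml", "hello") raises ValueError rather than producing an instruction that a parser would reject.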
5. CDATA
CDATA sections are used to escape blocks of text containing characters that would otherwise be recognized as markup. These sections begin with the string "<![CDATA[" and end with the string "]]>". Within a CDATA section, none of the 5 special characters needs to be encoded.
<valid><![CDATA[<greeting>Hello world: <>&'" </greeting>]]></valid>
However, within CDATA sections, the string "]]>" must not appear except to end the section:
<invalid><![CDATA[This string("]]>") must not appear here]]></invalid>
The same content can be rewritten using two CDATA sections, splitting the string between them:
<valid><![CDATA[This string("]]]]><![CDATA[>") must not appear here]]></valid>
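The splitting technique can be automated. The following Python sketch assumes the standard library's xml.etree.ElementTree module for the round-trip check; the helper name to_cdata is hypothetical.

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    """Wrap text in CDATA, splitting every "]]>" across two sections.

    The first section ends after "]]" and the next one starts with ">",
    so the forbidden string never appears inside a single section.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

text = 'This string("]]>") must not appear here'
document = f"<valid>{to_cdata(text)}</valid>"
print(document)

# The parser joins adjacent CDATA sections back into the original text.
print(ET.fromstring(document).text == text)
```

A parser treats the two adjacent sections as one continuous run of character data, so the reader of the document sees the original string unchanged.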
Conclusion
This article demonstrated what the predefined XML entities are and the various circumstances in which they can be used.