XML CDATA

By default, an XML parser scans all the text in an XML document for markup.

Text inside a CDATA section is the exception: the parser does not interpret it as markup and passes it through as plain character data.


Parsed Data

XML parsers normally parse all the text in an XML document.

When an XML element is parsed, the text between the XML tags is also parsed:

<message>This text is also parsed</message>

The parser does this because XML elements can contain other elements, as in this example, where the <name> element contains two other elements (<first> and <last>):

<name><first>Bill</first><last>Gates</last></name>

The parser breaks it up into sub-elements like this:

<name>
  <first>Bill</first>
  <last>Gates</last>
</name>
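
To see this breakdown in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The variable names (doc, root, children) are only illustrative, and other parsers expose the same structure through their own APIs.

```python
import xml.etree.ElementTree as ET

# The <name> element from the example above.
doc = "<name><first>Bill</first><last>Gates</last></name>"

root = ET.fromstring(doc)

# Each sub-element is exposed as a separate node with its own tag and text.
children = [(child.tag, child.text) for child in root]

print(root.tag)   # name
print(children)   # [('first', 'Bill'), ('last', 'Gates')]
```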


Escape Characters

Characters that are illegal in XML text, such as "<" and "&", have to be replaced by entity references.

If you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element. You cannot write something like this:

<message>if salary < 1000 then</message>

To avoid this, you have to replace the "<" character with an entity reference, like this:

<message>if salary &lt; 1000 then</message>
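
The difference between the two versions can be checked with a parser. The following is a small sketch using Python's standard xml.etree.ElementTree module; the helper name try_parse is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

raw = "<message>if salary < 1000 then</message>"
escaped = "<message>if salary &lt; 1000 then</message>"

def try_parse(xml_text):
    """Return the element's text, or None if the input is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None

raw_result = try_parse(raw)          # the bare "<" makes the parser fail
escaped_result = try_parse(escaped)  # "&lt;" is turned back into "<"

print(raw_result)      # None
print(escaped_result)  # if salary < 1000 then
```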

There are five predefined entity references in XML:

&lt;     <   less than
&gt;     >   greater than
&amp;    &   ampersand
&apos;   '   apostrophe
&quot;   "   quotation mark

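Entity references always start with the "&" character and end with the ";" character.

Strictly speaking, only the "<" and "&" characters are illegal in element text. The ">" character is generally allowed, but replacing it is a good habit.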
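
In practice the replacement is rarely done by hand. As one possible sketch, Python's standard xml.sax.saxutils module provides escape() and unescape(). By default escape() handles only "&", "<" and ">", so the quote characters are passed in through an extra mapping (the names extra and reverse are illustrative).

```python
from xml.sax.saxutils import escape, unescape

text = 'if a < b && name == "O\'Neil" then'

# escape() covers &, < and > by default; quotes are added via this mapping.
extra = {'"': "&quot;", "'": "&apos;"}
escaped = escape(text, extra)
print(escaped)
# if a &lt; b &amp;&amp; name == &quot;O&apos;Neil&quot; then

# unescape() needs the reverse mapping to restore the quote characters.
reverse = {"&quot;": '"', "&apos;": "'"}
restored = unescape(escaped, reverse)
print(restored == text)  # True
```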

CDATA

The parser does not interpret anything inside a CDATA section as markup; the content is treated as plain text.

If your text contains a lot of "<" or "&" characters, as program code often does, the content of the XML element can be wrapped in a CDATA section.

A CDATA section starts with "<![CDATA[" and ends with "]]>":

<script>
<![CDATA[
// Return 1 if a is less than b and a is negative, otherwise 0.
function matchwo(a, b) {
  if (a < b && a < 0) {
    return 1;
  } else {
    return 0;
  }
}
]]>
</script>

In the previous example, the parser treats everything inside the CDATA section as plain text, so the "<" and "&" characters do not need to be escaped.
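
A quick way to confirm this is to parse a shortened version of the example and look at the text the application receives. This sketch assumes Python's standard xml.etree.ElementTree module, which hands CDATA content back as ordinary element text; the one-line script body is an abbreviation made for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """<script>
<![CDATA[
if (a < b && a < 0) { return 1; }
]]>
</script>"""

# The "<" and "&" characters arrive unchanged; the CDATA markers are gone.
text = ET.fromstring(doc).text
print(text.strip())  # if (a < b && a < 0) { return 1; }
```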

Notes on CDATA sections:

A CDATA section cannot contain the string "]]>", so nested CDATA sections are not allowed.

Also make sure there are no spaces or line breaks inside the "]]>" string.
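
If the text you want to wrap might itself contain "]]>", a common workaround is to split it across two adjacent CDATA sections so that "]]" ends one section and ">" begins the next. The sketch below shows the idea in Python; the function name wrap_cdata is hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    # "]]" closes the current section; ">" opens the following one.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "if (a[b[0]]> 1) x = 1;"
doc = "<script>" + wrap_cdata(payload) + "</script>"

# The parser joins the adjacent sections back into the original text.
recovered = ET.fromstring(doc).text
print(recovered == payload)  # True
```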