XML syntax

An example XML document

XML documents use a self-describing and simple syntax.

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.

The next line describes the root element of the document (like it was saying: "this document is a note"):

<note>

The next 4 lines describe 4 child elements of the root (to, from, heading, and body):

<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>

And finally the last line defines the end of the root element:

</note>

Can you detect from this example that the XML document contains a Note to Tove from Jani? Don't you agree that XML is pretty self-descriptive?

All XML elements must have a closing tag

With XML, it is illegal to omit the closing tag.

In HTML some elements do not have to have a closing tag. The following code is legal in HTML:

<p>This is a paragraph
<p>This is another paragraph

In XML all elements must have a closing tag, like this:

<p>This is a paragraph</p>
<p>This is another paragraph</p>

Note: You might have noticed from the previous example that the XML declaration did not have a closing tag. This is not an error. The declaration is not a part of the XML document itself. It is not an XML element, and it should not have a closing tag.

XML tags are case sensitive

Unlike HTML, XML tags are case sensitive.

With XML, the tag <Letter> is different from the tag <letter>.

Opening and closing tags must therefore be written with the same case:

<Message>This is incorrect</message>

<message>This is correct</message>

All XML elements must be properly nested

Improper nesting of tags makes no sense to XML.

In HTML some elements can be improperly nested within each other like this:

<b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other like this:

<b><i>This text is bold and italic</i></b>

All XML documents must have a root element

All XML documents must contain a single tag pair to define a root element.

All other elements must be within this root element.

All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

Attribute values must always be quoted

With XML, it is illegal to omit quotation marks around attribute values.

XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note date=12/11/2002>
<to>Tove</to>
<from>Jani</from>
</note>

<?xml version="1.0" encoding="ISO-8859-1"?>
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
</note>

The error in the first document is that the date attribute in the note element is not quoted.

This is correct: date="12/11/2002". This is incorrect: date=12/11/2002.

With XML, white space is preserved

With XML, the white space in your document is not truncated.

This is unlike HTML. With HTML, a sentence like this:

Hello my name is Tove,

will be displayed like this:

Hello my name is Tove,

because HTML strips off the white space.

With XML, CR / LF is converted to LF

With XML, a new line is always stored as LF.

Do you know what a typewriter is? Well, a typewriter is a type of mechanical device they used in the previous century :-)

After you have typed one line of text on a typewriter, you have to manually return the printing carriage to the left margin position and manually feed the paper up one line.

In Windows applications, a new line in the text is normally stored as a pair of CR LF (carriage return, line feed) characters. In Unix applications, a new line is normally stored as a LF character. Macintosh applications use only a CR character to store a new line.

Comments in XML

The syntax for writing comments in XML is similar to that of HTML.