XML documents use a self-describing and simple syntax.
<?xml version="1.0" encoding="ISO-8859-1"?>
The first line in the document, the XML declaration, defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.
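To see what the encoding declaration actually does, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module (our choice of parser; the tutorial does not prescribe one). The bytes are encoded as Latin-1, and the parser decodes them according to the `encoding` attribute. The helper name `parse_latin1_note` is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_latin1_note(raw: bytes) -> ET.Element:
    """Parse raw bytes; the parser reads the encoding from the XML declaration."""
    return ET.fromstring(raw)

# "Tové" in ISO-8859-1: é is the single byte 0xE9, which is not valid UTF-8 on its own.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><note><to>Tové</to></note>'.encode("iso-8859-1")
root = parse_latin1_note(raw)
print(root.find("to").text)  # Tové
```

Without the declaration the parser would assume UTF-8, and the same 0xE9 byte would be rejected as an invalid token.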
The next line describes the root element of the document (as if it were saying: "this document is a note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and body):
<to>Tove</to>
And finally the last line defines the end of the root element:
</note>
Can you detect from this example that the XML document contains a note to Tove from Jani? Don't you agree that XML is pretty self-descriptive?
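Because the tags name their own content, a program can find the recipient and sender by tag name alone. The sketch below uses Python's `xml.etree.ElementTree`; the `<heading>` and `<body>` text are placeholders we made up, and `summarize` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The <heading> and <body> text are placeholders, not taken from the tutorial.
NOTE = """<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Placeholder heading</heading>
<body>Placeholder body</body>
</note>"""

def summarize(xml_text: str) -> str:
    """Build a sentence from the root tag and the <to>/<from> children."""
    root = ET.fromstring(xml_text)
    return f"A {root.tag} to {root.findtext('to')} from {root.findtext('from')}"

print(summarize(NOTE))                               # A note to Tove from Jani
print([child.tag for child in ET.fromstring(NOTE)])  # ['to', 'from', 'heading', 'body']
```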
With XML, it is illegal to omit the closing tag.
In HTML some elements do not have to have a closing tag. The following code is legal in HTML:
<p>This is a paragraph
In XML all elements must have a closing tag, like this:
<p>This is a paragraph</p>
Note: You might have noticed from the previous example that the XML declaration did not have a closing tag. This is not an error. The declaration is not an XML element; it belongs to the document's prolog, and it must not have a closing tag.
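A conforming XML parser refuses a document with a missing closing tag, while accepting the declaration without one. A minimal sketch with Python's `xml.etree.ElementTree`, where `is_well_formed` is a hypothetical helper that simply reports whether parsing succeeds:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on a well-formedness error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<p>This is a paragraph"))      # False: no closing tag
print(is_well_formed("<p>This is a paragraph</p>"))  # True
# The declaration itself is never closed, and that is fine.
print(is_well_formed('<?xml version="1.0" encoding="ISO-8859-1"?><p>x</p>'))  # True
```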
Unlike HTML, XML tags are case sensitive.
With XML, the tag <Letter> is different from the tag <letter>.
Opening and closing tags must therefore be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
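The case rule can be checked with Python's `xml.etree.ElementTree`. In this sketch, `is_well_formed` is a hypothetical helper that reports whether parsing succeeds; the last line shows that the parser keeps tag names exactly as written.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on a well-formedness error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<Message>This is incorrect</message>"))  # False: case mismatch
print(is_well_formed("<message>This is correct</message>"))    # True
# <Letter> and <letter> are two different element names.
print(ET.fromstring("<Letter/>").tag == ET.fromstring("<letter/>").tag)  # False
```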
Improper nesting of tags makes no sense to XML.
In HTML some elements can be improperly nested within each other like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other like this:
<b><i>This text is bold and italic</i></b>
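Feeding both versions to an XML parser shows the difference: the improperly nested one is rejected with a mismatched-tag error. A minimal sketch using Python's `xml.etree.ElementTree`, with `is_well_formed` as a hypothetical helper:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on a well-formedness error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<b><i>This text is bold and italic</b></i>"))  # False: </b> closes before </i>
print(is_well_formed("<b><i>This text is bold and italic</i></b>"))  # True
```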
Every XML document must contain exactly one root element.
All other elements must be within this root element.
All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
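The single-root rule and the parent/child nesting can both be seen with Python's `xml.etree.ElementTree`. In this sketch, `walk` and `is_well_formed` are hypothetical helpers: `walk` lists every element with its nesting depth, and a document with two top-level elements fails to parse.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on a well-formedness error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

def walk(elem: ET.Element, depth: int = 0):
    """Yield (depth, tag) for an element and all of its descendants, in document order."""
    yield depth, elem.tag
    for child in elem:
        yield from walk(child, depth + 1)

doc = "<root><child><subchild>text</subchild></child></root>"
print(list(walk(ET.fromstring(doc))))      # [(0, 'root'), (1, 'child'), (2, 'subchild')]
print(is_well_formed("<child/><child/>"))  # False: two root elements
```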
With XML, it is illegal to omit quotation marks around attribute values.
XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below: the first one is incorrect and the second is correct:
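<?xml version="1.0" encoding="ISO-8859-1"?>
<note date=12/11/2002>
<to>Tove</to>
<from>Jani</from>
</note>

<?xml version="1.0" encoding="ISO-8859-1"?>
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
</note>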
This is correct: date="12/11/2002". This is incorrect: date=12/11/2002.
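A parser only hands you the attribute value when it is quoted. This sketch uses Python's `xml.etree.ElementTree`; `read_date` is a hypothetical helper that returns `None` when the markup is rejected (and also when the attribute is absent, which is fine for this illustration). Single quotes are accepted as well as double quotes.

```python
import xml.etree.ElementTree as ET

def read_date(xml_text: str):
    """Return the root's date attribute, or None if parsing fails or it is missing."""
    try:
        return ET.fromstring(xml_text).get("date")
    except ET.ParseError:
        return None

print(read_date("<note date=12/11/2002><to>Tove</to></note>"))    # None: unquoted value
print(read_date('<note date="12/11/2002"><to>Tove</to></note>'))  # 12/11/2002
print(read_date("<note date='12/11/2002'><to>Tove</to></note>"))  # 12/11/2002
```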
With XML, white space in your document is preserved.
This is unlike HTML. With HTML, a sentence like this:
Hello        my name is Tove,
will be displayed like this:
Hello my name is Tove,
because HTML collapses each run of white space into a single space.
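An XML parser passes the spaces through unchanged. This sketch uses Python's `xml.etree.ElementTree`; the last line only roughly imitates HTML rendering by collapsing runs of white space, and is not how a browser is implemented.

```python
import xml.etree.ElementTree as ET

SENTENCE = "Hello" + " " * 8 + "my name is Tove,"  # eight spaces after "Hello"

root = ET.fromstring(f"<p>{SENTENCE}</p>")
print(root.text == SENTENCE)  # True: every space is kept

# Rough imitation of HTML rendering: collapse each run of white space to one space.
print(" ".join(root.text.split()))  # Hello my name is Tove,
```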
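With XML, every new line is normalized to LF: a conforming parser converts CR LF pairs and lone CR characters to a single LF before passing the text to the application.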
Do you know what a typewriter is? Well, a typewriter is a mechanical device people used in the previous century :-)
After you have typed one line of text on a typewriter, you have to manually return the printing carriage to the left margin position (a carriage return) and manually feed the paper up one line (a line feed).
In Windows applications, a new line in the text is normally stored as a pair of CR LF (carriage return, line feed) characters. In Unix applications, a new line is normally stored as a LF character. Classic Mac OS applications (before Mac OS X) used only a CR character to store a new line.
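Whichever convention a file was written with, the parser reports the same text. A minimal sketch with Python's `xml.etree.ElementTree`, whose underlying Expat parser performs the line-end normalization; `parsed_newline` and `LINE_ENDINGS` are hypothetical names.

```python
import xml.etree.ElementTree as ET

LINE_ENDINGS = {
    "Windows (CR LF)": "\r\n",
    "Unix (LF)": "\n",
    "classic Mac OS (CR)": "\r",
}

def parsed_newline(ending: str) -> str:
    """Parse a two-line element written with the given line ending; return its text."""
    return ET.fromstring(f"<text>line one{ending}line two</text>").text

for name, ending in LINE_ENDINGS.items():
    print(name, repr(parsed_newline(ending)))  # 'line one\nline two' each time
```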
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
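Parsers usually drop comments, because they are not part of the data. This sketch uses Python's `xml.etree.ElementTree`: the default parser discards the comment, while `TreeBuilder(insert_comments=True)` (available since Python 3.8) keeps it as a node whose tag is `ET.Comment`. The sample document is our own.

```python
import xml.etree.ElementTree as ET

DOC = "<note><!-- This is a comment --><to>Tove</to></note>"

# Default parser: the comment is discarded.
plain = ET.fromstring(DOC)
print([child.tag for child in plain])  # ['to']

# Ask the tree builder to keep comments as nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(DOC, parser=parser)
first = kept[0]
print(first.tag is ET.Comment, repr(first.text))  # True ' This is a comment '
```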