Introduction to XML
Rolling Your Own
So, now that you know the what and why of XML, let's get down to creating an XML document.
I mentioned above that, if you've worked with HTML, you probably know XML more or less already. Here's why:
<html>
<head>
<title> Almost an XML Page </title>
</head>
<body>
<table>
<tr>
<td>This is almost XML!</td>
</tr>
</table>
</body>
</html>
Compare that with what we saw above. Notice how we've got tags nested within tags? To turn the above HTML into XML, all we need to do is add the following XML declaration at the very top (note that this applies only to the above document -- most HTML is not XML-compliant):
<?xml version="1.0"?>
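If you'd like to check this for yourself, you can hand the page to an XML parser, which accepts only well-formed documents. Here's a minimal sketch in Python using the standard library's xml.etree.ElementTree module (our choice of language and parser; the article doesn't prescribe one). The helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The HTML page from above, with the XML declaration as the very first line.
# (The declaration must come first: not even a blank line may precede it.)
PAGE = """<?xml version="1.0"?>
<html>
<head>
<title> Almost an XML Page </title>
</head>
<body>
<table>
<tr>
<td>This is almost XML!</td>
</tr>
</table>
</body>
</html>"""


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if an XML parser accepts `text` without errors."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(PAGE))  # True

# Because the tags nest properly, we can walk straight down to the table cell.
cell = ET.fromstring(PAGE).find("body/table/tr/td")
print(cell.text)  # This is almost XML!
```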
HTML and XML are so similar because both derive from an older standard, SGML (Standard Generalized Markup Language), which was standardized in 1986 but whose roots go back to IBM's GML in the late 1960s. Of the two, XML is the stricter. For example, you're probably used to writing the following in HTML:
<img src="myimage.png">
In XML, every element must be closed, so the above has to be rewritten as follows (notice the additional forward slash at the end):
<img src="myimage.png" />
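To see the difference a parser makes, here's a small sketch, again assuming Python's xml.etree.ElementTree; parses is a hypothetical helper and the surrounding <p> is only there to give the image a parent element. It also shows that <img /> is simply shorthand for an empty <img></img> pair.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Hypothetical helper: True if `text` is well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# HTML habit: <img> is never closed, so the parser reports a mismatched tag at </p>.
html_style = '<p><img src="myimage.png"></p>'
# XML style: the trailing slash makes <img /> a self-closed, empty element.
xml_style = '<p><img src="myimage.png" /></p>'
# An explicit closing tag is equally acceptable to XML.
explicit_close = '<p><img src="myimage.png"></img></p>'

print(parses(html_style))      # False
print(parses(xml_style))       # True
print(parses(explicit_close))  # True
```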
There are a few more features of XML that you need to be aware of:
Attributes vs. Elements
XML has two mechanisms for attaching data to tags. With elements, the data (referred to as character data) goes between the opening and closing tags:
<tag>This is an element</tag>
Attributes are placed in the opening tag (much like <a href="http://www.sitepoint.com" />):
<tag myattribute="This is an attribute">This is an element</tag>
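Whether you use elements or attributes is up to you (it's a subject almost as hotly argued as PHP vs ASP!). One practical difference may help you decide: an attribute holds a single text value, and a given attribute name can appear only once per element, whereas elements can be repeated and can contain other elements. With a little experience, you'll get a feel for which suits your data.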
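Here's how a program might read each kind of data, sketched with Python's xml.etree.ElementTree (our choice of library). The <person> and <phone> elements are made-up names for illustration; they show a value repeated as child elements, and the same value rejected when repeated as an attribute.

```python
import xml.etree.ElementTree as ET

tag = ET.fromstring('<tag myattribute="This is an attribute">This is an element</tag>')

# Content between the tags is exposed as .text ...
print(tag.text)                # This is an element
# ... while attribute values are looked up by name in the opening tag.
print(tag.get("myattribute"))  # This is an attribute

# Hypothetical example: repeated child elements hold a list of values.
person = ET.fromstring("<person><phone>555-0100</phone><phone>555-0199</phone></person>")
print([p.text for p in person.findall("phone")])  # ['555-0100', '555-0199']

# Repeating an attribute name on one element is a well-formedness error.
try:
    ET.fromstring('<person phone="555-0100" phone="555-0199"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```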
Commenting XML
XML comments use the same syntax as HTML comments, although XML does not allow two consecutive hyphens (--) anywhere inside a comment:
<!-- this is a comment and it can contain <tags /> which
will be ignored -->
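A quick way to confirm that the <tags /> inside a comment really is ignored is to parse it. This sketch assumes Python's xml.etree.ElementTree, whose default parser discards comments altogether (since Python 3.8 you can keep them by passing ET.TreeBuilder(insert_comments=True) as the parser's target). The <root> and <item> elements are hypothetical wrappers.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
<!-- this is a comment and it can contain <tags /> which
will be ignored -->
<item>kept</item>
</root>"""

root = ET.fromstring(DOC)
# Only <item> survives as a child: the comment, and the <tags />
# written inside it, never became part of the element tree.
print([child.tag for child in root])  # ['item']
```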
Entities
An entity reference is a placeholder, written &name;, that the parser replaces with other text. There are effectively two types of entity: those you have to use, and those you define yourself. XML predefines five entities (&lt;, &gt;, &amp;, &apos; and &quot;). Make sure you know which entities are required by looking at the rules below.
You may have come across entities in HTML, for example &copy;, which tells the browser to display '©'. XML does not predefine &copy;, so to use it, or to define entities of your own, you need what's known as a Document Type Definition (DTD), discussed briefly at the end of this article.
In general, apart from the entities you have to use to create well-formed XML, you shouldn't need to worry about them too much.
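The sketch below, assuming Python's standard xml.sax.saxutils.escape and xml.etree.ElementTree, shows the entities in action: escaping a string that contains & and <, the parser turning the predefined entities back into characters, &copy; being rejected without a DTD, and two ways around that (a numeric character reference, or a declaration in an internal DTD subset). The sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# & and < can't appear literally in character data, so escape them.
raw = "Fish & chips <3"
safe = escape(raw)
print(safe)  # Fish &amp; chips &lt;3

# The parser turns the predefined entities back into the original characters.
print(ET.fromstring(f"<p>{safe}</p>").text)  # Fish & chips <3

# &copy; is an HTML entity, not one of XML's five, so without a DTD it fails.
try:
    ET.fromstring("<p>&copy;</p>")
    copy_ok = True
except ET.ParseError:
    copy_ok = False
print(copy_ok)  # False

# A numeric character reference needs no declaration at all ...
print(ET.fromstring("<p>&#169;</p>").text)  # ©

# ... and declaring the entity in an internal DTD subset makes &copy; legal.
with_dtd = '<!DOCTYPE p [<!ENTITY copy "&#169;">]><p>&copy;</p>'
print(ET.fromstring(with_dtd).text)  # ©
```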
Processing Instructions
Processing instructions, abbreviated to 'PI', are a way to embed special messages for the application that reads the document -- much like placing JavaScript within an HTML document. The <? and ?> delimiters mark the start and end of a PI. An example using PHP might be:
<?xml version="1.0"?>
<myscript>
<authorised><? echo ( 'Welcome back!' ); ?></authorised>
<unauthorised><? echo ( 'Please log in' ); ?></unauthorised>
</myscript>
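Strictly speaking, though, XML requires every PI to begin with a name (its target) immediately after the <?, so the short-tag version above is not well-formed XML. This version, which names php as the target, is: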
<?xml version="1.0"?>
<myscript>
<authorised><?php echo ( 'Welcome back!' ); ?></authorised>
<unauthorised><?php echo ( 'Please log in' ); ?></unauthorised>
</myscript>
...understand now why the PHP group chose to mark up PHP that way?
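A parser reports each PI's target separately from the rest of its content, which is how an application can pick out the instructions meant for it. Here's a sketch using Python's xml.dom.minidom (our choice, since it keeps PIs in the tree by default); list_pis is a hypothetical helper. The last part checks the short-tag form, which has no target after "<?".

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

DOC = """<?xml version="1.0"?>
<myscript>
<authorised><?php echo ( 'Welcome back!' ); ?></authorised>
<unauthorised><?php echo ( 'Please log in' ); ?></unauthorised>
</myscript>"""


def list_pis(text: str) -> list:
    """Hypothetical helper: (parent element, PI target, PI data) for each PI."""
    dom = minidom.parseString(text)
    found = []
    for element in dom.documentElement.childNodes:
        if element.nodeType != element.ELEMENT_NODE:
            continue  # skip the whitespace text nodes between elements
        for child in element.childNodes:
            if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
                found.append((element.tagName, child.target, child.data.strip()))
    return found


for parent, target, data in list_pis(DOC):
    print(parent, target, data)
# authorised php echo ( 'Welcome back!' );
# unauthorised php echo ( 'Please log in' );

# "<?" followed by a space has no target name, so the parser rejects it.
try:
    minidom.parseString("<authorised><? echo ( 'Welcome back!' ); ?></authorised>")
    short_tag_ok = True
except ExpatError:
    short_tag_ok = False
print(short_tag_ok)  # False
```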
CDATA
CDATA blocks are a way to tell any application reading your XML document to treat the contents as plain characters: any markup it finds inside the block, such as XML tags, is read as literal text rather than parsed. The only sequence a CDATA block cannot contain is its own terminator. CDATA blocks are marked up using <![CDATA[ and ]]>. For example:
<?xml version="1.0"?>
<root>
<tag>
<![CDATA[
This <xml_tag /> will be treated as normal text.
]]>
</tag>
</root>
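To confirm that the parser treats the contents as text, here's a sketch using Python's xml.etree.ElementTree (our choice of parser). It parses the example above, checks that no <xml_tag> element was created, and compares the result with writing the same text using &lt; and &gt; by hand.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<root>
<tag>
<![CDATA[
This <xml_tag /> will be treated as normal text.
]]>
</tag>
</root>"""

tag = ET.fromstring(DOC).find("tag")
# <tag> has no child elements: the CDATA content is plain text.
print(len(tag))          # 0
print(tag.text.strip())  # This <xml_tag /> will be treated as normal text.

# Escaping the markup characters by hand yields exactly the same text.
escaped = ET.fromstring("<tag>This &lt;xml_tag /&gt; will be treated as normal text.</tag>")
print(escaped.text == tag.text.strip())  # True
```

CDATA is mainly a convenience when a block contains many < and & characters; for short strings, escaping with entities works just as well.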