Introduction to XML
Now that's all fine until our customers list gets some new records (look carefully at rows three and four):
Joe Bloggs jbloggs@yahoo.com Washington
Mary Woods mwoods@hotmail.com London
Jean Du Vin jduvin@wanadoo.fr Paris
Mike Macey mmacey@nyonline.com New York
First we have "Jean Du Vin". Because we're smart people, we can guess that "Du Vin" is probably this person's last name, but how is our computer program going to know that? It's been told that the last name follows the first space and the email address follows the second. So with this name it's going to decide that the last name is "Du", the email address is "Vin" and the city is "jduvin@wanadoo.fr".
And we've got another problem with the city "New York". It also contains a space, so our program is probably going to decide that the city is called just "New" rather than "New York".
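To see the problem concretely, here is a minimal Python sketch of the kind of program described above. The function name parse_row and the dictionary keys are my own choices for illustration; the only rule it knows is "split on spaces and take the fields in order".

```python
# The four rows from the customer list above.
rows = [
    "Joe Bloggs jbloggs@yahoo.com Washington",
    "Mary Woods mwoods@hotmail.com London",
    "Jean Du Vin jduvin@wanadoo.fr Paris",
    "Mike Macey mmacey@nyonline.com New York",
]


def parse_row(row):
    """Naive parser (hypothetical): assumes exactly one space between fields."""
    fields = row.split(" ")
    return {
        "first_name": fields[0],
        "last_name": fields[1],
        "email": fields[2],
        "city": fields[3],
    }


parsed = [parse_row(row) for row in rows]

# Rows one and two come out fine; rows three and four do not.
for customer in parsed:
    print(customer)
```

For "Jean Du Vin" the last name comes out as "Du" and the email as "Vin", and for Mike Macey the city is cut down to "New", exactly as predicted.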
Now we could update our program so that it understands "Jean Du Vin" and "New York" as special cases. But imagine we have a list of 10,000 customers. How many special cases are we going to have to deal with? Instantly we have a nightmare on our hands.
So we need some kind of special character to separate the "elements", right? How about a comma (as you might find in a CSV file)?
Joe,Bloggs,jbloggs@yahoo.com,Washington
Mary,Woods,mwoods@hotmail.com,London
Jean,Du Vin,jduvin@wanadoo.fr,Paris
Mike,Macey,mmacey@nyonline.com,New York
Now we can look for the commas instead of spaces, and we've solved the problem! Well... we have until someone enters "Paris, Texas" to distinguish it from just "Paris". So although the commas are a step forward, there are still special cases we need to be prepared for. We also rely on the "elements" appearing in each row in the right order. It would be nice, for instance, if we could identify an email as an email wherever it appears in the list. And what if elements are missing in some rows, or we have extra elements we weren't expecting?
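Here is a small Python sketch of the "Paris, Texas" case. The row itself is made up for illustration (it is Jean's record with the city changed), and the program is again assumed to do nothing cleverer than split on the separator.

```python
# A hypothetical row where the city itself contains a comma.
line = "Jean,Du Vin,jduvin@wanadoo.fr,Paris, Texas"

# Naive approach: every comma is treated as a field separator.
fields = line.split(",")

print(len(fields))  # one field more than the four we expected
print(fields[3])    # the city, as the program sees it
```

The split produces five fields instead of four, and the city read by position is just "Paris", so the distinction the user was trying to make is lost.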
Enter: XML
How about we lay out our list like this:
<?xml version="1.0"?>
<customer_list>
<customer>
<first_name>Joe</first_name>
<email>jbloggs@yahoo.com</email>
<last_name>Bloggs</last_name>
<city>Washington</city>
</customer>
<customer>
<last_name>Woods</last_name>
<first_name>Mary</first_name>
<city>London</city>
<email>mwoods@hotmail.com</email>
</customer>
<customer>
<last_name>Du Vin</last_name>
<first_name>Jean</first_name>
<email>jduvin@wanadoo.fr</email>
<city>Paris</city>
<country>France</country>
</customer>
<customer>
<city>New York</city>
<last_name>Macey</last_name>
<email>mmacey@nyonline.com</email>
</customer>
</customer_list>
Now our problem really is solved! Every "element" of data is neatly wrapped up in "tags" which make it clear exactly where the data begins and ends.
We also have a description of what each piece of data actually is (such as an "email" or a "city") and, as such, the data can appear in any order without worrying our program.
Another gain is that we also have a data "hierarchy": the "customer_list" element contains elements called "customer" which, in turn, contain the "first_name", "last_name", "email" and "city" elements.
I've also slipped in a couple more surprises: I removed an element (first_name) from the last customer and added a new element, <country>, to the one above it. With XML, this is no problem. That's what the X stands for: eXtensible (imagine what would have happened if I'd removed or added an element in the comma-separated file). So extracting the data from this file is now a simple job for any system.
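As a rough sketch of what "a simple job" can look like, here is one way to read the last two customers from the list using Python's standard xml.etree.ElementTree module. The function name read_customers and the FIELDS tuple are assumptions made for this example, not part of XML itself.

```python
import xml.etree.ElementTree as ET

# The last two customers from the list above: one with an extra
# <country> element, one with no <first_name> at all.
XML_DOC = """<?xml version="1.0"?>
<customer_list>
  <customer>
    <last_name>Du Vin</last_name>
    <first_name>Jean</first_name>
    <email>jduvin@wanadoo.fr</email>
    <city>Paris</city>
    <country>France</country>
  </customer>
  <customer>
    <city>New York</city>
    <last_name>Macey</last_name>
    <email>mmacey@nyonline.com</email>
  </customer>
</customer_list>"""

# The elements this (hypothetical) program cares about.
FIELDS = ("first_name", "last_name", "email", "city")


def read_customers(xml_text):
    """Look each field up by name, so order and extra elements don't matter."""
    root = ET.fromstring(xml_text)
    customers = []
    for node in root.findall("customer"):
        # findtext returns None when the element is missing.
        customers.append({name: node.findtext(name) for name in FIELDS})
    return customers


customers = read_customers(XML_DOC)
for customer in customers:
    print(customer)
```

Because each value is found by its element name rather than its position, "Du Vin" and "New York" survive intact, the unexpected country element is simply ignored, and the missing first name shows up as None instead of shifting every other field along.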
And that captures the essence of XML: it's a technology for transferring data between systems in a platform-independent manner.
For example, a Windows workstation can fetch XML data from a mainframe, do something to it, and then pass it on to a Linux server, without any of them batting an eyelid.
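The hand-off between systems can be sketched in Python too. This is only an illustration under simple assumptions: the functions to_xml and from_xml are hypothetical names, and both "systems" here are just two functions in the same script, with the XML bytes standing in for whatever travels over the wire.

```python
import xml.etree.ElementTree as ET


def to_xml(customers):
    """Sending side (hypothetical): turn a list of dicts into XML bytes."""
    root = ET.Element("customer_list")
    for customer in customers:
        node = ET.SubElement(root, "customer")
        for name, value in customer.items():
            ET.SubElement(node, name).text = value
    return ET.tostring(root, encoding="utf-8")


def from_xml(payload):
    """Receiving side (hypothetical): rebuild the list of dicts from the bytes."""
    root = ET.fromstring(payload)
    return [
        {child.tag: child.text for child in node}
        for node in root.findall("customer")
    ]


sent = [
    {
        "first_name": "Joe",
        "last_name": "Bloggs",
        "email": "jbloggs@yahoo.com",
        "city": "Washington",
    }
]

payload = to_xml(sent)        # what would be passed to the other system
received = from_xml(payload)  # what the other system reconstructs

print(payload.decode("utf-8"))
print(received == sent)
```

The receiving side needs nothing from the sender except the bytes themselves, which is the point: any system with an XML parser can rebuild the same data.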
The same goes for the exchange of data between applications on the same system. With MySQL, for example, the mysqldump utility can be used to deliver data in XML form to a file (e.g. mysqldump -X mydatabase), which can then be delivered (perhaps with the aid of PHP) to a Web browser for viewing.
Because XML is plain text, which every modern operating system can read and write, it makes a natural choice for data exchange from anywhere to anywhere.
All the applications of XML that you may have come across (XSLT, XML-RPC, SOAP, XML Schema) are just mechanisms that are used to enhance the ability of XML to exchange data in some way.
One final thing to be aware of when you think about XML is that, although XML is at heart just a standard (a set of rules for formatting plain text), bit by bit it's reaching the point where it could be regarded as a programming language, when you take into account "add-on" technologies like XSLT.