Introduction to XML
XML: It's just text!
Welcome to the kickoff of SitePoint's XML week! This week a few of us will be doing our best to dissolve the hype surrounding XML, showing you how it works and how you can use it to enhance your Website in ways you may never have imagined.
Ask a Web builder "What is XML?" and you'll get answers that range from "It's the greatest thing since spliced silicon!" to "Who cares?"; from passion to paranoia. Frequently the answer you'll get is simply "I don't understand XML -- but I wish I did".
As with all technologies, finding answers to simple questions is often difficult, and perhaps that's truer of XML than of any other technology. So in this article I'll be putting XML under a microscope, showing you what it is and how it works. What's more, I'll dare to make the bold claim that by the time you've finished reading this article you'll be saying "Is that all it is? That's so easy!"
So here's what's in store:
- The Who, What and Why of XML: It's just text -- honest!
- Rolling your Own: The rules for making XML
- Parsing in the Night: When XML comes alive
- XML in Action: Things to do with XML
Warning: I use PHP to illustrate some of the examples here, but if you don't know PHP, you should be able to skip these examples, while still grasping the overall concepts. Ok, let's get started.
The Who, What and Why of XML?
Q. Who needs to know about XML?
A. Anyone involved in building the Web or working in the IT business!
A bold claim perhaps, but XML is a technology that can be put to good use by everyone from Web designers to programmers. XML is more or less a mature technology these days. It has already begun creeping into all our lives, and over the next few years, it's highly likely to become the norm for countless tasks -- from rendering a single Web page, to exchanging business data worldwide. The big software vendors like Sun, Microsoft, IBM and Hewlett Packard continually add more and more XML-related functionality to their products, and in general, life online is migrating to XML.
I should make it clear that when I said "XML" just now, I was actually talking about a whole range of "add-on" technologies that build on the basic XML standard. For Web designers, for example, that means XHTML and XSLT, while developers should be concerned with technologies like SOAP, XML Schema and many more. We'll be sticking with basic XML for most of this article.
In short, if you've read this far, you probably need to know about XML. The good news is it couldn't be simpler...
What is XML?
Ever created a file with Notepad? If so, you've already worked with the raw material of XML: text! That's right; it's nothing but good old plain text.
XML is simply a set of rules for laying out text in order to make that text easier to "navigate". If you've ever edited an HTML file with a text editor, you practically know XML already.
So, let's first ask "What is HTML?" One way you might describe HTML is a set of rules for laying out text so that it's easy for human beings to read. If I put a word in bold, your eye is drawn to it because it stands out against the rest of the text. HTML allows us to exchange information between people in a way that's easy for them to read. Where would we be without formatting? The Internet would be one giant README file!
Well, XML is the equivalent of HTML, but designed primarily for computers to read. Computers, not being as smart as human beings, need to be given more detail to help them find their way around a piece of information. XML is there to tell a computer not just how a document is formatted but also exactly what information that document contains.
Why XML?
That's a very general description; a far easier way to understand XML is in terms of the problem it is there to solve...
Say we have a text file that contains some information about customers of our Website. It's come from a database, and every "row" of the file contains four pieces of information (let's call them "elements"): the person's first name, their last name, their email address and the city they live in. For example:
Joe Bloggs jbloggs@yahoo.com Washington
Mary Woods mwoods@hotmail.com London
We have a computer program with which we want to read this file, and extract the customer data, so we can email them with our latest special offer (spam them, in other words).
Looking at the above file, each "element" is separated from the next by a space character. Also, each row in the list is separated from the next by a newline character (each row starts on a new line). Using those characters, we can write a computer program to break the text file into smaller pieces we can use.
Using PHP by way of example, here's how we could do it (bear with me if you don't know PHP):
/* We begin with our customer list stored
   in a variable called $customer_list */

/* Break up the customer list using PHP's
   explode() function and the newline character
   at the end of each row */
$customers = explode ( "\n" , $customer_list );

/* Now $customers is an array variable
   (basically a list of variables) so
   we can loop through it like so: */
foreach ( $customers as $customer ) {

    /* Explode each "row" stored in $customer using
       the space character to create another array
       called $element */
    $element = explode ( " ", $customer );

    /* Send an email using $element:
       [0] first name, [1] last name, [2] email, [3] city */
    mail ( $element[2], "Special Offer",
        "Hi $element[0] $element[1],\nHow's it going in $element[3]?" );
}
The above code would send out two emails. The first goes to jbloggs@yahoo.com containing the message:
"Hi Joe Bloggs,
How's it going in Washington?"
The second email goes to mwoods@hotmail.com and contains:
"Hi Mary Woods,
How's it going in London?"
So, using some basic PHP string functions, we were able to break up the text file into pieces with which we could work, using the space characters and newlines to find the start and end points of the "chunks" of data we want.
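If you'd like to try the splitting logic without actually sending any mail, here is a minimal, self-contained sketch. The function name parse_customers() and the array keys are my own inventions for illustration, not part of PHP or of the original example, and the type declarations assume PHP 7 or later. It applies the same two explode() calls and simply prints the messages instead of mailing them.

```php
<?php
/* Hypothetical helper (not a built-in PHP function): split the raw
   customer text into rows, then split each row into named elements. */
function parse_customers(string $customer_list): array
{
    $customers = [];

    /* One row per line, exactly as in the example above */
    foreach (explode("\n", trim($customer_list)) as $row) {

        /* One element per space-separated chunk */
        $element = explode(" ", trim($row));

        /* Assumption: a valid row has exactly four elements;
           anything else is skipped */
        if (count($element) !== 4) {
            continue;
        }

        $customers[] = [
            'first_name' => $element[0],
            'last_name'  => $element[1],
            'email'      => $element[2],
            'city'       => $element[3],
        ];
    }

    return $customers;
}

/* The same two rows used in the article */
$customer_list = "Joe Bloggs jbloggs@yahoo.com Washington\n"
               . "Mary Woods mwoods@hotmail.com London";

/* Print each message rather than calling mail() */
foreach (parse_customers($customer_list) as $customer) {
    echo "Hi {$customer['first_name']} {$customer['last_name']},\n";
    echo "How's it going in {$customer['city']}?\n";
}
```

Giving each element a name ('first_name', 'city' and so on) rather than a number makes the rest of the program easier to read, but notice that the names live only in our code: nothing in the text file itself says which chunk is which.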
Harry has been working in corporate IT since 1994, with everything from start-ups to Fortune 100 companies. Outside of office hours he runs