Back to Basics: XML In .NET
One of the most exciting recent advances in computing has been XML. Designed as a stricter and simpler document format than SGML, XML is now used everywhere to produce cross-platform interoperable file formats.
It's also core to .NET, and is something every .NET developer will need to come to grips with. I've tried to keep this introduction to XML as broad as possible, so it should be of use to users of all developmental persuasions.
XML 101: Learning to Crawl
Before we look into the specifics of XML, it is important to know why XML exists and where it can be used. A proper understanding will allow you to use it effectively in your projects.
Where HTML was designed to display data and specify how that data should look, XML was designed to describe and structure data. In this way, an XML file itself doesn't actually do anything. It doesn't say how to display the data or what to do with the data, any more than a plain text file does.
But XML crucially differs from plain text in that it allows you to structure your data in a standard manner. This is important -- it means that other systems can interpret your XML, which is not as easily achievable with plain text. This is what is meant by "interoperable file format" -- once you produce an XML file, it is open to everyone. The data, along with the information required to understand its structure, is included in the file.
Let's take an example. Here's a text file and an XML file that both store the same information:
mymusic.txt
The Bends,Radiohead, Street Spirit
Is This It?,The Strokes, Last Nite
mymusic.xml
<catalog>
<cd>
<title>The Bends</title>
<artist>Radiohead</artist>
<tracks>
<track name="Street Spirit"/>
</tracks>
</cd>
<cd>
<title>Is This It?</title>
<artist>The Strokes</artist>
<tracks>
<track name="Last Nite"/>
</tracks>
</cd>
</catalog>
Notice how the subject of our data is defined in the XML file. We can see clearly that there is a catalogue containing CDs, each of which contains some tracks (music aficionados will notice that I have cut down the track listings for space!). You can also see that XML is more verbose, and so can be less efficient, than some other file formats. Yet, in many cases, the loss in efficiency that results from the increased size can be made up by the speed of processing a well-defined XML file, as parsers (programs that read XML) can rely on its predictable structure.
The way we'd interpret the plain text file would be dependent on how we designed our own format. No information exists to tell others what the actual data means, its order, or how to parse (read) it in other projects. By contrast, the XML file shows clearly what each piece of information represents and where it belongs in the data hierarchy. This "data-describing data" is known as metadata, and is a great strength of XML in that you can create your own specifications and structure your data to be interpreted by any other system.
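To make the contrast concrete, here is a minimal sketch in Python (chosen only for brevity; the standard-library xml.etree.ElementTree module stands in for whichever parser your platform provides). The function names are my own, and the column order used for the text file is an assumption that a reader of that file would have to be told in advance, whereas the XML version is read by element name.

```python
import xml.etree.ElementTree as ET

MYMUSIC_TXT = """The Bends,Radiohead, Street Spirit
Is This It?,The Strokes, Last Nite
"""

MYMUSIC_XML = """<catalog>
  <cd>
    <title>The Bends</title>
    <artist>Radiohead</artist>
    <tracks>
      <track name="Street Spirit"/>
    </tracks>
  </cd>
  <cd>
    <title>Is This It?</title>
    <artist>The Strokes</artist>
    <tracks>
      <track name="Last Nite"/>
    </tracks>
  </cd>
</catalog>"""


def read_text_catalog(text):
    """Parse the home-grown text format.

    Assumption: each line is title, artist, track, in that order.
    Nothing in the file itself tells us this.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        title, artist, track = [field.strip() for field in line.split(",")]
        records.append((title, artist, track))
    return records


def read_xml_catalog(xml_text):
    """Parse the XML format by element and attribute name."""
    root = ET.fromstring(xml_text)
    return [
        (
            cd.findtext("title"),
            cd.findtext("artist"),
            cd.find("tracks/track").get("name"),
        )
        for cd in root.findall("cd")
    ]


print(read_text_catalog(MYMUSIC_TXT))
print(read_xml_catalog(MYMUSIC_XML))
```

Both functions return the same records, but only the XML reader would keep working if the fields were reordered or a new field were added.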
Terminology
To start using XML effectively, you need a sound knowledge of its terminology and file structure.
<catalog>
<cd>
<title>The Bends</title>
<artist>Radiohead</artist>
<tracks>
<track name="Street Spirit"/>
</tracks>
</cd>
</catalog>
XML files are hierarchical, with each pair of tags defining an element. All elements need both an opening and a closing tag (<catalog> being an opening tag, </catalog> being its closing tag). Some elements are self-contained and do not need to enclose any information. Such an element can be written as a single self-closing tag by ending the opening tag with "/>" instead of ">", as with the track element above.
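As a quick check of the tag rules just described, the following sketch (Python, standard library only; the helper name summarise is hypothetical) parses a self-closing track element and the equivalent opening and closing pair, then tries a document with a missing closing tag.

```python
import xml.etree.ElementTree as ET


def summarise(xml_text):
    """Return (tag, attributes, text, number of children) for the root element."""
    element = ET.fromstring(xml_text)
    return (element.tag, dict(element.attrib), element.text or "", len(element))


# The self-closing form and the explicit pair describe the same empty element.
self_closing = summarise('<track name="Street Spirit"/>')
explicit_pair = summarise('<track name="Street Spirit"></track>')
print(self_closing == explicit_pair)

# An element without its closing tag is rejected by the parser.
try:
    ET.fromstring("<catalog><cd></catalog>")
    unclosed_accepted = True
except ET.ParseError:
    unclosed_accepted = False
print(unclosed_accepted)
```

The two summaries compare equal, while the unclosed cd element causes a parse error rather than a best-effort guess.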
The structure of the catalogue is such that it contains CDs, which in turn contain tracks. This is our hierarchy, and will be important later, when we need to parse the document. For example, the track "Street Spirit" corresponds to the CD "The Bends," just as the track "Last Nite" corresponds to the CD "Is This It?" If we didn't use a suitable hierarchy, we wouldn't be able to ascertain this during parsing.
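The hierarchy can be walked directly in code. This is a minimal sketch in Python using the standard-library parser; the function name tracks_by_cd is my own, and other parsers will expose the tree differently.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
  <cd>
    <title>The Bends</title>
    <artist>Radiohead</artist>
    <tracks>
      <track name="Street Spirit"/>
    </tracks>
  </cd>
  <cd>
    <title>Is This It?</title>
    <artist>The Strokes</artist>
    <tracks>
      <track name="Last Nite"/>
    </tracks>
  </cd>
</catalog>"""


def tracks_by_cd(xml_text):
    """Map each CD title to the track names nested inside that CD."""
    root = ET.fromstring(xml_text)
    result = {}
    for cd in root.findall("cd"):
        title = cd.findtext("title")
        # Only tracks nested under this particular cd element are collected.
        result[title] = [track.get("name") for track in cd.findall("tracks/track")]
    return result


print(tracks_by_cd(CATALOG_XML))
```

Because each track sits inside its own cd element, the association between track and CD comes from the nesting alone; no extra lookup key is needed.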
Sometimes, it doesn't make sense for information to appear between opening and closing tags. For example, if we need more than one piece of information to describe an element, we might like to include those multiple pieces of information within a single tag. We therefore define attributes of the element in the form attribute="value".
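A parser typically hands attributes back as name and value pairs. The sketch below uses Python's standard library; the rating attribute is hypothetical and is not part of the catalogue shown earlier.

```python
import xml.etree.ElementTree as ET

# "rating" is a hypothetical attribute, added only for illustration.
TRACK_XML = '<track name="Street Spirit" rating="5"/>'


def track_attributes(xml_text):
    """Return the element's attributes as a plain dictionary."""
    element = ET.fromstring(xml_text)
    return dict(element.attrib)


attributes = track_attributes(TRACK_XML)
print(attributes["name"])
print(attributes["rating"])

# Asking for an attribute that is not present yields None rather than an error.
missing = ET.fromstring(TRACK_XML).get("length")
print(missing)
```

Note that attribute values always arrive as strings, so the rating here is the text "5" and would need converting before any arithmetic.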
Once you have defined your own set of elements and structures, the resulting format can be referred to as a dialect. For example, RSS is an XML dialect.
Namespaces
With so many different dialects floating around, conflicts of meaning can easily arise. For example, take the following XML files, both of which describe some data:
<film-types>
<film-type>Action</film-type>
<film-type>Adventure</film-type>
</film-types>
<film-types>
<film-type>black and white</film-type>
<film-type>colour</film-type>
</film-types>
The first file specifies genres of movies, while the second specifies different types of camera film. But, as the consumers of these files, how can we differentiate between them?
Namespaces provide the answer. An XML namespace allows us to qualify an element in the same way as telephone area codes qualify phone numbers. Thousands of people might have the telephone number 545-321. When we add an area code and, perhaps, an international code, we make the number unique: +44 020 545-321.
The "area code" for an XML namespace is a URI, which is associated with a prefix for the namespace. We define a namespace using an xmlns declaration: the keyword xmlns, a colon and the prefix, set equal to a URI that uniquely identifies the namespace:
xmlns:movie="http://www.sitepoint.com/movies"
By adding this namespace definition as an attribute to a tag, we can use the prefix movie in that tag, and any tags it contains, to fully qualify our elements:
<movie:film-types xmlns:movie="http://www.sitepoint.com/movies">
<movie:film-type>Action</movie:film-type>
<movie:film-type>Adventure</movie:film-type>
</movie:film-types>
Similarly, for the second file, we can declare a different namespace with the prefix camera:
<camera:film-types xmlns:camera="http://www.sitepoint.com/camera">
<camera:film-type>black and white</camera:film-type>
<camera:film-type>colour</camera:film-type>
</camera:film-types>
Parsers can now recognise both meanings of "film type" and handle them accordingly.
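Here is a sketch of how a consumer might tell the two documents apart, using Python's standard-library parser as a stand-in (the function name film_types is my own). This parser reports each qualified element as the namespace URI in braces followed by the local name, so the prefix itself never needs to be compared.

```python
import xml.etree.ElementTree as ET

MOVIE_NS = "http://www.sitepoint.com/movies"
CAMERA_NS = "http://www.sitepoint.com/camera"

MOVIE_XML = """<movie:film-types xmlns:movie="http://www.sitepoint.com/movies">
  <movie:film-type>Action</movie:film-type>
  <movie:film-type>Adventure</movie:film-type>
</movie:film-types>"""

CAMERA_XML = """<camera:film-types xmlns:camera="http://www.sitepoint.com/camera">
  <camera:film-type>black and white</camera:film-type>
  <camera:film-type>colour</camera:film-type>
</camera:film-types>"""


def film_types(xml_text, namespace_uri):
    """Return the text of every film-type element in the given namespace."""
    root = ET.fromstring(xml_text)
    qualified_name = "{" + namespace_uri + "}film-type"
    return [element.text for element in root.iter(qualified_name)]


print(film_types(MOVIE_XML, MOVIE_NS))
print(film_types(CAMERA_XML, CAMERA_NS))
print(film_types(MOVIE_XML, CAMERA_NS))
```

Asking the movie document for camera film types returns an empty list, because the element names only match when the namespace URI matches too.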
Valid XML
In order for an XML file to be valid, it needs at the very least to conform to the XML specification, version 1.0. (Strictly speaking, the specification calls a file that follows its syntax rules "well-formed", and reserves "valid" for a file that also satisfies a declared document type.) This standardises exactly how your XML file is formed so that other systems can understand it. For example, XML 1.0 requires that every XML file has exactly one root element; that is, a single element that contains all other elements. In our music library example above, catalog is our root element, as it contains all our other elements.
The full XML specification is published by the W3C, although, as we'll see shortly, .NET gives you the tools to write valid XML automatically.
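The single-root rule mentioned above is enforced by parsers, which makes it easy to test. The following is a minimal sketch in Python with the standard-library parser; the function name parses_cleanly is hypothetical, and it checks only that the parser accepts the text, not that the content makes sense for your application.

```python
import xml.etree.ElementTree as ET


def parses_cleanly(xml_text):
    """Return True if the parser accepts the text, False if it raises a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One root element (catalog) containing everything else: accepted.
single_root = "<catalog><cd/><cd/></catalog>"

# Two top-level elements and no single root: rejected.
two_roots = "<cd/><cd/>"

print(parses_cleanly(single_root))
print(parses_cleanly(two_roots))
```

A check like this is a cheap first line of defence when accepting XML from another system, before any of the data is used.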
Philip is a Computer Science PhD student at Liverpool John Moores University. He's still not mastered guitar tabs, never finished Mario, and needs a haircut. He discusses life at