Back to Basics: XML In .NET
XML Schemas
While an XML file might conform to the XML specification, it might not be a valid form of a particular dialect. An XML schema lets you verify that certain elements are present, while making sure that the values presented are of the correct type.
There are a few different specifications of schemas: XSD, DTD, and XDR (XML-Data Reduced). Though DTD (Document Type Definition) is the most common schema used today, XSD (XML Schema Definition) is a newer standard that's gaining acceptance, as it provides the finest-grained control for XML validation.
As XSD has many features, this introduction will concentrate on some of the simpler features you'll be able to employ.
The first line of a schema file usually looks something like the following:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
The above defines that this file is a schema and that all the elements we're going to use for validation belong to the XML Schema namespace (to which we assign the prefix xs). You can set up additional namespaces and a number of XSD options in this tag. Check the full specification for more information.
One of the principal validations we may want to check is whether elements are of the correct type (e.g. when we're expecting a number, we don't want to receive a string of text). This check is performed in XSD using the element tag:
<xs:element name="foo" type="xs:integer"/>
This means that any elements named foo must contain an integer.
Hence, the following would validate.
<foo>10</foo>
However, the following would not.
<foo>This is some text</foo>
You can check for a range of different types; again, see the specification for more detail.
We can also check for a set of valid values using an enumeration. For example, our element foo may only be able to take the values "Apple," "Orange," and "Grape" (the comparison is case-sensitive). Here, we wish to define our own type, as it isn't just any string we're after. XSD provides the simpleType element to let us do this.
<xs:element name="foo">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Apple"/>
      <xs:enumeration value="Orange"/>
      <xs:enumeration value="Grape"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Notice the restriction element, which gives us a base type from which to work. As we have a list of text strings, this is set as the string type.
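To see the enumeration at work from .NET, here is a minimal sketch. It assumes the XmlReaderSettings validation API found in later versions of the framework, rather than the XmlValidatingReader class this article names further on. The EnumerationDemo class and its IsValid helper are hypothetical names introduced purely for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class EnumerationDemo
{
    // The enumeration schema from above, wrapped in a schema root element.
    const string Xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""foo"">
    <xs:simpleType>
      <xs:restriction base=""xs:string"">
        <xs:enumeration value=""Apple""/>
        <xs:enumeration value=""Orange""/>
        <xs:enumeration value=""Grape""/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>";

    // Hypothetical helper: returns true if the XML validates against the schema.
    public static bool IsValid(string xml)
    {
        bool valid = true;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        // Validation errors are reported here instead of being thrown.
        settings.ValidationEventHandler += (sender, e) => valid = false;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens as the document is read
        }
        return valid;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("<foo>Apple</foo>"));  // a listed value
        Console.WriteLine(IsValid("<foo>Banana</foo>")); // not in the enumeration
        Console.WriteLine(IsValid("<foo>apple</foo>"));  // wrong case
    }
}
```

Only the first document should pass: "Banana" is not in the list, and "apple" differs in case from the enumerated "Apple".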
So, we have basic validation on values for our elements, but what about the attributes of those elements? Well, in a similar way, we can define attributes for an element. In XSD, an attribute declaration must sit inside the element's complexType:
<xs:element name="foo">
  <xs:complexType>
    <xs:attribute name="colour" type="xs:string"/>
  </xs:complexType>
</xs:element>
Again, attributes can have their own types, defined with the same simpleType element we used for elements above.
By default, all attributes are optional. However, if an attribute must always be present on an element, we can override this default using the use="required" attribute on the attribute element:
<xs:attribute name="colour" type="xs:string" use="required"/>
A complex element is one that contains other elements, such as the "CD" element in the catalogue example given earlier:
<cd>
  <title>The Bends</title>
  <artist>Radiohead</artist>
  <tracks>
    <track name="Street Spirit"/>
  </tracks>
</cd>
Here, we need to make sure cd elements contain a title, an artist, and a tracks element, in that order. We do so using a sequence:
<xs:element name="cd">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="artist" type="xs:string"/>
      <xs:element name="tracks" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
We can create custom types or define attributes within other elements, to help us create our hierarchical structure.
You should notice that "tracks" is itself a complex type, as it is made up of other elements. Thus, we need to define another complexType, this time inside our tracks element. Our schema will then look like this:
<xs:element name="cd">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="artist" type="xs:string"/>
      <xs:element name="tracks">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="track">
              <xs:complexType>
                <xs:attribute name="name" type="xs:string"/>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Of course, this just scratches the surface of XSD's abilities, but it should give you a good foundation from which to begin writing your own schemas. Then again, you can always cheat, thanks to .NET and Microsoft's XSD Inference tool, which builds a "best guess" XSD schema file from any given XML file.
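As a rough illustration of schema inference in code, the sketch below assumes a .NET version that ships the System.Xml.Schema.XmlSchemaInference class. This is not necessarily the same tool the paragraph above refers to, and the InferenceDemo class and InferXsd helper are hypothetical names used only for this example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class InferenceDemo
{
    // Hypothetical helper: infers a schema from sample XML and returns it as text.
    public static string InferXsd(string xml)
    {
        XmlSchemaInference inference = new XmlSchemaInference();
        XmlSchemaSet schemas;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            schemas = inference.InferSchema(reader);
        }

        // A schema set can hold several schemas; write each one out.
        StringWriter output = new StringWriter();
        foreach (XmlSchema schema in schemas.Schemas())
        {
            schema.Write(output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string xml = "<cd><title>The Bends</title><artist>Radiohead</artist></cd>";
        Console.WriteLine(InferXsd(xml));
    }
}
```

The inferred schema is only a best guess from one sample document, so it is worth reviewing the result by hand before relying on it for validation.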
Learning to Read and Write
Before we start getting involved with the more exciting aspects of XML, we need to know how to produce and consume XML files in .NET. .NET organises its XML classes under the System.Xml namespace, so it's worth familiarising yourself with the key classes we'll use:
XmlTextReader: The XmlTextReader class is just one of the ways of reading XML files. It approaches an XML file in a similar way to a DataReader, in that you step through the document node by node, making decisions as you go. It's by far the easiest class to use to parse an XML file quickly.
XmlTextWriter: Similarly, the XmlTextWriter class provides a means of writing XML files element by element.
XmlValidatingReader: XmlValidatingReader is used to validate an XML file against a schema file.
Reading an XML File In .NET
Let's get acquainted with the XmlTextReader. XmlTextReader is derived from the XmlReader class, and has been designed to read byte streams, making it suitable for XML files located on disk, on a network, or in a stream.
As with any class, the first step is to create a new instance. The constructor takes the location of the XML file to read (a file path or URL), or an already-open stream or text reader. Here's how to do it in C#; the three declarations below are alternatives, so use only one of them:
// file
XmlTextReader reader = new XmlTextReader("foo.xml");
// url
XmlTextReader reader = new XmlTextReader("http://www.sitepoint.com/foo.xml");
// text reader (here, a StringReader s)
XmlTextReader reader = new XmlTextReader(s);
Once it's loaded, we can only move through the file in a forward direction. This means that you need to structure your parsing routines so that they're order-independent. If you cannot be sure of the order of elements, your code must be able to handle any order.
We move through the file using the Read method:
while (reader.Read())
{
    // parse our file
}
This loop will continue until we reach the end of our file, or we formally break the loop. We need to inspect each node, ascertain its type, and take the information we need. The NodeType property exposes the current type of node that's being read, and this is where things get a little complicated!
An XmlReader will see the following element as three different nodes:
<foo>text</foo>
The <foo> part of the element is recognised as an XmlNodeType.Element node. The text part is recognised as an XmlNodeType.Text node, and the closing tag </foo> is seen as an XmlNodeType.EndElement node.
The code below shows how we can output the XML tag through the reader object that we created earlier:
while (reader.Read())
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:
            Console.Write("<" + reader.Name + ">");
            break;
        case XmlNodeType.Text:
            Console.Write(reader.Value);
            break;
        case XmlNodeType.EndElement:
            Console.Write("</" + reader.Name + ">");
            break;
    }
}
Here's the output of this code:
<foo>text</foo>
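The same Read loop also supports the order-independent parsing mentioned earlier. The sketch below is one possible approach, not the only one: it remembers which element the reader is currently inside, and stores the text when it arrives. The CdParser class and Describe helper are hypothetical names, and the input is assumed to be a small cd document with no whitespace between its tags.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CdParser
{
    // Hypothetical helper: returns "artist - title" whatever order the elements arrive in.
    public static string Describe(string xml)
    {
        string title = null;
        string artist = null;
        string current = null; // name of the element we are currently inside

        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    current = reader.Name;
                    break;
                case XmlNodeType.Text:
                    // Decide by element name, not by position in the file.
                    if (current == "title") title = reader.Value;
                    else if (current == "artist") artist = reader.Value;
                    break;
                case XmlNodeType.EndElement:
                    current = null;
                    break;
            }
        }
        reader.Close();
        return artist + " - " + title;
    }

    public static void Main()
    {
        // Both orderings print: Radiohead - The Bends
        Console.WriteLine(Describe("<cd><title>The Bends</title><artist>Radiohead</artist></cd>"));
        Console.WriteLine(Describe("<cd><artist>Radiohead</artist><title>The Bends</title></cd>"));
    }
}
```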
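So, what about the attributes? Well, these can be picked up in a number of ways. Read never stops on an attribute by itself; attributes belong to the element node on which the reader is currently positioned. We could fetch a single attribute by name using GetAttribute. More elegantly, however, when we hit an XmlNodeType.Element node, we can iterate through its attributes using XmlTextReader.MoveToNextAttribute (while the reader is positioned on an attribute, NodeType reports XmlNodeType.Attribute):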
case XmlNodeType.Element:
    Console.Write("<" + reader.Name);
    while (reader.MoveToNextAttribute())
    {
        // write each attribute as name="value"
        Console.Write(" " + reader.Name + "=\"" + reader.Value + "\"");
    }
    Console.Write(">");
    break;
Now, if we fed in this XML:
<foo first="1" second="2">text</foo>
we would receive this output:
<foo first="1" second="2">text</foo>
Earlier, we had ignored the attributes, and would have output:
<foo>text</foo>
Note that, if an element does not contain any attributes, the loop is never started. This means that we don't first have to check whether we have attributes. That said, the number of attributes on a node can be found using the AttributeCount property.
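To tie AttributeCount and MoveToNextAttribute together, here is a small sketch that collects the attributes of the first element into a dictionary. The AttributeDemo class and ReadAttributes helper are hypothetical names chosen for illustration. The call to MoveToElement returns the reader to the owning element once the attributes have been read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeDemo
{
    // Hypothetical helper: collects the attributes of the first element in the XML.
    public static Dictionary<string, string> ReadAttributes(string xml)
    {
        Dictionary<string, string> attributes = new Dictionary<string, string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));

        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;

            Console.WriteLine(reader.Name + " has " + reader.AttributeCount + " attribute(s)");
            while (reader.MoveToNextAttribute())
            {
                attributes[reader.Name] = reader.Value;
            }
            reader.MoveToElement(); // step back from the last attribute to its element
            break;                  // only the first element is of interest here
        }
        reader.Close();
        return attributes;
    }

    public static void Main()
    {
        Dictionary<string, string> found = ReadAttributes("<foo first=\"1\" second=\"2\">text</foo>");
        foreach (KeyValuePair<string, string> pair in found)
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```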