XML DTDs Vs XML Schema
XML Schemas
XML Schemas provide a much more powerful means of defining your XML document's structure and constraints. XML Schemas are themselves XML documents. They reference the XML Schema namespace (http://www.w3.org/2001/XMLSchema), and even have their own DTD.
XML Schemas take an object-oriented approach to defining the format of an XML document. They provide a set of built-in types that range much more widely than the basic PCDATA and CDATA of DTDs. These include most basic programming types, such as integer, byte, string and floating point numbers, but they also extend to Internet data types such as ISO country and language codes (en-GB, for example). The full list is given in the W3C specification XML Schema Part 2: Datatypes.
The author of an XML Schema then uses these core types, along with various operators and modifiers, to create complex types of their own. These complex types are then used to define an element in the XML Document.
As a simple example, let's try to create a basic XML Schema for defining the bookstore that we used as an example for DTDs. Firstly, we must declare this as an XSD Document, and, as we want this to be very user friendly, we're going to add some basic documentation to it:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:annotation>
<xsd:documentation xml:lang="en">
XML Schema for a Bookstore as an example.
</xsd:documentation>
</xsd:annotation>
Now, in the previous example, the bookstore consisted of the sequence of a name and at least one topic. We can easily do that in an XML Schema:
<xsd:element name="bookstore" type="bookstoreType"/>
<xsd:complexType name="bookstoreType">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="topic" type="topicType" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
In this example, we've defined an element, bookstore, that will equate to an XML element in our document. We've declared it to be of type bookstoreType, which is not a standard type, and so we provide a definition of that type next.
We then define a complexType, which defines bookstoreType as a sequence of name and topic elements. Our name element is of type xsd:string, a type defined by the XML Schema namespace, and so we've fully defined that element.
The topic element, however, is of type topicType, another custom type that we must define. We've also declared our topic element with minOccurs="1" and maxOccurs="unbounded", which means there must always be at least one topic element and there is no upper limit on how many may be included. Both attributes default to 1, so if we had specified neither, exactly one instance would be required, as is the case for the name element; minOccurs="1" is spelled out here only for clarity. Next, we define the schema for the topicType.
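<xsd:complexType name="topicType">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<!-- zero or more books; maxOccurs would default to 1 if omitted -->
<xsd:element name="book" type="bookType" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>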
This is all similar to the declaration of the bookstoreType, but note that we have to re-define our name element within the scope of this type. If we'd created a named type for name, such as nameType, based only on xsd:string, and defined it outside our complex types, we could re-use it in both. However, to illustrate the point, I decided to define it within each section. XML Schema gets interesting when we get to defining our bookType:
<xsd:complexType name="bookType">
<xsd:sequence>
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="author" type="xsd:string"/>
</xsd:sequence>
<xsd:attribute name="isbn" type="isbnType"/>
</xsd:complexType>
<xsd:simpleType name="isbnType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]{3}-[0-9]{3}-[0-9]{3}"/>
</xsd:restriction>
</xsd:simpleType>
So the definition of the bookType is not particularly interesting. But the definition of its attribute "isbn" is. Not only does XML Schema support built-in types such as xsd:nonNegativeInteger, but we can also create our own simple types from these basic types using various modifiers (known as facets). In the example for isbnType above, we base it on a string, and restrict it to match a given regular expression. Excusing my poor regex, that should limit any isbn attribute to three groups of three digits separated by dashes, which is a simplified format for this example rather than a real ISBN.
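To make the schema more concrete, here is a sketch of an instance document that the types above are intended to accept. The store name, topic name, book details and isbn value are all made up for illustration. It also assumes that the schema fragments are collected into a single, closed xsd:schema element and that the child elements of each complex type are grouped in an xsd:sequence.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document; every name and value here is illustrative. -->
<bookstore>
  <!-- bookstoreType: a name followed by at least one topic. -->
  <name>Example Books</name>
  <topic>
    <!-- topicType: a name followed by optional book elements. -->
    <name>XML</name>
    <!-- bookType: the isbn attribute is meant to match the isbnType pattern. -->
    <book isbn="123-456-789">
      <title>An Example Title</title>
      <author>A. N. Author</author>
    </book>
  </topic>
</bookstore>
```

A validating parser should reject a value such as isbn="12-3456-789", because it does not consist of three groups of three digits.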
This is just a simple example, but it should give you a taste of the many things you can do to control the content of an attribute or an element. You have far more control over what is considered a valid XML document using a schema. You can even
- extend your types from other types you've created,
- require uniqueness within scope, and
- provide lookups.
It's a nicely object oriented approach. You could build a library of complexTypes and simpleTypes for re-use throughout many projects, and even find other definitions of common types (such as an "address", for example) from the Internet and use these to provide powerful definitions of your XML documents.
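As a rough sketch of the three features listed above, the schema below uses xsd:extension to derive one type from another, xsd:key to require unique values within the scope of one element, and xsd:keyref to look those values up. It is a separate, self-contained example: nameType, audioBookType, catalogType, catalog and recommendation are hypothetical names, and bookType here is a simplified stand-in for the one defined earlier.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- A reusable simple type, defined once at the top level. -->
  <xsd:simpleType name="nameType">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="1"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Simplified stand-in for the article's bookType. -->
  <xsd:complexType name="bookType">
    <xsd:sequence>
      <xsd:element name="title" type="nameType"/>
      <xsd:element name="author" type="nameType"/>
    </xsd:sequence>
    <xsd:attribute name="isbn" type="xsd:string" use="required"/>
  </xsd:complexType>

  <!-- Extension: an audiobook is a book with an added narrator element. -->
  <xsd:complexType name="audioBookType">
    <xsd:complexContent>
      <xsd:extension base="bookType">
        <xsd:sequence>
          <xsd:element name="narrator" type="nameType"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:complexType name="catalogType">
    <xsd:sequence>
      <xsd:element name="book" type="bookType" maxOccurs="unbounded"/>
      <xsd:element name="audiobook" type="audioBookType"
                   minOccurs="0" maxOccurs="unbounded"/>
      <!-- A recommendation points at a book by its isbn. -->
      <xsd:element name="recommendation" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:attribute name="isbn" type="xsd:string" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:element name="catalog" type="catalogType">
    <!-- Uniqueness within scope: no two books in one catalog share an isbn. -->
    <xsd:key name="bookIsbn">
      <xsd:selector xpath="book|audiobook"/>
      <xsd:field xpath="@isbn"/>
    </xsd:key>
    <!-- Lookup: every recommendation must refer to an isbn covered by the key. -->
    <xsd:keyref name="recommendationIsbn" refer="bookIsbn">
      <xsd:selector xpath="recommendation"/>
      <xsd:field xpath="@isbn"/>
    </xsd:keyref>
  </xsd:element>

</xsd:schema>
```

xsd:unique works in the same way as xsd:key, except that it tolerates elements for which the selected field is absent. I chose xsd:key here because a keyref can then refer to it.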