XML DTDs Vs XML Schema
XML is a handy format for storing data and communicating it between disparate systems in a platform-independent fashion. XML is more than just a format for computers -- a guiding principle in its creation was that it should be human-readable and easy to create.
XML allows UNIX systems written in C to communicate with Web Services that, for example, run on the Microsoft .NET architecture and are written in ASP.NET. XML is, however, only the meta-language that the systems understand -- they both need to agree on the format that the XML data will be in. Typically, one of the partners in the process will offer a service to the other, and that partner is in charge of defining the format of the data.
A formal definition of that format serves two purposes. First, it ensures that the data that makes it past the parsing stage is at least in the right structure, so it acts as a first level at which 'garbage' input can be rejected. Second, the definition documents the protocol in a standard, formal way, which makes it easier for developers to understand what's available.
DTD - The Document Type Definition
The first method used to provide this definition was the DTD, or Document Type Definition. This defines the elements that may be included in your document, what attributes these elements have, and the ordering and nesting of the elements.
The DTD is declared in a DOCTYPE declaration placed beneath the XML declaration at the top of an XML document. The definition can either be written inline or kept in an external file:
Inline Definition:
<?xml version="1.0"?>
<!DOCTYPE documentelement [definition]>
External Definition:
<?xml version="1.0"?>
<!DOCTYPE documentelement SYSTEM "documentelement.dtd">
The actual body of the DTD itself contains definitions in terms of elements and their attributes. For example, the following short DTD defines a bookstore. It states that a bookstore has a name, and stocks books on at least one topic.
Each topic has a name and 0 or more books in stock. Each book has a title, an author and an ISBN. The name of the topic and the name of the bookstore are defined as the same type of element, name, which holds #PCDATA: parsed character data, or just text. The title and author of the book are also declared as #PCDATA, because that is the only text type a DTD accepts for element content; CDATA is the type used for plain-text attribute values. The ISBN is stored as a CDATA attribute of the book, with a default value of "0":
<!DOCTYPE bookstore [
<!ELEMENT bookstore (name,topic+)>
<!ELEMENT topic (name,book*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT book (title,author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ATTLIST book isbn CDATA "0">
]>
An example of a book store's inline definition might be:
<?xml version="1.0"?>
<!DOCTYPE bookstore [
<!ELEMENT bookstore (name,topic+)>
<!ELEMENT topic (name,book*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT book (title,author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ATTLIST book isbn CDATA "0">
]>
<bookstore>
<name>Mike's Store</name>
<topic>
<name>XML</name>
<book isbn="123-456-789">
<title>Mike's Guide To DTD's and XML Schemas</title>
<author>Mike Jervis</author>
</book>
</topic>
</bookstore>
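To see what the content model actually enforces, here is a sketch of a document that is well-formed XML but that a validating parser should reject against the same bookstore DTD: topic+ demands at least one topic, and this store has none. The declarations are repeated so that the example stands alone, and title and author are declared as #PCDATA, the keyword a DTD accepts for text content in elements.

```xml
<?xml version="1.0"?>
<!DOCTYPE bookstore [
<!ELEMENT bookstore (name,topic+)>
<!ELEMENT topic (name,book*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT book (title,author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST book isbn CDATA "0">
]>
<bookstore>
<name>Mike's Store</name>
<!-- Invalid: topic+ requires at least one topic element here -->
</bookstore>
```

A topic with a name and no books, by contrast, would be accepted, because book* allows zero occurrences.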
Using an inline definition is handy when you only have a few documents and they're offline, as the definition is always in the file. However, if, for example, your DTD defines the XML protocol used to talk between two separate systems, re-transmitting the DTD with each document adds an overhead to the communications. Having an external DTD eliminates the need to re-send it each time. We could remove the DTD from the document, and place it in a DTD file on a Web server that's accessible by the two systems:
<?xml version="1.0"?>
<!DOCTYPE bookstore SYSTEM "http://webserver/bookstore.dtd">
<bookstore>
<name>Mike's Store</name>
<topic>
<name>XML</name>
<book isbn="123-456-789">
<title>Mike's Guide To DTD's and XML Schemas</title>
<author>Mike Jervis</author>
</book>
</topic>
</bookstore>
The file bookstore.dtd would contain the full definition in a plain text file:
<!ELEMENT bookstore (name,topic+)>
<!ELEMENT topic (name,book*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT book (title,author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ATTLIST book isbn CDATA "0">
The lowest level of definition in a DTD is text: element content is declared as #PCDATA (Parsed Character Data) and attribute values as CDATA (Character Data). A leaf element can only be defined as text, and with this limitation it is not possible, for example, to force an element to be numeric. Attributes can be restricted to an enumerated list of allowed values, but they can't be forced to be numeric either.
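As a rough illustration of that limit, the sketch below restricts one attribute to a fixed list of values while leaving another as free text. The format attribute and its three values are hypothetical and are not part of the bookstore DTD above; the document is cut down to a single book so that it stands alone.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
<!ELEMENT book (title)>
<!ELEMENT title (#PCDATA)>
<!-- isbn is free text: a DTD cannot require digits here -->
<!ATTLIST book isbn CDATA "0">
<!-- format (hypothetical) must be one of the listed values -->
<!ATTLIST book format (hardback|paperback|ebook) "paperback">
]>
<book isbn="not-a-number" format="ebook">
<title>Example</title>
</book>
```

A validating parser should accept this document even though the isbn value is clearly not a number, whereas a value such as format="audio" would be rejected because it is not in the list.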
So, for example, if you stored your application's settings in an XML file, it could be manually edited so that the window's start coordinates were arbitrary strings -- and you'd still need to validate this in your code, rather than have the parser do it for you.
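A minimal sketch of that situation follows. The element names settings, window, x and y are assumptions made up for illustration; the point is only that the DTD can say x and y contain text, not that the text is a number.

```xml
<?xml version="1.0"?>
<!DOCTYPE settings [
<!ELEMENT settings (window)>
<!ELEMENT window (x,y)>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
]>
<settings>
<window>
<!-- Valid against the DTD above, although neither value is a number -->
<x>left</x>
<y>top</y>
</window>
</settings>
```

The structure is exactly what the DTD asks for, so the parser has nothing to complain about, and the application is left to discover that "left" is not a coordinate.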
Mike works as a Senior Developer for