
Get XSL To Do Your Dirty Work


Our Sample Document

Here's a simple example of an article written as an XML document. By its very nature, XML lets you define your own tags, so your site's document format can be as simple or as complex as you like. For this example, however, I'm using tags from the DocBook XML document format, on the premise that it's always better to use a standard when one is available. See the Further Reading section at the end of this article for more information on the DocBook format.

<?xml version="1.0" encoding="UTF-8"?>  
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.0//EN"  
         "http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd">  
<article>  
 <title>A Sample Article</title>  
 <section>  
   <title>Article Section 1</title>  
   <para>  
   This is the first section of the article. Nothing terribly  
   interesting here, though.  
   </para>  
 </section>  
 <section>  
   <title>Another Section</title>  
   <para>  
   Just so you can see how these things work, here's an  
   itemized list:  
   </para>  
   <itemizedlist>  
     <listitem>  
       <para>The first item in the list</para>  
     </listitem>  
     <listitem>  
       <para>The second item in the list</para>  
     </listitem>  
     <listitem>  
       <para>The third item in the list</para>  
     </listitem>  
   </itemizedlist>  
 </section>  
</article>

This should look nice and simple, except for the first few lines. Let's take a closer look at those:

<?xml version="1.0" encoding="UTF-8"?>

This line is actually optional. It identifies the rest of the file as an XML document, indicates the version of the XML standard that it obeys (1.0), and sets the document encoding (UTF-8). If this document is to be used in a system where all documents are XML, you can leave off this line without a problem, as long as the file is saved as UTF-8 or UTF-16: those are the encodings an XML processor assumes when none is declared.
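If you'd like to see this for yourself, here's a minimal sketch using Python's standard-library `xml.etree.ElementTree` as a stand-in XML processor (it is not the tool this article uses). It parses the same UTF-8 bytes with and without the declaration; the `<para>` content and the helper name `para_text` are purely illustrative.

```python
import xml.etree.ElementTree as ET

# The same tiny UTF-8 document, with and without the optional XML declaration.
# b"\xc3\xa9" is the UTF-8 encoding of the character "é".
WITH_DECL = b'<?xml version="1.0" encoding="UTF-8"?><para>Caf\xc3\xa9</para>'
WITHOUT_DECL = b'<para>Caf\xc3\xa9</para>'

def para_text(data: bytes) -> str:
    """Parse raw bytes and return the text of the root element."""
    return ET.fromstring(data).text

# With no declaration, the parser falls back to UTF-8, so both decode identically.
print(para_text(WITH_DECL))     # Café
print(para_text(WITHOUT_DECL))  # Café
```

If your files are saved in some other encoding (ISO-8859-1, say), the declaration is no longer optional: without it, the parser would misread any non-ASCII characters.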

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.0//EN"  
         "http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd">

This rather gruesome-looking thing is the document type (DOCTYPE) declaration. It tells any program that reads this document (for our purposes, the XSL processor) where to find the Document Type Definition (DTD) that describes which tags are allowed, and in what structure, for this document. In this example, we indicate that the document should obey DocBook V4.0, as named in the public identifier, whose DTD is found at the URL shown (check it out with your Web browser if you're curious; to learn more about DTDs, pick up any good book on XML). If the rest of the document doesn't obey the rules defined at that URL, a validating XSL processor will point out the error when it tries to process the document.

The DOCTYPE declaration is actually optional as well, but if you don't include it the XSL processor will not check that your document is correctly structured with valid tags. Without a DTD, it will only check that the tags you use are all matched with closing tags and properly nested (e.g. <b><i>this is good</i></b>, but <b><i>this is not</b></i>).
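That matching-and-nesting check is known as well-formedness, and it's easy to try out. The sketch below again uses Python's `xml.etree.ElementTree`, which is itself a non-validating parser, so it performs exactly this check and nothing more. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if every tag is closed and properly nested (no DTD is consulted)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<b><i>this is good</i></b>"))  # True
print(is_well_formed("<b><i>this is not</b></i>"))   # False
```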

If you decide to use a DTD to validate the tags used on your site, you'll probably want to add the DOCTYPE declaration to documents automatically before they are processed, rather than forcing the user to type it at the top of each article. You could even add the <article> and </article> tags automatically as well, to minimize the number of tags that the user has to type.
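Here's one way that might look, sketched in Python under a few assumptions: authors type only the content that goes inside `<article>`, and a hypothetical `wrap_article` helper adds the XML declaration, the DOCTYPE (copied from the sample document above) and the `<article>` tags before the document is handed to the XSL processor. How your site actually stores and serves articles is up to you.

```python
import xml.etree.ElementTree as ET

# DOCTYPE copied from the sample document; substitute whatever DTD your site uses.
DOCTYPE = ('<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.0//EN"\n'
           '         "http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd">')

def wrap_article(body: str) -> str:
    """Hypothetical helper: surround the author's markup with the
    XML declaration, the DOCTYPE and the <article> root element."""
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            f"{DOCTYPE}\n<article>\n{body}\n</article>\n")

# What an author would actually type: no declaration, no DOCTYPE, no <article>.
author_input = """<title>A Sample Article</title>
<section>
  <title>Article Section 1</title>
  <para>This is the first section of the article.</para>
</section>"""

full_doc = wrap_article(author_input)

# Encode to bytes so the parser honours the encoding declaration.
root = ET.fromstring(full_doc.encode("utf-8"))
print(root.tag)                 # article
print(root.find("title").text)  # A Sample Article
```

Keeping the boilerplate in one place also means that switching to a newer DTD later is a one-line change rather than an edit to every article.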

Save the above file as docbook.xml and then open it in MSIE 6 or above. You should see something like Fig. 3.

Fig. 3: An XML Document in MSIE 6

By default, MSIE displays XML documents in this attractive, collapsible code view (you can expand and collapse portions of the document by clicking the red + and - icons next to the opening tags). Believe it or not, this view is actually generated in Dynamic HTML by a built-in XSL stylesheet that is applied whenever an XML document doesn't come with a stylesheet of its own.

Something else you might notice about Internet Explorer's XML processing engine is that it ignores the rules for acceptable tags and document structure set out in the DTD. If you changed the <article> and </article> tags to <invalidtag> and </invalidtag> (or any other tag not present in the DocBook standard), Internet Explorer would not complain. That's because the XML processor in MSIE is said to be non-validating. More advanced, validating XML processors check the document against its DTD and would report the invalid tag as an error.
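Internet Explorer isn't alone in this. As an illustration (not a reproduction of MSIE's engine), Python's standard-library `xml.etree.ElementTree`, built on the non-validating expat parser, accepts the `<invalidtag>` version of our document without complaint, even though the DOCTYPE points at the DocBook DTD.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE names the DocBook DTD, but <invalidtag> is not a DocBook element.
doc = b'''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.0//EN"
         "http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd">
<invalidtag>
  <title>A Sample Article</title>
</invalidtag>'''

# A non-validating parser never reads the DTD, so the document is accepted
# as long as it is well-formed.
root = ET.fromstring(doc)
print(root.tag)  # invalidtag
```

To get the error you'd expect, you need a validating parser; for example, the third-party lxml library can validate against a DTD when its parser is created with `dtd_validation=True`.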

Internet Explorer does check that the XML document is well-formed, however. Try removing one of the </listitem> tags in the <itemizedlist>; you should get an error message as shown in Fig. 4.

Fig. 4: Internet Explorer spots an unclosed tag
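Any well-formedness checker should catch the same mistake. This sketch takes the itemized list from our sample, deletes the first </listitem>, and asks Python's `xml.etree.ElementTree` where parsing failed (`ParseError.position` holds a line/column pair). The helper name `first_error` is hypothetical, and the wording of the error will differ from Internet Explorer's.

```python
import xml.etree.ElementTree as ET

# The itemized list from the sample document, with the first </listitem> deleted.
BROKEN_LIST = """<itemizedlist>
  <listitem>
    <para>The first item in the list</para>
  <listitem>
    <para>The second item in the list</para>
  </listitem>
</itemizedlist>"""

def first_error(markup: str):
    """Return (line, column) of the first well-formedness error, or None if it parses."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None

print(first_error(BROKEN_LIST))
```

Note that the error is reported at the closing </itemizedlist> tag rather than where the tag went missing: the remaining </listitem> legally closes the second item, so the parser only notices the problem when it reaches a closing tag that doesn't match the still-open first <listitem>.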

With our XML document ready to go, the next step is to create an XSL stylesheet to format it for display.
