A Really, Really, Really Good Introduction to XML
Chapter 2. XML in Practice
The last chapter introduced some basic concepts in XML and saw us start our CMS project. In this chapter, we're going to dig a little deeper into XML as we talk about namespaces, XHTML, XSLT, and CSS. In the process, we'll take a couple of opportunities to make XML do something.
Meet the Family
In Chapter 1, Introduction to XML, we learned a few things about how XML is structured and what you can do with it. My goal for that chapter was to show you how flexible XML really is.
In this chapter, I'd like to zoom out a little and introduce you to some of the wacky siblings that make up the XML "Family of Technologies." Although I'm going to list a number of tools and technologies here, we'll cover only a few in this chapter. We'll explore some of the others in later chapters, but some will not be covered at all (sorry, but this would be a very long and boring book if we gave equal space to everything).
XSLT
XSLT stands for Extensible Stylesheet Language Transformations. It is both a style sheet specification and a kind of programming language that allows you to transform an XML document into the format of your choice: stripped ASCII text, HTML, RTF, and even other dialects of XML. In this chapter, you'll be introduced to XSLT concepts; later in the book, we'll explore these in more depth. XSLT uses XPath and several other technologies to do its work.
XPath
XPath is a language for locating and processing nodes in an XML document. Because each XML document is, by definition, a hierarchical structure, it becomes possible to navigate this structure in a logical, formal way (i.e. by following a path).
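To make the idea of "following a path" a little more concrete, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module, which understands a limited subset of XPath. The bookstore document and its element names are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
doc = """
<bookstore>
  <book><title>XML Basics</title><price>20</price></book>
  <book><title>XSLT Basics</title><price>30</price></book>
</bookstore>
"""

root = ET.fromstring(doc)

# "book/title" walks from the root element down to each book, then to its title.
titles = [t.text for t in root.findall("book/title")]
print(titles)

# A predicate in square brackets narrows the path, here to the second book.
second = root.find("book[2]/title").text
print(second)
```

Each step in the path names a child to descend into, which is the same idea XSLT relies on when it selects nodes.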
DTD and XML Schema
A document type definition (DTD) is a set of rules that governs the order in which your elements can be used, and the kind of information each can contain. XML Schema is a newer standard with capabilities that extend far beyond those of DTDs. While a DTD can provide only general control over element ordering and containment, schemas are a lot more specific. They can, for example, allow elements to appear only a certain number of times, or require that elements contain specific types of data such as dates and numbers.
Both technologies allow you to set rules for the contents of your XML documents. If you need to share your XML documents with another group, or you must rely on receiving valid XML from someone else (that is, XML that is not only well-formed but also follows the agreed rules), these technologies can help ensure that your particular set of rules is properly followed. We will explore both of these technologies with loving attention in Chapter 3, DTDs for Consistency.
XML Namespaces
The ability of XML to allow you to define your own elements provides flexibility and scope. But it also creates the strong possibility that, when combining XML content from different sources, you'll experience clashes in which the same element names serve very different purposes. For example, if you're running a bookstore, you might use <title> tags in your XML to track book titles. A mortgage broker would use <title> in a different way, perhaps to track the title on a deed. A dentist or doctor might use <title> to track patients' formal titles (Mr., Ms., Mrs., or Dr.) on their medical records. Try to combine all three types of information into one system (or even one document), and you'll quickly see how problems can arise.
XML namespaces attempt to keep different semantic usages of the same XML elements separate and unambiguous. In our example, each person could define their own namespace and then prepend the name of their namespace to specific tags: <book:title> is different from <broker:title> and <medrec:title>. Namespaces, by the way, are one of the technologies that make XSLT and XML Schema (XSD) work.
XHTML
XHTML stands for Extensible Hypertext Markup Language. Technically speaking, it's a reformulation of HTML 4.01 as an application of XML, and is not part of the XML family of technologies. To save your brain from complete meltdown, it might be simplest to think of XHTML as a standard for HTML markup tags that follow all the well-formedness rules of XML we covered earlier.
What's the point of that, you might ask? Well, there are tons and tons and tons of Websites out there that already use HTML. No one in their right mind could reasonably expect them all to switch to XML overnight. But we can expect that some of these pages—and a large percentage of the new pages that are being coded as you read this—will make the transition thanks to XHTML.
As you can see, the XML family of technologies is a pretty big group—those XML family reunions are undoubtedly interesting! It's also important to note that these technologies are open standards-based, which means that any new XML technologies (or proposed changes to existing ones) must follow a public process set down by the W3C (the World Wide Web Consortium) in order to gain acceptance in the community.
Although this means that some ideas take quite a while to reach fruition, and tend to be built by committee, it also means that no single vendor is in total control of XML. And this, as Martha Stewart might say, is a good thing.
A Closer Look at XHTML
Imagine you're at a cocktail party and somebody asks, "Okay, what's XHTML really?" You need to tell them something (besides, "Hey, I'm trying to have a relaxing cocktail here!"). So, what do you say? Not sure? That's what I thought.
Because this is a book about XML and not XHTML, and because there are plenty of terrific books out there on XHTML, I don't want to get into too much detail about the technology here. However, I do feel that a basic knowledge of XHTML will serve you well, and will help to reinforce the concepts we've already introduced.
So, back to our cocktail party. Here are some answers that you might give in that situation:
- XHTML stands for Extensible HyperText Markup Language.
- XHTML is designed to replace HTML.
- XHTML uses the HTML 4.01 tag set, but is written using the XML syntax rules.
- XHTML is a stricter, cleaner version of HTML.
Why do we need XHTML? Well, put bluntly, the Web has reached a point at which just about anything will fly when it comes to HTML documents. Take a look at the following snippet:
<html><title>My example</title>
<h1>Hello</h1>
Believe it or not, that snippet will render without a problem in most Web browsers. And so will this:
<p><b><i>Hello</b>
So will this:
Hello
I don't want to start some kind of crusade about HTML structure, but hey, enough is enough! Web pages represent structured information, so please, let's at least maintain some semblance of structure! At its most basic, XHTML was designed to form a kind of bridge between the loosey-goosey world of HTML and the more rigid structure of XML.
Remember that list of statements about XHTML we saw a moment ago? Well, here's another way to think about XHTML:
XHTML consists of all HTML 4.01 elements combined with the syntax of XML.
Simple! But, exactly what does this mean? Well, if you recall what we said in Chapter 1, Introduction to XML about well-formed XML documents, you can make some very good guesses:
- XHTML documents must contain a root element that contains all other elements. (In most cases, the html element!)
- XHTML elements must be properly nested.
<p>This is a <b>sentence.</b></p>
- All XHTML elements must have closing tags (even empty ones).
<br />
<td></td>
Don't Slash Backwards Compatibility
Older browsers, such as Netscape 4, which do not recognize XML syntax, will become confused by self-closing tags like <br/>. By simply adding a space before the slash (<br />), you can ensure that these browsers will ignore the slash and interpret the tag correctly.
- All XHTML attribute values must be placed between quotes.
<input type="button" name="submit" value="click to finish" />
- All XHTML element and attribute names must be written in lowercase.
<tr valign="top">
- Each XHTML document must have a DOCTYPE declaration at the top.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
There are three XHTML DOCTYPES:
Strict
Use this with CSS to minimize presentational clutter. In fact, the Strict DOCTYPE prohibits most of HTML's presentational tags and attributes, such as <font> and <center>.
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Transitional
Use this to take advantage of HTML's presentational features and/or when you're supporting non-CSS browsers.
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Frameset
Use this when you want to use frames to partition the screen.
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
A Minimalist XHTML Example
Here's a very simple document that illustrates the rules above:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>A very simple XHTML document</title>
<meta http-equiv="content-type"
content="text/html; charset=iso-8859-1" />
</head>
<body>
<p>a simple paragraph that contains a properly formatted<br />
break and some <b><i>properly nested</i></b> formatting.</p>
<div><img src="myphoto.jpg" alt="notice that all my quotes are in
place for attribute values" /></div>
</body>
</html>
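If you'd like to check these rules mechanically rather than by eye, one option is to hand the markup to any XML parser and see whether it complains. The sketch below uses Python's standard xml.etree.ElementTree module; the is_well_formed helper is a name made up for this example, and the XHTML document is a shortened version of the one above. Note that this only tests well-formedness, not whether the document is valid against its DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# The sloppy snippets from earlier in the chapter.
sloppy = "<p><b><i>Hello</b>"   # tags overlap and are never closed
bare = "Hello"                   # no root element at all

# A shortened version of the minimalist XHTML document.
xhtml = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>A very simple XHTML document</title></head>
<body><p>a simple paragraph<br />with a break</p></body>
</html>"""

for name, markup in [("sloppy", sloppy), ("bare", bare), ("xhtml", xhtml)]:
    print(name, is_well_formed(markup))
```

A browser will happily render all three, but only the XHTML document gets past an XML parser.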
That's more than enough information about XHTML for the moment. Let's move on to discuss namespaces and XSLT.
XML Namespaces
XML Namespaces were invented to rectify a common problem: the collision of documents using identical element names for different data.
Let's revisit our namespace example from this chapter's introduction. Imagine you were running a bookstore and had an inventory file (called inventory.xml, naturally), in which you used a title element to store book titles. Let's also say that—unlikely though it sounds—your XML document becomes mixed in with a mortgage broker's master record file. In this file, the mortgage broker has used title to store information about a property's legal title.
A human being could probably figure out that one title has nothing to do with the other, but an application that tried to sort it out would go nuts. We need to have a way to distinguish between the two different semantic universes in which these identical terms exist.
Let's get even more ambiguous: imagine you had an inventory.xml file in your bookstore that used the title element to store book titles, and a separate sales.xml file that used the title element to store the same information, but in a completely different context. Your inventory file stores information about books on the shelf, but the sales file stores information about books that have been bought by customers.
In either situation, regardless of the chasm that lies between the contexts of these identical terms, we need a way to properly label each context.
Namespaces to the rescue! XML namespaces allow you to create a unique namespace based on a URI (Uniform Resource Identifier), give that namespace a prefix, and apply that prefix to XML document elements.
Declaring Namespaces
To use and declare a namespace, we must first tie the namespace to a URI. Notice that I didn't say URL—a specific location that you can reach (although a URI can be a URL). A URI is simply a unique identifier that distinguishes one thing (say, an XML document standard) from another. URIs can take the following forms:
URL
Uniform Resource Locator: a specific protocol, machine address, and file path (e.g. http://www.tripledogdaremedia.com/index.php).
URN
Uniform Resource Name: a persistent name that doesn't point to an actual location for the resource, but still identifies it uniquely. For example, all published books have an ISBN. The ISBN uniquely identifies the book, but nowhere in the ISBN is there any indication as to which shelf it sits on in any particular bookstore. However, armed with the ISBN, you could walk into the store, ask an employee to search for you, and they could take you right to the book (provided, of course, that it was in stock).
The following are examples of good URIs:
http://www.tripledogdaremedia.com/XML/Namespaces/1
urn:bookstore-inventory-namespace
We want to use our namespace throughout our XML documents, though, and the last thing we want to do is type out an entire URI every time we need to distinguish one context from another. So, we define a prefix to represent our namespace to ease the strain on our typing fingers:
inv="urn:bookstore-inventory-namespace"
But, wait—we're not done yet! We need a way to tell the XML parser that we're creating a namespace. The agreed way to do that is to prefix the namespace declaration with xmlns:, like this:
xmlns:inv="urn:bookstore-inventory-namespace"
At this point, we have something useful. If we needed to, we could add our prefix to appropriate elements to disambiguate (I love that term!) any potentially ambiguous usage, like this:
<inv:title>Build Your Own XML-Powered Web Site</inv:title>
<title>Title Deed to the house on 123 Main St., YourTown</title>
Namespaces make it very clear that <inv:title> is very different from <title>.
But, where do we put our namespace declaration?
Placing Namespace Declarations in your XML Documents
In most cases, placing your namespace declarations will be rather easy. They're commonly located in the root element of a document, like so:
<inventory xmlns:inv="urn:bookstore-inventory-namespace">
…
</inventory>
Please note, however, that namespaces have scope. Namespaces affect the element in which they are declared, as well as all the child elements of that element. In fact, as you'll see when we discuss XSLT later, we'll use the xsl prefix in the very element in which we define the XSL namespace:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml"
version="1.0">
Any namespace declaration that's placed in a document's root element becomes available to all elements in that document. However, if you want to limit your namespace scope to a certain part of a document, feel free to do so—remembering, of course, that this can get pretty tricky. My advice is to declare your namespaces in the document's root element, then use the prefixes when you need them.
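Here is a small sketch, using Python's standard xml.etree.ElementTree module, of how a namespace-aware parser treats prefixes and scope. The first document is the bookstore example with the declaration on the root element; the second moves the declaration onto a child element (the book element is invented for this illustration) and then tries to use the prefix outside that child.

```python
import xml.etree.ElementTree as ET

# Declaration on the root element: the inv prefix is usable everywhere below it.
doc = """<inventory xmlns:inv="urn:bookstore-inventory-namespace">
  <inv:title>Build Your Own XML-Powered Web Site</inv:title>
  <title>Title Deed to the house on 123 Main St., YourTown</title>
</inventory>"""

root = ET.fromstring(doc)
# ElementTree reports the namespace URI in braces wherever a prefix was used.
tags = [child.tag for child in root]
print(tags)

# Declaration on a child element: the prefix is out of scope for its sibling.
out_of_scope = """<inventory>
  <book xmlns:inv="urn:bookstore-inventory-namespace"><inv:title>A</inv:title></book>
  <inv:title>B</inv:title>
</inventory>"""

try:
    ET.fromstring(out_of_scope)
    scoped_ok = True
except ET.ParseError:
    # The parser rejects the second inv:title because its prefix is unbound.
    scoped_ok = False
print(scoped_ok)
```

The parser sees the two title elements in the first document as different names, and it refuses the second document outright, which is a good argument for declaring namespaces on the root element.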
Using Default Namespaces
It would become pretty tiresome to have to type a prefix for every single element in a document. Fortunately, you can declare a default namespace that doesn't contain a prefix. This namespace will apply to all elements that don't contain prefixes.
Let's take another look at a typical opening <xsl:stylesheet> tag for an XSLT file:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml"
version="1.0">
Notice the non-prefixed namespace: xmlns="http://www.w3.org/1999/xhtml". In an XSLT file, this namespace governs all elements that aren't specifically prefixed as XSLT elements, identifying them as XHTML tags. On the other side of the coin, all XSLT elements must be given the xsl: prefix.
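To see the default namespace at work, the following sketch parses a cut-down style sheet with Python's standard xml.etree.ElementTree module and prints the full names the parser assigns. The template body is illustrative only and isn't taken from any style sheet in this chapter.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# A cut-down style sheet; the template body is illustrative only.
sheet = """<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0">
  <xsl:template match="/">
    <html><body><p>Hello</p></body></html>
  </xsl:template>
</xsl:stylesheet>"""

root = ET.fromstring(sheet)
template = root[0]
html = template[0]

print(root.tag)      # prefixed: belongs to the XSLT namespace
print(template.tag)  # prefixed: belongs to the XSLT namespace
print(html.tag)      # unprefixed: picks up the default (XHTML) namespace
```

The unprefixed html element ends up in the XHTML namespace without a single prefix being typed.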
Using CSS to Display XML In a Browser
The most powerful tools available for displaying XML in a browser are XSLT and Cascading Style Sheets (CSS). Because XSLT can be quite a tricky undertaking for newbies, I've decided to let you practice with CSS first!
The first step in working with CSS is to create a basic XML file:
Example 2.1. letter.xml (excerpt)
<?xml version="1.0"?>
<letter>
<to>Mom</to>
<from>Tom</from>
<message>Happy Mother's Day</message>
</letter>
As XML documents go, this one could be made a lot simpler, but there's no point in making things too simple. This document contains a root element (letter) that contains three other elements (to, from, and message), each of which contains text.
Now, we need to add a style sheet declaration that will point to the CSS document we'll create. To associate a CSS style sheet with an XML file, use the <?xml-stylesheet?> processing instruction:
Example 2.2. letter-css.xml (excerpt)
<?xml-stylesheet type="text/css" href="letter.css"?>
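The excerpt shows only the new line, so here is a sketch of the whole document with the instruction in place, after the XML declaration and before the root element. It uses Python's standard xml.dom.minidom module simply to confirm that a parser sees the instruction as a separate node ahead of the letter element.

```python
from xml.dom import minidom

# letter.xml with the style sheet instruction added after the XML declaration
# and before the root element.
doc = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="letter.css"?>
<letter>
  <to>Mom</to>
  <from>Tom</from>
  <message>Happy Mother's Day</message>
</letter>"""

dom = minidom.parseString(doc)
pi = dom.firstChild  # the first node of the document is the processing instruction

print(pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE)
print(pi.target)                    # the instruction's name
print(pi.data)                      # everything after the name
print(dom.documentElement.tagName)  # the root element follows it
```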
Finally, we write our CSS file, making sure that we provide a style for each element in our XML file:
Example 2.3. letter.css
letter {
display: block;
margin: 10px;
padding: 5px;
width: 300px;
height: 100px;
border: 1px solid #000000;
overflow: auto;
background-color: #cccccc;
font: 12px Arial;
}
to, from {
display: block;
font-weight: bold;
}
message {
display: block;
font: 11px Arial;
}
When you display your XML document, you should see something similar to Figure 2.1, "Viewing the CSS results in Internet Explorer."
Figure 2.1. Viewing the CSS results in Internet Explorer.

As you can see, CSS did a marvelous job of rendering a nicely shaded box around the entire letter, setting fonts, and even displaying things like margins and padding. What it didn't allow us to do, however, was add text to the output. For instance, we could use a "To:" in front of whatever text was in the to element. If you want to have that kind of power, you'll need to use XSLT. Strictly speaking, the CSS standard does allow for this sort of thing with the content property, which can produce generated text before and after document elements. Many browsers do not support this property, however, and even those that do don't provide anywhere near the flexibility of XSLT.
Getting to Know XSLT
XSLT, as I mentioned earlier in the chapter, stands for Extensible Stylesheet Language Transformations. Think of it as a tool that you can use to transform your XML documents into other documents. Here are some of the possibilities:
- Transform XML into HTML or raw ASCII text.
- Transform XML into other dialects of XML.
- Pull out all the passages tagged as Spanish, or French, or German to create foreign-language versions of your XML document.
Not bad—and we've barely scratched the surface!
XSLT is a rules-based, or functional language. It's not like other programming languages (e.g. PHP or JSP) that are procedural or object-oriented. Instead, XSLT requires that you supply a series of rules (called "templates") that tell it what to do when it encounters the various elements of an XML document.
For instance, upon identifying an XML <para> tag in the input document, a rule could instruct XSLT to convert it into an HTML <p> tag.
Because XSLT can be a little bewildering even for veteran programmers, the best way to tackle it is to walk through a series of examples. That way, I can give you the practical information you'll need to get started, and you can learn the key concepts along the way. As with XHTML, countless books, articles, and Websites are devoted to XSLT; use these to continue your education.
Your First XSLT Exercise
Let's get started with XSLT. For our first exercise, we'll reuse the very simple Letter to Mother example we saw in the CSS section. We'll also create a very basic Extensible Stylesheet Language (XSL) file to transform that XML. Keeping both these elements simple will give us the opportunity to step through the major concepts involved.
First, let's create the XSL file. This file will contain all the instructions we'll need in order to transform the XML elements into raw text.
In what will become a recurring theme in the world of XML, XSL files are in fact XML files in their own right. They must therefore follow the rules that apply to all XML documents: an XSL file must contain a root element, all attribute values must be quoted, and so on.
All XSL documents begin with a stylesheet element. This element contains information that the XSLT processor needs to do its job:
Example 2.4. letter2text.xsl (excerpt)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
The version attribute is required. In most cases, you'd use 1.0, as this is the most widely supported version at the time of this writing.
The xmlns:xsl attribute is used to declare an XML namespace with the prefix xsl. For your stylesheet transformation to work at all, you must declare an XML namespace for the URI http://www.w3.org/1999/XSL/Transform in your opening <stylesheet> tag. In our example, we will use an xsl prefix on all the stylesheet-related tags in our XSL documents to associate them with this namespace. You'll find this is common practice when working with XSLT.
The next element will be the output element, which is used to define the type of output you want from the XSL file. For this first example, we'll use text as our method:
Example 2.5. letter2text.xsl (excerpt)
<xsl:output method="text"/>
Other possible values for the method attribute include html and xml, but we'll cover those a little later.
Now we come to the heart of XSLT—the template and apply-templates elements. Together, these two elements make the transformations happen.
Put simply, the XSLT processor (for our immediate purposes, the browser) starts reading the input document, looking for elements that match any of the template elements in our style sheet. When one is found, the contents of the corresponding template element tell the processor what to output before continuing its search. Where a template contains an apply-templates element, the XSLT processor will search for XML elements contained within the current element and apply templates associated with them.
There are some exceptions and additional complications that we'll see as we move forward, but for now, that's really all there is to it.
The first thing we want to do is match the letter element that contains the rest of our document. This is fairly straightforward:
Example 2.6. letter2text.xsl (excerpt)
<xsl:template match="/letter">
<xsl:apply-templates select="*"/>
</xsl:template>
This very simple batch of XSLT simply states: "when you encounter a letter element at the root of the document, apply any templates associated with the elements it contains." Let's break this down.
The <xsl:template> tag is used to create a template, with the match attribute indicating which element(s) it should match. The value of this attribute is an XPath expression (we'll learn more about XPath later). In this case, the /letter value indicates that the template should match the letter element at the root of the document. Were the value simply letter, the template would match letter elements throughout the document.
Now, this <xsl:template> tag contains only an <xsl:apply-templates> tag, which means that it doesn't actually output anything itself. Rather, the <xsl:apply-templates> tag sends the processor looking for other elements with matching templates.
By default, apply-templates will match not only elements, but text and even whitespace between the elements as well. XSLT processors have a set of default, or implicit templates, one of which simply outputs any text or whitespace it encounters. Since we want to ignore any text or whitespace that appears between the tags inside <letter>, we use the select attribute of apply-templates to tell the processor to look for child elements only in its search. We do this with another XPath expression: * means "all child elements of the current element."
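If you're wondering where that stray whitespace lives, this sketch parses an indented copy of our letter with Python's standard xml.etree.ElementTree module and lists the text that sits directly inside the letter element. The two-space indentation is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

doc = """<letter>
  <to>Mom</to>
  <from>Tom</from>
  <message>Happy Mother's Day</message>
</letter>"""

root = ET.fromstring(doc)

# Text that sits directly inside <letter>: the gap before the first child
# (root.text) and the gap after each child (child.tail).
gaps = [root.text] + [child.tail for child in root]
print([g.isspace() for g in gaps])

# Iterating over an element yields child elements only, much as select="*" does.
children = [child.tag for child in root]
print(children)
```

Every gap is whitespace-only text, and that text is what the implicit template would copy to the output if we didn't restrict the search to child elements.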
Now, we've got our processor looking for elements inside letter, so we'd better give it some templates to match them!
Example 2.7. letter2text.xsl (excerpt)
<xsl:template match="to">
TO: <xsl:apply-templates/>
</xsl:template>
<xsl:template match="from">
FROM: <xsl:apply-templates/>
</xsl:template>
<xsl:template match="message">
MESSAGE: <xsl:apply-templates/>
</xsl:template>
Each of these templates matches one of the elements we expect to find inside the letter element: to, from, and message. In each case, we output a text label (e.g. TO:) and then use apply-templates to output the contents of the tag (remember, in the absence of a select attribute that says otherwise, apply-templates will output any text contained in the tags automatically).
The last thing we have to do in the XSL file is close off the stylesheet element that began the file:
</xsl:stylesheet>
Our style sheet now looks like this:
Example 2.8. letter2text.xsl (excerpt)
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/letter">
    <xsl:apply-templates select="*"/>
  </xsl:template>
  <xsl:template match="to">
    TO: <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="from">
    FROM: <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="message">
    MESSAGE: <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
While the logic of this style sheet is complete and correct, there's a slight formatting issue left to be tackled. Left this way, the output would look something like this:
    TO: Mom
    FROM: Tom
    MESSAGE: Happy Mother's Day
There's an extraneous line break at the top of the file, and each of the lines begins with some unwanted whitespace. The line break and whitespace is actually coming from the way we've formatted the code in the style sheet. Each of our three main templates begins with a line break and then some whitespace before the label, which is being carried through to the output.
But wait: what about the line break and whitespace that ends each template? Why isn't that getting carried through to the output? Well, by default, the XSLT standard mandates that whenever there is only whitespace (including line breaks) between two tags, the whitespace should be ignored. But when there is text between two tags (e.g. TO:), then the whitespace in and around that text should be passed along to the output.
Avoid Whitespace Insanity
The vast majority of XML books and tutorials out there completely ignore these whitespace treatment issues. And while it's true that whitespace doesn't matter a lot of the time when you're dealing exclusively with XML documents (as opposed to formatted text output), it's likely to sneak up on you and bite you in the butt eventually. Best to get a good grasp of it now, rather than waiting for insanity to set in when you least expect it.
The <xsl:text> tag is useful for controlling the effects of whitespace in our style sheets. All it does is output the text it contains, even if it is just whitespace. Here's the adjusted version of our style sheet, with <xsl:text> tags used to isolate text we want to output:
Example 2.9. letter2text.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/letter">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="to">
<xsl:text>TO: </xsl:text>
<xsl:apply-templates/>
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="from">
<xsl:text>FROM: </xsl:text>
<xsl:apply-templates/>
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="message">
<xsl:text>MESSAGE: </xsl:text>
<xsl:apply-templates/>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
Notice how each template now outputs its label (e.g. TO:) followed by a single space, then finishes off with a line break. All the other whitespace in the style sheet is ignored, since it isn't mixed with text. This gives us the fine control over formatting that we need when outputting a plain text file.
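If the rule-based style still feels unfamiliar, it may help to see the same idea mimicked in a general-purpose language. The sketch below is a loose analogy written in Python, not a description of how an XSLT processor is implemented: the TEMPLATES table and the apply_templates function are invented stand-ins for the three templates in letter2text.xsl.

```python
import xml.etree.ElementTree as ET

# Toy stand-ins for the three templates in letter2text.xsl:
# element name -> label to print before the element's text.
TEMPLATES = {"to": "TO: ", "from": "FROM: ", "message": "MESSAGE: "}

def apply_templates(element):
    """Apply the matching rule to each child element, in document order."""
    out = []
    for child in element:              # child elements only, like select="*"
        label = TEMPLATES.get(child.tag)
        if label is not None:
            # label, then the element's text, then a line break
            out.append(label + (child.text or "") + "\n")
        else:
            # no rule of our own: fall back to copying the text through
            out.append(child.text or "")
    return "".join(out)

letter = ET.fromstring(
    "<letter>\n  <to>Mom</to>\n  <from>Tom</from>\n"
    "  <message>Happy Mother's Day</message>\n</letter>"
)
result = apply_templates(letter)
print(result, end="")
```

Notice that the code never says "first do this, then do that" for the letter as a whole. It only says what to emit when a given element turns up, which is the essence of the template approach.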
Are we done yet? Not quite. We have to go back and add to our XML document a style sheet declaration that will point to our XSL file, just like we did for the CSS example. Simply open the XML document and insert the following line before the opening <letter> element:
Example 2.10. letter-text.xml (excerpt)
<?xml-stylesheet type="text/xsl" href="letter2text.xsl"
version="1.0"?>
Now, our XML document looks like this:
Example 2.11. letter-text.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="letter2text.xsl"
version="1.0"?>
<letter>
<to>Mom</to>
<from>Tom</from>
<message>Happy Mother's Day</message>
</letter>
When you view the XML document in Firefox, you should see something similar to the result pictured in Figure 2.2, "Viewing XSL results in Firefox." You can try viewing this in Internet Explorer as well, but you won't see the careful text formatting we applied in our style sheet. Internet Explorer interprets the result as HTML code, even when the style sheet clearly specifies that it will output text. As a result, whitespace is collapsed and our whole document appears on one line.
Figure 2.2. Viewing XSL results in Firefox.

If you're curious, go ahead and view the source of this document. You'll notice that you won't see the output of the transformation (technically referred to as the result tree), but you can see the XML document source.
What About my Favorite Browser?
If you don't use Firefox on a regular basis, you might be a little miffed that I've started out with an example that works only in Mozilla-based browsers.
First of all, if you prefer Internet Explorer, the situation will improve with the next example, which conforms to Internet Explorer's assumption that the result of a transformation must be HTML, not plain text as it was in this example.
As for the other browsers in popular use, including Safari and Opera, these do not yet support XSLT. For this reason, it is not yet practical to rely on browser support for XSLT in a real-world website. As we'll learn, it is far more sensible to use XSLT on the server side, where it is safe from browser incompatibilities.
For now, however, the solid XSLT capabilities built into Firefox (and to a lesser degree, Internet Explorer) provide a convenient means to learn what XSLT is capable of.
Transforming XML into HTML
That wasn't so bad, was it? You successfully transformed a simple XML document into flat ASCII text, and even added a few extra tidbits to the output.
Now, it's time to make things a little more complex. Let's transform the XML document into HTML. Here's the great part—you won't have to touch the original XML document (aside from pointing it at a new style sheet, that is). All you'll need to do is create a new XSL file:
Example 2.12. letter2html.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="/letter">
<html>
<head><title>Letter</title></head>
<body><xsl:apply-templates/></body>
</html>
</xsl:template>
<xsl:template match="to">
<b>TO: </b><xsl:apply-templates/><br/>
</xsl:template>
<xsl:template match="from">
<b>FROM: </b><xsl:apply-templates/><br/>
</xsl:template>
<xsl:template match="message">
<b>MESSAGE: </b><xsl:apply-templates/><br/>
</xsl:template>
</xsl:stylesheet>
Right away, you'll notice that the style sheet's output element now specifies an output method of html. Additionally, our first template now outputs the basic tags to produce the framework of an HTML document, and doesn't bother suppressing the whitespace in the source document with a select attribute.
Other than that, these instructions don't differ much from our text-only style sheet. In fact, the only other changes we've made have been to tag the label for each line to be bold, and end each line with an HTML line break (<br/>). We no longer need the <xsl:text> tags, since our HTML <b> and <br/> tags perform the same function. Note the space following each label, which is inside the <b> tag so that it won't be ignored by the processor.
All we have to do now is edit our XML file to make sure that the <?xml-stylesheet?> instruction references our new style sheet (letter-html.xml in the code archive), and we're ready to display the results in a Web browser.
You should see something similar to Figure 2.3, "Viewing XSL Results in Internet Explorer."
Figure 2.3. Viewing XSL Results in Internet Explorer.

Using XSLT to Transform XML into other XML
What happens if you need to transform your own XML document into an XML document that meets the needs of another organization or person? For instance, what if our letter document, which uses <to>, <from>, and <message> tags inside a <letter> tag, needed to have different names, say <recipient>, <sender>, and <body>?
Not to worry—XSLT will save the day! And, as with the two previous examples, we don't even need to worry about changing the source XML document. All we have to do is create a new XSL file, and we're set.
As before, we'll open with the standard stylesheet element, but, this time, we'll choose xml as our output method. We're also going to instruct XSLT to indent the resulting XML:
Example 2.13. letter2xml.xsl (excerpt)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
The <xsl:template> elements are structured as before, but this time they output the new XML elements:
Example 2.14. letter2xml.xsl (excerpt)
<xsl:template match="/letter">
<letter><xsl:apply-templates/></letter>
</xsl:template>
<xsl:template match="to">
<recipient><xsl:apply-templates/></recipient>
</xsl:template>
<xsl:template match="from">
<sender><xsl:apply-templates/></sender>
</xsl:template>
<xsl:template match="message">
<body><xsl:apply-templates/></body>
</xsl:template>
</xsl:stylesheet>
Now, all you have to do is edit your XML document to point to the style sheet, and you'll be able to view your new XML in any Web browser, right? Wrong! You see, Web browsers only supply collapsible tree formatting for XML documents without style sheets. XML documents that result from a style sheet transformation are displayed without any styling at all, or at best are treated as HTML—not at all the desired result.
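Since the browser won't show us the transformed XML, it can help to picture the result another way. The following Python sketch is not an XSLT processor; it simply mimics the four renaming templates with the standard library's xml.etree.ElementTree module, so we can see roughly what output to expect. The letter's contents, the RENAMES table, and the rename_elements function are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical letter; only the element names come from the chapter.
SOURCE = """<letter>
  <to>Mom</to>
  <from>Tom</from>
  <message>Happy Mother's Day</message>
</letter>"""

# Old element name -> new element name, mirroring the four templates.
RENAMES = {
    "letter": "letter",
    "to": "recipient",
    "from": "sender",
    "message": "body",
}


def rename_elements(xml_text):
    """Return the document with every element renamed according to RENAMES."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Elements without an entry in RENAMES keep their original name.
        element.tag = RENAMES.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


result = rename_elements(SOURCE)
print(result)
```

The text inside each element is carried over untouched, which is the job <xsl:apply-templates/> does in the real style sheet.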
The browser is useful for viewing XML output when that XML is an XHTML document, which browsers can obviously display. You'll need to add several things to your style sheet to signal to the browser that the document is more than a plain XML file, though. The first is the XHTML namespace:
Example 2.15. letter2xhtml.xsl (excerpt)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml">
Here we have declared a default namespace for tags without prefixes in the style sheet. Thus tags like <html> and <b> will be correctly identified as XHTML tags.
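If you'd like to see the effect of a default namespace for yourself, here is a small sketch using Python's standard xml.etree.ElementTree module (our choice for illustration, not something the style sheet requires). The fragment is invented; the point is that every unprefixed element ends up in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative fragment: no prefixes, one default namespace declaration.
FRAGMENT = f'<html xmlns="{XHTML_NS}"><body><b>TO: </b>Mom<br/></body></html>'

root = ET.fromstring(FRAGMENT)

# ElementTree reports element names as "{namespace-uri}local-name".
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

Each name printed carries the namespace URI in braces, even though the declaration appears only once, on the outermost element.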
Next up, we can flesh out the output element to more fully describe the output document type:
Example 2.16. letter2xhtml.xsl (excerpt)
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"
media-type="application/xhtml+xml" encoding="iso-8859-1"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system=
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
In addition to the method and indent attributes, we have specified a number of new attributes here:
omit-xml-declaration
This tells the processor not to add a <?xml?> declaration to the top of the output document. Internet Explorer 6 for Windows displays XHTML documents in Quirks Mode when this declaration is present, so by omitting it we can ensure that this browser will display the document in the more desirable Standards Compliance mode.
media-type
Though not required by current browsers, setting this attribute to application/xhtml+xml offers another way for the browser to identify the output as an XHTML document, rather than plain XML.
encoding
Sets the character encoding of the output document. Characters that the chosen encoding can't represent are written as numeric character references (&#nnn;).
doctype-public, doctype-system
Together, these two attributes provide the values needed to generate the DOCTYPE declaration for the output document. In this example, we've specified values for an XHTML 1.0 Transitional document, but you could also specify an XHTML 1.0 Strict document if that's what you need:
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"
media-type="application/xhtml+xml" encoding="iso-8859-1"
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system=
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
The rest of the style sheet is as it was for the HTML output example we saw above. Here's the complete style sheet so you don't have to go searching:
Example 2.17. letter2xhtml.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"
media-type="application/xhtml+xml" encoding="iso-8859-1"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system=
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
<xsl:template match="/letter">
<html>
<head><title>Letter</title></head>
<body><xsl:apply-templates/></body>
</html>
</xsl:template>
<xsl:template match="to">
<b>TO: </b><xsl:apply-templates/><br/>
</xsl:template>
<xsl:template match="from">
<b>FROM: </b><xsl:apply-templates/><br/>
</xsl:template>
<xsl:template match="message">
<b>MESSAGE: </b><xsl:apply-templates/><br/>
</xsl:template>
</xsl:stylesheet>
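The encoding attribute on the output element deserves a quick illustration. The sketch below uses Python's xml.etree.ElementTree serializer rather than an XSLT processor, on the assumption that it treats unrepresentable characters in a comparable way; the sample text is invented. It serializes the same element in two encodings to show which characters end up as character references.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text: e-acute (U+00E9) exists in ISO-8859-1,
# but the euro sign (U+20AC) does not.
element = ET.Element("message")
element.text = "Caf\u00e9: 3 \u20ac"

# Same element, two target encodings.
latin1_bytes = ET.tostring(element, encoding="iso-8859-1")
ascii_bytes = ET.tostring(element, encoding="us-ascii")

print(latin1_bytes)
print(ascii_bytes)
```

In the ISO-8859-1 output, the accented letter is stored as a single byte and only the euro sign is escaped; in the ASCII output, both are escaped.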
Point the <?xml-stylesheet?> processing instruction in your XML document at this style sheet and then load it in Firefox or Internet Explorer. You should see the output displayed as an XHTML document.
So yes, if the XML you are generating happens to be XHTML, a browser can display it just fine. Otherwise, what we need to display XML output is some kind of standalone XSLT processor that we can run instead of a Web browser… but, guess what? We've run out of space to talk about XSLT in this chapter. We'll pick up this discussion in Chapter 4, Displaying XML in a Browser.
Our CMS Project
In Chapter 1, Introduction to XML, we did quite a bit of work to analyze the article content type. Now, we need to identify exactly what we need for our news items, binary files, and Web copy. We must also manage and track site administrators using XML. By the time we get to the end of this chapter, we'll be roughly two-thirds of the way through the requirements-gathering phase. Don't worry, though: time spent in this part of the process will pay off in a big way when we start development.
News
Compared to our article content type, news will be fairly straightforward. We will need to track these pieces of information:
- Unique identifier
- Headline
- Author
- Short description
- Publication date
- Status
- Keywords
- URL for more information
Everything else should look just like the article content type, except that we won't allow HTML tags inside our description. Here's what a typical news item would look like:
<news id="123">
<headline>New XML application being built</headline>
<author>Tom Myer</author>
<description>A new XML application is now finally being released
by …</description>
<pubdate>2004-01-20</pubdate>
<status>live</status>
<keywords>XML</keywords>
<url>http://www.yahoo.com/</url>
</news>
From a programmatic standpoint, we will only display news pieces with a "live" status.
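As a rough idea of what that filtering might look like, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The <newsitems> wrapper element, the second news item, its "in progress" status value, and the live_headlines function are all assumptions made for the example; we haven't decided how the CMS will store or load its news yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element and sample data; the <news> fields follow the text.
NEWS_XML = """<newsitems>
  <news id="123">
    <headline>New XML application being built</headline>
    <status>live</status>
  </news>
  <news id="124">
    <headline>Draft announcement</headline>
    <status>in progress</status>
  </news>
</newsitems>"""


def live_headlines(xml_text):
    """Return the headlines of news items whose status is exactly 'live'."""
    root = ET.fromstring(xml_text)
    return [
        news.findtext("headline")
        for news in root.iter("news")
        if news.findtext("status") == "live"
    ]


print(live_headlines(NEWS_XML))
```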
Web Copy
Many of our site's Web pages, including the homepage, will display copy of some form, be it the contact details for our company, or a description of the services we can provide. If we built a CMS that didn't allow us to manage this copy, we wouldn't have a proper CMS, would we?
The easiest way to keep track of copy is to treat each piece a little like an article. In fact, Web copy has many of the same characteristics as your standard articles, except that we generally don't need to track authors. An XML document that tracks a piece of Web copy will look like this:
<webcopy id="123">
<navigationlabel>XML CMS</navigationlabel>
<headline>XML-powered CMS Solutions</headline>
<description>Learn about our XML-powered CMS products.
</description>
<pubdate>2004-01-20</pubdate>
<status>live</status>
<keywords>XML CMS</keywords>
<body><![CDATA[
<h1>Creating an XML-powered CMS</h1>
<p>Are you tired of waiting around for your "IT Guy" or
expensive designer to update your web site? Well, those
days will be long forgotten if you buy our XML-powered CMS!
With this revolutionary new tool, you can make quick and
easy updates to your own web site! Forget all the hassles!
It slices, it dices!</p>
]]></body>
</webcopy>
The <keywords> and <status> elements will work in much the same way as they do for articles and news pieces.
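The body of a Web copy document sits inside a CDATA section (delimited by <![CDATA[ and ]]>), so the parser treats the HTML in it as plain text rather than as child elements. Here's a short sketch, again using Python's standard xml.etree.ElementTree module as an assumed tool, with a trimmed-down copy of the document above:

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the Web copy document, for illustration only.
WEBCOPY_XML = """<webcopy id="123">
  <headline>XML-powered CMS Solutions</headline>
  <body><![CDATA[
<h1>Creating an XML-powered CMS</h1>
<p>It slices, it dices!</p>
]]></body>
</webcopy>"""

webcopy = ET.fromstring(WEBCOPY_XML)
body = webcopy.find("body")

# The CDATA content arrives as text, so <body> has no child elements.
child_count = len(body)
body_html = body.text.strip()

print(child_count)
print(body_html)
```

Because the markup survives as a string, the CMS can hand it straight to a page template without the XML parser ever needing to understand the HTML inside.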
Administrators
Our final content type isn't really a content type; it's more of a supporting type. We will need to keep track of each administrator on the site, as these are the folks who can log in and make changes to Web copy, articles, news pieces, and binary files.
We will need to record each administrator's name, username, password (encrypted, of course), and email address. For the moment, we won't worry about exactly how the password is encrypted—we'll talk about that later.
Example 2.18. admin.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<admins>
<admin id="1">
<name>Joe</name>
<username>joe</username>
<password>$1$064.HQ..$x912OhlIlHFylTPJmJR/k/</password>
<email>joe@myerman.com</email>
</admin>
<admin id="2">
<name>Bill</name>
<username>bill</username>
<password>$1$Ep5.7h4.$R6iGqy.Wj2Dz8SAE9WG3l0</password>
<email>bill@myerman.com</email>
</admin>
<admin id="3">
<name>Tom</name>
<username>tom</username>
<password>$1$Cl/.j3..$QcjxGtxqYx0VNp3QanGnP0</password>
<email>tom@myerman.com</email>
</admin>
</admins>
As with each article, news item, binary file, and piece of Web copy, each administrator will need a unique ID; otherwise, the system may not know who's trying to log in.
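One way to guard against duplicate IDs is a quick check whenever the file is loaded. The following Python sketch uses the standard library's xml.etree.ElementTree and collections.Counter; the duplicate_admin_ids function and the sample documents are hypothetical, and a real CMS might enforce uniqueness differently (with a DTD, for example).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data in the shape of admin.xml (fields trimmed for brevity).
GOOD_ADMINS = """<admins>
  <admin id="1"><username>joe</username></admin>
  <admin id="2"><username>bill</username></admin>
  <admin id="3"><username>tom</username></admin>
</admins>"""

BAD_ADMINS = """<admins>
  <admin id="1"><username>joe</username></admin>
  <admin id="2"><username>bill</username></admin>
  <admin id="2"><username>tom</username></admin>
</admins>"""


def duplicate_admin_ids(xml_text):
    """Return the id values that appear on more than one <admin> element."""
    root = ET.fromstring(xml_text)
    counts = Counter(admin.get("id") for admin in root.iter("admin"))
    return [admin_id for admin_id, count in counts.items() if count > 1]


print(duplicate_admin_ids(GOOD_ADMINS))
print(duplicate_admin_ids(BAD_ADMINS))
```

An empty list means every administrator has a distinct ID; anything else names the IDs that need fixing.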
Summary
We covered a lot in this chapter—I'm glad you're still with me! In Chapter 3, DTDs for Consistency, we're going to dig around inside DTDs and XML Schemas. And, in the CMS section, we'll take a look at an alternative approach to handling status, keyword, and author listings—I think you'll really like the way we change things around. After that, you should have enough of a working knowledge of XML (and its wacky family) to really start development.