
About the Author

Harry Fuecks

Harry has been working in corporate IT since 1994, with everything from start-ups to Fortune 100 companies. Outside of office hours he runs phpPatterns: a site dedicated to software design with PHP that aims to raise standards of PHP development. He also maintains Dynamically Typed: SitePoint's PHP blog.


Instant XML with PHP and PEAR::XML_Serializer

By Harry Fuecks

May 17th, 2004


These days, XML has become part of the landscape in almost all areas of software development -- nowhere more so than on the Web. Those using common XML applications, such as RSS and XML-RPC, will probably find public domain libraries geared specifically to help them work with the formats, eliminating the need for wheel re-invention.

But for "ad-hoc" XML documents, you may be on your own, and you may well wind up spending valuable time building code to parse them. You may also find yourself needing to expose data as XML, in order to make it available to some other system or application. While XML is, in the end, just text, generating a document that obeys XML's rules for well-formedness can be trickier than it seems. Enter: PEAR::XML_Serializer, the "Swiss Army Knife" for XML.

If you stay in touch with SitePoint, you've already had a taste of PEAR::XML_Serializer while reading Getting Started with PEAR. In this article, I'll be looking at XML_Serializer in depth and showing you how it can make working with XML a snap. If you're in any doubt about XML in general, try an Introduction to XML.

Today's tag hierarchy:

  • Introduction: what PEAR::XML_Serializer does and how to install it
  • The XML_Serializer API: overview of the serialization class with simple examples
  • The XML_Unserializer API: overview of the unserialization class with more examples
  • Managing Configuration Information: PEAR::XML_Serializer applied to manage an XML configuration file
  • Web Services with PEAR::XML_Serializer: System to system data exchange

Note that the version of PEAR::XML_Serializer used for this article was 0.91. To make life easy, I've saved all the code from these examples into an archive you can download here.

Introduction

PEAR::XML_Serializer is a result of the hard work of Stephan Schmidt, one of Germany's most prolific PHP developers. There's a reasonable chance you've already run into a PHP project that Stephan has worked on, if you've ever looked at PHP Application Tools (PAT) (as in patTemplate, patUser and many more). In fact, if you look at the publications and presentations, you may find yourself wondering if Stephan has somehow managed to clone himself.

PEAR::XML_Serializer works on the principle that XML can be represented as native PHP types (variables). In other words, you can build some array in PHP, pass it to XML_Serializer, and it will give you back an XML document that represents the array. It's also capable of the reverse transformation -- give it an XML document and it can unserialize it for you, returning a PHP data structure representing the document.

The magic behind the scenes is PHP's reflection functions, such as is_array(), get_class() and get_object_vars(). According to Wikipedia, "reflection is the ability of a program to examine and possibly modify its high level structure at runtime".

What this means is that PEAR::XML_Serializer, given some arbitrary PHP data structure, does a pretty good job of turning it into a useful XML representation and vice versa. Of course, this is based on "guesswork" and you may find the resulting transformations aren't always quite what you expected. To give you more control, PEAR::XML_Serializer has a number of runtime options that affect how it makes transformations. I'll look at the class APIs and summarize the available options in a moment.
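As a rough illustration of how these introspection functions can drive a transformation, here is a minimal sketch of a recursive converter. It is not XML_Serializer's actual code: the function toy_serialize(), the Color class and the 'item' fallback tag are all made up for this example, and it ignores attributes, indentation and invalid tag names.

```php
<?php
// Toy illustration only: NOT how XML_Serializer is implemented internally.
// toy_serialize(), Color and the 'item' fallback tag are hypothetical names.
function toy_serialize($value, $tagName)
{
    // Objects are reduced to an array of their public properties
    if (is_object($value)) {
        $value = get_object_vars($value);
    }

    if (is_array($value)) {
        // Arrays become nested tags, one per element
        $content = '';
        foreach ($value as $key => $child) {
            // Integer keys are not legal tag names, so use a fallback name
            $childTag = is_int($key) ? 'item' : $key;
            $content .= toy_serialize($child, $childTag);
        }
    } else {
        // Scalars become escaped text
        $content = htmlspecialchars((string) $value);
    }

    return "<$tagName>$content</$tagName>";
}

class Color
{
    public $name = 'red';
    public $hex  = '#ff0000';
}

$color = new Color();

// The class name is discovered at runtime and used as the root tag
echo toy_serialize($color, strtolower(get_class($color))), "\n";
echo toy_serialize(array('red', 'green'), 'palette'), "\n";
```

The real library makes the same kind of decisions (is this an array, an object or a scalar?), but exposes options to control the outcome rather than hard-coding choices such as the fallback tag name.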

Some likely problems to which you might apply PEAR::XML_Serializer include managing application configuration with an XML document (config.xml), building REST-based Web services, storing data in XML for later recall by your applications, general system-to-system data exchange and pretty much any "quick and dirty" parsing you need to do at short notice.

Where you might want to avoid using PEAR::XML_Serializer is in parsing large XML documents (in the order of megabytes), or when you're dealing with complex, possibly arbitrary XML documents (such as XHTML). Like the DOM API, XML_Serializer parses the entire XML document and builds a PHP data structure from it, in memory. Large documents may result in you hitting PHP's memory_limit (see php.ini), and operations like looping through the data structure will be expensive. PHP's native SAX parser is generally a better choice in such cases, allowing you to work with small "chunks" and keep memory use under control.

Meanwhile, for documents such as XHTML, the PEAR::XML_Serializer API is too simplistic to give you the degree of fine-grained control you'll require. Once you're familiar with how to use it, try unserializing SitePoint's homepage, or generating XHTML by serializing a PHP data structure, and you'll quickly see what I mean. DOM is generally a better choice for manipulating XML.
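For the large-document case, the chunked SAX approach might look something like the sketch below, which counts elements without ever building the whole document in memory. The element name, the sample document and the 16-byte chunk size are assumptions chosen for illustration; the closures require PHP 5.3 or later.

```php
<?php
// Sketch: count <color> elements with PHP's SAX (expat) functions,
// feeding the parser one small chunk at a time.
$colorCount = 0;

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Called at every opening tag
    function ($parser, $name, $attributes) use (&$colorCount) {
        // Names arrive upper-cased because case folding is on by default
        if ($name === 'COLOR') {
            $colorCount++;
        }
    },
    // Called at every closing tag; nothing to do in this example
    function ($parser, $name) {
    }
);

// Stand-in for a large file, split into 16-byte chunks
$document = '<palette><color>red</color><color>green</color>'
          . '<color>blue</color></palette>';
$chunks = str_split($document, 16);
$last   = count($chunks) - 1;

foreach ($chunks as $i => $chunk) {
    // The third argument tells the parser when the final chunk has arrived
    if (!xml_parse($parser, $chunk, $i === $last)) {
        die(xml_error_string(xml_get_error_code($parser)));
    }
}

echo $colorCount, "\n";
```

With a real file you would replace the str_split() loop with fopen() and repeated fread() calls, passing feof() as the third argument to xml_parse().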

To use PEAR::XML_Serializer, you also need to have PEAR::XML_Parser (a wrapper on PHP's SAX extension) and PEAR::XML_Util (provides a number of handy methods for working with XML) installed. XML_Parser is frequently installed with PEAR itself but, assuming you have neither, type the following, from the command line, to get everything installed:

$ pear install XML_Parser
$ pear install XML_Util
$ pear install XML_Serializer

Of course this assumes you have PEAR installed -- see Getting Started with PEAR for instructions on installing PEAR.

PEAR::XML_Serializer provides two APIs with the classes XML_Serializer and XML_Unserializer. The first, XML_Serializer, is used to transform PHP data structures into XML, while XML_Unserializer performs the reverse operation, transforming XML into a PHP data structure. In both cases, only a few public class methods are exposed, so simple transformations take very little code. Further control over the behaviour of the classes requires setting "options", typically by passing an associative PHP array to the constructor of the class you're working with. I have to confess I'm less than enamoured with handling configuration this way, as I've blogged before here, and PEAR::XML_Serializer perhaps proves the point; finding the supported options requires trawling the source code (to make your life easy, a complete list is coming right up). Anyway, griping aside, PEAR::XML_Serializer remains an excellent tool for working with XML.

The XML_Serializer API

I'll begin with the XML_Serializer class, used to transform PHP data structures into XML, first summarizing the API, then illustrating with some basic examples. To describe the API, I'll be using the function signature notation common to the PHP manual:

return_type function_name(type param_name, [type optional_param_name])

The main public methods available from the XML_Serializer class are:

  • object XML_Serializer([array options]) The constructor accepts an optional array of options (see below).
  • mixed serialize(mixed data, [array options]) Pass this method a PHP data structure and it performs the serialization into XML. The returned value is either TRUE on success, or a PEAR Error object if problems were encountered. Further options can also be passed as a second argument (see below).
  • mixed getSerializedData() This method returns the serialized XML document as a string, or a PEAR Error object if there's no serialized XML available.
  • void setOption(string name, mixed value) This method sets an individual option.
  • void resetOptions() Use this method to reset all options to their default states.
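As a quick sketch of the last two methods, the snippet below sets options one at a time with setOption() rather than passing an array to the constructor, then clears them with resetOptions(). It assumes PEAR::XML_Serializer is installed and on your include_path; the data and option values are only examples.

```php
<?php
// Sketch assuming PEAR::XML_Serializer is installed and on the include_path
error_reporting(E_ALL ^ E_NOTICE);
require_once 'XML/Serializer.php';

$serializer = new XML_Serializer();

// Set options individually instead of via the constructor
$serializer->setOption('rootName', 'palette');
$serializer->setOption('defaultTagName', 'color');

$status = $serializer->serialize(array('red', 'green', 'blue'));
if (PEAR::isError($status)) {
    die($status->getMessage());
}

$xml = $serializer->getSerializedData();
echo $xml;

// Put every option back to its default before reusing the object
$serializer->resetOptions();
```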

The available options for XML_Serializer are:

  • addDecl (default = FALSE): whether to add the opening XML declaration, <?xml version="1.0"?>
  • encoding (default = ""): the XML character encoding that will be added to the opening XML declaration e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
  • addDoctype (default = FALSE): whether to add a DOCTYPE declaration to the document
  • doctype (default = null): specify the URIs to be used in the DOCTYPE declaration (see examples below)
  • indent (default = ""): a string used to indent the XML tags, to make it friendlier to the human eye
  • linebreak (default = "\n"): also used for formatting, this character being inserted after each opening and closing tag
  • indentAttributes (default = FALSE): a string used to indent the attributes of generated XML tags. If set to the special value, "_auto", it will line up all the attributes below the same column, inserting a linefeed character between each attribute.
  • defaultTagName (default = "XML_Serializer_Tag"): the tag name used to serialize the values in an indexed array
  • mode (default = "default"): if set to 'simplexml', the elements of indexed arrays will be placed in tags with the same name as their parent. More on this below.
  • rootName (default = ""): The name to assign to the root tag of the XML document. If not specified, the type of the root element in the PHP data structure will be used for the root name (e.g. "array").
  • rootAttributes (default = array()): an associative array of values to be transformed into the attributes of the root tag, the keys becoming the attribute names. Be careful when using this, as it's your responsibility to make sure the keys and values will make legal XML attributes.
  • scalarAsAttributes (default = FALSE): for associative arrays, if the values are scalar types (e.g. strings, integers), they will be assigned to their parent node as attributes, using the array key as the attribute name.
  • prependAttributes (default = ""): a string to be prepended to the names of any generated tag attributes.
  • typeHints (default = FALSE): determines whether the original variable type of the PHP value that a tag represents should be stored as an attribute in the serialized XML document. See below for an example.
  • typeAttribute (default = "_type"): if typeHints are being used, the types will be stored in the XML tag using an attribute with the name of this option. If you have a PHP variable like $myVariable = 'Hello World!', the default serialized XML representation would be <myVariable _type="string">Hello World!</myVariable> if typeHints are being used.
  • keyAttribute (default = "_originalKey"): attribute used to store the original key of indexed array elements. Used only when typeHints are on.
  • classAttribute (default = "_class"): when serializing objects (with typeHints on), this attribute will be used to store the name of the class the object was created from.

One further special option exists. 'overrideOptions' is used when passing options to the serialize() method. If assigned the value 'TRUE', the options passed to the constructor will be ignored in favour of the default option values and any further options passed to the serialize() method.
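A minimal sketch of 'overrideOptions' in use follows, again assuming PEAR::XML_Serializer is installed; the tag names are arbitrary. The first call uses the constructor's options, while the second discards them and applies the defaults plus whatever is passed to serialize().

```php
<?php
// Sketch assuming PEAR::XML_Serializer is installed and on the include_path
error_reporting(E_ALL ^ E_NOTICE);
require_once 'XML/Serializer.php';

$serializer = new XML_Serializer(array(
    'rootName'       => 'palette',
    'defaultTagName' => 'color',
));

$data = array('red', 'green');

// First call: the constructor options apply
$serializer->serialize($data);
$withConstructorOptions = $serializer->getSerializedData();

// Second call: constructor options are ignored, so 'defaultTagName'
// falls back to its default value and only 'rootName' below is applied
$serializer->serialize($data, array(
    'overrideOptions' => true,
    'rootName'        => 'colors',
));
$withOverride = $serializer->getSerializedData();

echo $withConstructorOptions, "\n", $withOverride, "\n";
```

Without 'overrideOptions', the options passed to serialize() are merged with the constructor's options instead of replacing them.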

A simple example of serializing a PHP data structure with XML_Serializer is as follows:

<?php
// Set error reporting to ignore notices
error_reporting(E_ALL ^ E_NOTICE);

// Include XML_Serializer
require_once 'XML/Serializer.php';

// Some data to transform
$palette = array('red', 'green', 'blue');

// An array of serializer options
$serializer_options = array (
   'addDecl' => TRUE,
   'encoding' => 'ISO-8859-1',
   'indent' => '  ',
   'rootName' => 'palette',
   'defaultTagName' => 'color',
);

// Instantiate the serializer with the options
$Serializer = new XML_Serializer($serializer_options);

// Serialize the data structure
$status = $Serializer->serialize($palette);

// Check whether serialization worked
if (PEAR::isError($status)) {
   die($status->getMessage());
}

// Display the XML document
header('Content-type: text/xml');
echo $Serializer->getSerializedData();
?>

Filename: palette1.php

You can see here how the options are typically used. I need to build an array, $serializer_options, and pass it to the constructor of XML_Serializer.

Note that changing the error reporting is a requirement if you usually work with full error reporting turned on. The current version of PEAR::XML_Serializer triggers PHP notices such as "array to string conversion". None of them is serious, but they will appear in your output unless notices are suppressed.

The resulting XML looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<palette>
 <color>red</color>
 <color>green</color>
 <color>blue</color>
</palette>

Because the data structure is an indexed array, I used the 'defaultTagName' option to give a name to the tags representing the elements of the array.

Now, let's use an associative array instead:

<?php
// Set error reporting to ignore notices
error_reporting(E_ALL ^ E_NOTICE);

// Include XML_Serializer
require_once 'XML/Serializer.php';

// Some data to transform
$palette = array(
   'red' => 45,
   'green' => 240,
   'blue' => 120
   );

// An array of serializer options
$serializer_options = array (
   'addDecl' => TRUE,
   'encoding' => 'ISO-8859-1',
   'indent' => '  ',
   'rootName' => 'palette',
);

// Instantiate the serializer with the options
$Serializer = new XML_Serializer($serializer_options);

// Serialize the data structure
$status = $Serializer->serialize($palette);

// Check whether serialization worked
if (PEAR::isError($status)) {
   die($status->getMessage());
}

// Display the XML document
header('Content-type: text/xml');
echo $Serializer->getSerializedData();
?>

Filename: palette2.php
