PHP and XML: Parsing RSS 1.0

XML is springing up all over the Internet as a means to create standard data formats for the exchange of information between systems, regardless of their platform or technology. As you may already know, XML allows you to define your own custom markup languages, similar to HTML and suited to whatever data you need to represent. A number of standard XML-based markup languages have been created to facilitate the exchange of common types of information. In this article, we'll learn how to use PHP to read an XML document and display the data it contains as a Web page. The example we'll use is a Resource Description Framework (RDF) Site Summary (RSS) 1.0 document, although the techniques presented here apply to any situation where you wish to parse XML data in a PHP script.

A Brief Tour Of RSS 1.0

RSS (which originally stood for Rich Site Summary, a format developed by Netscape, but now refers to RDF Site Summary, an updated and XML-compliant version of the Netscape technology) is an XML document format intended to describe, summarize, and distribute the contents of a Web site as a 'channel'. Sites such as MoreOver.com and O'Reilly's Meerkat process the RSS feeds provided by news and other content sites and combine them into headline newsfeed services. RSS is currently developed by the RSS-DEV Working Group.

As with most XML document formats, the meaning of an RSS document can be gleaned fairly easily by looking over a sample. SitePoint.com provides summaries of its front-page articles in RSS format at http://www.sitepoint.com/rss.php. If you are using Internet Explorer 5 or later, you can view the current version of this XML document directly in your browser. For everyone else, here is the SitePoint.com RSS file as it stood at the time of this writing:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/">

 <channel rdf:about="http://www.sitepoint.com/rss.php">
   <title>SitePoint.com</title>
   <description>Master the Web!</description>
   <link>http://www.sitepoint.com/</link>

   <items>
     <rdf:Seq>
       <rdf:li rdf:resource="http://www.PromotionBase.com/article/551"/>
       <rdf:li rdf:resource="http://www.WebmasterBase.com/article/541"/>
       <rdf:li rdf:resource="http://www.eCommerceBase.com/article/552"/>
       <rdf:li rdf:resource="http://www.eCommerceBase.com/article/505"/>
       <rdf:li rdf:resource="http://www.PromotionBase.com/article/556"/>
       <rdf:li rdf:resource="http://www.eCommerceBase.com/article/508"/>
     </rdf:Seq>
   </items>
 </channel>
   
 <item rdf:about="http://www.PromotionBase.com/article/551">
   <title>Escape Search Engine Caching</title>
   <description>Did you know that many search engines cache your pages?
   While this practice can speed up a search, users might not see your
   most recent site updates! Ralph shows how you can stop search engines
   caching your pages.</description>
   <link>http://www.PromotionBase.com/article/551</link>
 </item>

 <item rdf:about="http://www.WebmasterBase.com/article/541">
   <title>Add JavaScript to Fireworks</title>
   <description>Does your design need more pizazz? Add interactivity to
   your site without learning JavaScript! Matt explains the creation of
   JavaScript effects in Fireworks, and explores in detail the use of
   this program's tools.</description>
   <link>http://www.WebmasterBase.com/article/541</link>
 </item>

 <item rdf:about="http://www.eCommerceBase.com/article/552">
   <title>eMail Campaigns in 8 Steps - Part 2</title>
   <description>Ok, so you've reeled in your prospects and they're on
   your mailing list. Now what? How do you communicate effectively, and
   turn them into customers? Jason reveals all...</description>
   <link>http://www.eCommerceBase.com/article/552</link>
 </item>

 <item rdf:about="http://www.eCommerceBase.com/article/505">
   <title>The Need for a Written Website Contract</title>
   <description>A written agreement is essential if you pay others to
   design, build or maintain your Websites. Ivan explains the necessity
   of contracts to those who work on the Web.</description>
   <link>http://www.eCommerceBase.com/article/505</link>
 </item>

 <item rdf:about="http://www.PromotionBase.com/article/556">
   <title>Search Engine Strategies 2001 - Conference Report</title>
   <description>Sinewave Interactive's Gavin Appel talks to Matt about
   this year's Search Engine Strategies conference. He outlines the
   discussions and predictions of industry leaders.</description>
   <link>http://www.PromotionBase.com/article/556</link>
 </item>

 <item rdf:about="http://www.eCommerceBase.com/article/508">
   <title>Better eCommerce Questionnaire</title>
   <description>Overhaul your ecommerce strategy now! Face up to the
   tough questions with Lee, as he guides you through a simple process
   to optimize your ecommerce strategy.</description>
   <link>http://www.eCommerceBase.com/article/508</link>
 </item>
       
</rdf:RDF>

As you can see, the first element inside the <rdf:RDF> root is a <channel> tag that contains the title, description, and URL of the site that the RSS file describes, as well as an <items> list of the articles that the channel currently contains. This tag is followed by an <item> tag for each of the articles that appear on the front page of SitePoint.com, each providing a title, description, and URL. Note that this is a bare-bones RSS file -- many sites make use of standard extensions to the RSS format to include things like author names, images, and publication dates for the items in their channel, but for the purposes of this article this basic RSS file will do.
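To make that structure concrete, here is a minimal sketch that pulls the channel and item fields out of a cut-down copy of the sample file. It assumes PHP's DOM extension is available, which is only one of several ways to read XML in PHP and is not necessarily the approach the rest of this article takes. The names readFeed(), childText(), and RSS_NS are hypothetical helpers invented for this illustration.

```php
<?php
// Namespace URI that the sample file declares as its default namespace.
const RSS_NS = 'http://purl.org/rss/1.0/';

// Hypothetical helper: returns the whitespace-normalized text of the first
// descendant element with the given name in the RSS namespace, or ''.
function childText(DOMElement $parent, string $name): string
{
    $node = $parent->getElementsByTagNameNS(RSS_NS, $name)->item(0);
    return $node === null ? '' : trim(preg_replace('/\s+/', ' ', $node->textContent));
}

// Hypothetical helper: turns an RSS 1.0 string into plain PHP arrays.
function readFeed(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('The document is not well-formed XML.');
    }
    $channel = $doc->getElementsByTagNameNS(RSS_NS, 'channel')->item(0);
    if ($channel === null) {
        throw new InvalidArgumentException('No <channel> element found.');
    }
    $items = [];
    // 'item' does not match the <items> list element, only the <item> tags.
    foreach ($doc->getElementsByTagNameNS(RSS_NS, 'item') as $item) {
        $items[] = [
            'title'       => childText($item, 'title'),
            'link'        => childText($item, 'link'),
            'description' => childText($item, 'description'),
        ];
    }
    return [
        'channel' => [
            'title'       => childText($channel, 'title'),
            'link'        => childText($channel, 'link'),
            'description' => childText($channel, 'description'),
        ],
        'items' => $items,
    ];
}

// A shortened copy of the sample file above, limited to its first two items.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/">
 <channel rdf:about="http://www.sitepoint.com/rss.php">
   <title>SitePoint.com</title>
   <description>Master the Web!</description>
   <link>http://www.sitepoint.com/</link>
   <items>
     <rdf:Seq>
       <rdf:li rdf:resource="http://www.PromotionBase.com/article/551"/>
       <rdf:li rdf:resource="http://www.WebmasterBase.com/article/541"/>
     </rdf:Seq>
   </items>
 </channel>
 <item rdf:about="http://www.PromotionBase.com/article/551">
   <title>Escape Search Engine Caching</title>
   <description>Did you know that many search engines cache your pages?
   While this practice can speed up a search, users might not see your
   most recent site updates!</description>
   <link>http://www.PromotionBase.com/article/551</link>
 </item>
 <item rdf:about="http://www.WebmasterBase.com/article/541">
   <title>Add JavaScript to Fireworks</title>
   <description>Does your design need more pizazz?</description>
   <link>http://www.WebmasterBase.com/article/541</link>
 </item>
</rdf:RDF>
XML;

$feed = readFeed($xml);
echo $feed['channel']['title'], "\n";
foreach ($feed['items'] as $item) {
    echo $item['title'], ' -> ', $item['link'], "\n";
}
```

Because the elements live in the RSS 1.0 namespace, the sketch looks them up by namespace URI rather than by bare tag name. It also collapses the line breaks inside each <description>, since the sample wraps its descriptions across several lines.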

Most Web browsers can't read XML pages, and by default those that can will display only the code of the page (Internet Explorer 5+) or its textual portions (Netscape 6+). If you want to display this RSS document to users, then, you need some intermediate technology to convert it into something presentable. You might instead want to read the file and store the headlines in a database, or email subscribed users when particular keywords appear in the descriptions of new articles. In any case, you're going to need something that can read XML. Of the many options available in this arena, this article will examine the use of PHP to parse an XML document.
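As a rough illustration of two of those uses, the sketch below takes headline data that has already been read from the feed and either renders it as an HTML list or picks out the items whose descriptions mention a keyword. The $items array is hard-coded from the sample file purely for illustration, and renderHeadlines() and itemsMentioning() are hypothetical names, not functions provided by PHP or by this article.

```php
<?php
// Illustrative data only: two entries copied from the sample RSS file.
$items = [
    [
        'title'       => 'Escape Search Engine Caching',
        'link'        => 'http://www.PromotionBase.com/article/551',
        'description' => 'Did you know that many search engines cache your pages? '
            . 'While this practice can speed up a search, users might not see your '
            . 'most recent site updates! Ralph shows how you can stop search engines '
            . 'caching your pages.',
    ],
    [
        'title'       => 'Add JavaScript to Fireworks',
        'link'        => 'http://www.WebmasterBase.com/article/541',
        'description' => 'Does your design need more pizazz? Add interactivity to '
            . 'your site without learning JavaScript! Matt explains the creation of '
            . "JavaScript effects in Fireworks, and explores in detail the use of "
            . "this program's tools.",
    ],
];

// Hypothetical helper: builds an HTML list of linked headlines, escaping
// the feed text so it cannot inject markup into the page.
function renderHeadlines(array $items): string
{
    $html = "<ul>\n";
    foreach ($items as $item) {
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a></li>\n",
            htmlspecialchars($item['link'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8')
        );
    }
    return $html . "</ul>\n";
}

// Hypothetical helper: keeps the items whose description mentions the
// keyword, ignoring case.
function itemsMentioning(array $items, string $keyword): array
{
    return array_values(array_filter(
        $items,
        function (array $item) use ($keyword): bool {
            return stripos($item['description'], $keyword) !== false;
        }
    ));
}

echo renderHeadlines($items);
foreach (itemsMentioning($items, 'javascript') as $match) {
    echo 'Keyword match: ', $match['title'], "\n";
}
```

Escaping with htmlspecialchars() matters here because the titles and links come from someone else's site. Storing headlines in a database or sending email would start from the same array, but both depend on your own setup and are left out of this sketch.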
