Accessible Flash Parts 1 And 2
A step towards accessible Flash content: providing real-time Flash content for search engines, and bookmarking Flash Websites.
Part 1 - SEO for Flash, starts below.
Part 2 - Bookmarking Flash, starts here.
The State of Play
Today's search engines have grown up in partnership with the traditional HTML document; not surprisingly, they are fantastic at indexing every last piece of information found within HTML files.
Great! But what happens when your new Website has no HTML content at all? This occurs most often with Websites built in Macromedia Flash, where the content is locked up in a file that search engines completely ignore.
This question led me to think about what could be described as the holy grail of Flash Web development: achieving good search engine listings, based on current content, for Flash Websites that have no 'surrounding', relevant HTML content -- in other words, what's come to be known as the Flash site.
In this case study I'll show you how to accomplish this task in a way that's simple, scalable and transparent to your general Website visitor -- no annoying redirects, refreshes or 'hidden' text to consider. It has the added benefit of allowing sections of a Flash Website, including frames and scenes, to be bookmarked for reference at a later date.
I'll also illustrate alternate methods for providing Flash content to search engines.
This process is built using disparate concepts that, on the face of it, have nothing to do with Flash. Joined together, though, they provide the key to success.
Utilise this method, and you'll have a Flash Website with content that is directly accessible via the URL or external hyperlinks. The knowledge that search engines can index Flash Websites effectively will allow you to promote Flash as a viable technology in which to develop and promote your clients' Websites.
First Thoughts
The idea for developing a scalable and easy-to-maintain system to allow search engines to index Flash sites hit me when I was in the initial phase of building my own Website.
In 2000, I evaluated the options and decided to build the Website using Flash, knowing that the general consensus in the online world (some gurus in there as well) was that this approach meant I'd be tossing site accessibility out the window. The stated crimes of an all-Flash site included assumptions such as ignoring both the browser's back and forward buttons*, and an inability to make the site available for search engine indexing (under the general umbrella of direct linking).
Being the questioning person that I am, I set as one of the objectives for my Website that it competently address the issue of search engine indexing. My site had to be fully indexed by search engines, enabling prospective clients to find me amid my contemporaries in the search result listings.
*Robert Penner, employing some lateral thinking, proves this assumption wrong using a frameset (a solution which may not be to everyone's liking).
As I progressed, I realised that a number of concepts were involved in my achievement of this goal.
Concept 1: External Content is The Key
After reading many, many Flash resources, I saw that a key to accomplishing Flash site SEO was to separate the Website content from the Flash movie - to load it in from external sources. These sources could constitute anything from a database table or XML file, to a simple plain text file.
The benefit I saw in storing my text content outside the SWF file was that it could be used in a multitude of external devices. For instance, the text in a database table can be used in XHTML, Flash and, in looking to the future, an XML-ready device.
My choice for storing and retrieving content, which was made as a result of previous experience with the technology, was a MySQL database with PHP as the server side script. You could, however, conceivably use any database/script combination to accomplish this.
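To make the idea concrete, here is a minimal sketch of one content store feeding two outputs. It is written in Python rather than the PHP I actually used, and an in-memory SQLite database stands in for MySQL. The table name `content`, its columns, and the helper names are all hypothetical. The second output is a string of URL-encoded name=value pairs, which is the format Flash reads when it loads external variables.

```python
import sqlite3
from html import escape
from urllib.parse import quote, urlencode


def open_store():
    """Create a throwaway store (SQLite standing in for MySQL; schema is hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE content (section TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    db.execute(
        "INSERT INTO content VALUES (?, ?, ?)",
        ("about", "About us", "We build Flash & HTML sites."),
    )
    return db


def fetch(db, section):
    """Return one section as a dict, or None if the section does not exist."""
    row = db.execute(
        "SELECT title, body FROM content WHERE section = ?", (section,)
    ).fetchone()
    if row is None:
        return None
    return {"title": row[0], "body": row[1]}


def as_html(item):
    """Render the section as plain markup, escaping special characters."""
    return "<h1>%s</h1>\n<p>%s</p>" % (escape(item["title"]), escape(item["body"]))


def as_flash_vars(item):
    """Render the same section as URL-encoded name=value pairs for a Flash movie."""
    return urlencode(item, quote_via=quote)
```

Both renderers draw on the same row, so an edit to the database shows up in every output without touching the SWF file.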
Now, storing content is one thing; reliably retrieving and serving that content so that it can be indexed by the search engines and viewed in a Flash movie by humans is another. Time for some search engine research...
Concept 2: Understanding Indexing
This is by no means an exhaustive look at how search engines get their content -- that would take up more than a few dead trees! However, we still need to have a quick look at how Web content ends up in a search engine.
The well-known search engines get their content via bots that travel around the Web following links and sending the information they find back to their respective engines. Each bot is identified by a unique name, called a user agent string (string is code talk for a piece of text, and has nothing to do with a programmer's shoe laces). In fact, nearly every visitor to a Website, whether bot, Web browser or otherwise, provides its own user agent string.
Let's look at a few, to see what we're dealing with:
- Googlebot/2.1 (+http://www.googlebot.com/bot.html)
- Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
- FAST-WebCrawler/3.3 (crawler@fast.no; http://fast.no/support.php?c=faqs/crawler)
- Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130
The first three user agent strings above identify searchbots from Google, Inktomi and FAST. The last is a standard Mozilla 1.x identifier.
The very fact that reputable search engines give consistent user agent strings that differ from those of our human visitors allowed me to divide traffic to the Website into two main categories, and one minor category:
- Searchbots
- Humans using Web browsers
- Unknown (those visitors too difficult to identify by the supplied user agent)
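The three categories above can be sketched as a small classifier. This is an illustration in Python, not the script I ran on my own site; the token lists are assumptions drawn only from the sample user agent strings shown earlier, and a real list would need to be longer and kept up to date.

```python
from enum import Enum


class Visitor(Enum):
    SEARCHBOT = "searchbot"
    HUMAN = "human"
    UNKNOWN = "unknown"


# Hypothetical token lists, based only on the sample strings in this article.
BOT_TOKENS = ("googlebot", "slurp", "fast-webcrawler")
BROWSER_TOKENS = ("mozilla", "opera")


def classify(user_agent):
    """Sort a user agent string into searchbot, human or unknown."""
    ua = (user_agent or "").lower()
    # Check bots first: Inktomi's Slurp also announces itself as Mozilla/3.0,
    # so testing for browsers first would misfile it as a human visitor.
    if any(token in ua for token in BOT_TOKENS):
        return Visitor.SEARCHBOT
    if any(token in ua for token in BROWSER_TOKENS):
        return Visitor.HUMAN
    return Visitor.UNKNOWN
```

The order of the checks matters more than the exact tokens: bot names are tested before the generic browser names because some bots borrow a browser-style prefix.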
Searchbots, at their core, are stripped-down versions of browsers. They ignore client-side scripting such as JavaScript, and can ignore tricks like Meta tag refreshing and making text the same colour as the document background.
But what of server-side scripts? Searchbots, like their Web browser cousins, accept the page the server sends. Could I then get my Web server to send different pages based on the results of interrogation of the User Agent string? You bet!
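As a rough sketch of that idea, the following Python WSGI application inspects the User Agent header and picks which page to send. It is not the PHP script from my own site: the page bodies, the `site.swf` file name and the `is_searchbot` helper are hypothetical, and the bot check is deliberately minimal.

```python
# Hypothetical pages: the same content, once as plain markup and once as a Flash embed.
TEXT_PAGE = "<html><body><h1>About us</h1><p>We build Flash sites.</p></body></html>"
FLASH_PAGE = '<html><body><embed src="site.swf" width="550" height="400"></body></html>'

# Assumed bot names, taken from the sample user agent strings in this article.
BOT_TOKENS = ("googlebot", "slurp", "fast-webcrawler")


def is_searchbot(user_agent):
    """Return True when the user agent string contains a known bot name."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


def app(environ, start_response):
    """WSGI entry point: choose the page according to the visitor's user agent."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    page = TEXT_PAGE if is_searchbot(user_agent) else FLASH_PAGE
    data = page.encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(data))),
        ],
    )
    return [data]
```

To try it locally, the callable can be handed to the standard library's `wsgiref.simple_server.make_server("", 8000, app)`; the visitor never sees a redirect or refresh, because the decision is made before any page is sent.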
James owns Web firm