Article
Accessible Flash Parts 1 And 2
Concept 3: Targeting Content From The URL
The starting point for targeting content in an HTML Website is the URL, and the same concept applies to Flash Websites. When a page is requested via the URL, the filename is sent to the server, along with any variables, which are appended to the end of the filename.
We've all seen something like this:
google.com/search/?q=flash+and+search+engines
In my case it was:
/index.php?go=4
The part of the URL after the ? character is known as a query string* and will be available to the script index.php when it runs. In addition, any variables sent to a PHP script this way will be available in an array (think of a cabinet where every drawer holds a piece of information) called $_GET (http://www.php.net/manual/en/reserved.variables.php#reserved.variables.get).
In this case we'll have $_GET['go']. (In the Google case above, we'd have $_GET['q'])
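To see how a query string maps onto that array, here is a minimal sketch using PHP's built-in parse_url() and parse_str(), which do by hand roughly what PHP does automatically when it fills $_GET. The names $example_url, $query_string and $vars are invented for illustration.

```php
<?php
// Illustrative only: $example_url, $query_string and $vars are hypothetical names.
$example_url = 'http://google.com/search/?q=flash+and+search+engines';

// pull out the part of the URL after the ? character
$query_string = parse_url($example_url, PHP_URL_QUERY);

// split the query string into an array of name => value pairs,
// decoding each + back into a space along the way
parse_str($query_string, $vars);

// $vars['q'] now holds the decoded search phrase
echo $vars['q'];
```

Inside a real request you would never need to do this yourself; the sketch only shows where the values in $_GET come from.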
*An alternative way to accomplish this involves rewriting the URL. For example, instead of this index.php?go=4 we might have /go/4/. This is seen to be friendlier to search engines, but has a steeper learning curve when compared to a simple query string. And yes, search engines like Google do index pages with query strings, as long as they are kept short (I've found through experience that I can get listings with two variables in the query string). I'll leave the choice to you, but to keep it simple, I'll stick with the query string for now.
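For readers curious about the rewritten form, here is a rough sketch of how a script could pull the content id out of a path such as /go/4/. It assumes the Web server has already been configured to route such requests to the script (that configuration isn't shown), and go_from_path() is a hypothetical helper, not something used elsewhere in this article.

```php
<?php
// Hypothetical helper: extract the content id from a path like /go/4/
// returns the id as an integer, or null if the path doesn't match
function go_from_path($path)
{
    // /go/ followed by one or more digits and an optional trailing slash
    if (preg_match('#^/go/(\d+)/?$#', $path, $matches)) {
        return (int) $matches[1];
    }
    return null;
}

// example: the rewritten equivalent of index.php?go=4
$go_from_rewrite = go_from_path('/go/4/');
```

The returned id still has to be checked against the database, exactly as it would be when it arrives in a query string.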
My first task here was to validate the contents of $_GET['go']. This is necessary because the URL can be changed by the Website visitor, and it's a central part of good Web scripting to not trust anything that a script receives via the GET or POST methods.
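To make the risk concrete, here is a small sketch of how tampered values behave under two different checks. It uses PHP's filter_var() with FILTER_VALIDATE_INT as a stricter alternative to intval(); check_go() and the sample values are invented for illustration, and this is not necessarily the approach used in this article's script.

```php
<?php
// Hypothetical helper: compare a loose and a strict reading of the raw value
function check_go($raw)
{
    // intval() keeps any leading digits and silently drops the rest
    $loose = intval($raw);
    // FILTER_VALIDATE_INT rejects anything that isn't a whole number;
    // min_range => 1 also rejects zero and negative ids
    $strict = filter_var($raw, FILTER_VALIDATE_INT,
        array('options' => array('min_range' => 1)));
    return array('loose' => $loose, 'strict' => $strict);
}

// invented sample values a visitor might type after ?go=
$clean    = check_go('4');
$tampered = check_go('4; DROP TABLE sitecontent');
$garbage  = check_go('abc');
```

Both checks leave nothing but an integer to place in the query. The difference is that the strict check returns false for tampered input, so the script can route it to the error content rather than quietly treating it as a request for page 4.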
So, let's start by interrogating the variable that's fallen into our script. I've fully commented the PHP script that does this (lines with // or surrounded by /** **/), so if you're codaphobic, read these!
Let's also assume that the database connection was successful.
<?php
/** - getparse.inc.php
---------- what we want to achieve here -----------
- check if the 'go' variable exists
- strip out any content that may be hazardous to the script
- convert 'go' to an integer for the sql query
- nb : all content kept in the database is identified by a unique integer.
- nb : error content is identified in the database by the id '-1'
- nb : the mysql_* functions used here were removed in PHP 7;
       on current PHP, use mysqli or PDO instead.
**/
//first a function to parse the 'go' variable
//functions are not run by the script until called
function parsego($go_var)
{
    //values from the URL always arrive as strings, so is_int() can't be used.
    //ctype_digit() checks that the string holds nothing but digits.
    if(!is_string($go_var) || !ctype_digit($go_var))
    {
        //the variable is not a whole number, set it to error content.
        return -1;
    }
    //convert the validated string to an integer and return it.
    return intval($go_var);
}
//go is passed in the URL query string.
//it is available in the super global array $_GET as element 'go'
//$go is the variable used to identify the content to be retrieved.
//first check if the variable exists,
//if not set it to retrieve default content (1).
//empty() avoids a PHP notice when 'go' is missing from the URL.
if(empty($_GET['go']))
{
    $go=1;
    $title_add = "welcome";
}
//$_GET['go'] exists
else
{
    //check the 'go' variable for any 'hazardous' content
    //parsego() returns an integer: the requested id, or -1 for bad input
    $go = parsego($_GET['go']);
    //check to see if the content requested exists.
    $checkGo_sql="SELECT id, header FROM sitecontent WHERE id=$go";
    //run the query
    //the @ symbol is used to suppress error messages from PHP
    $checkGo_result=@mysql_query($checkGo_sql);
    //check the number of rows returned.
    $checkGo_rows=@mysql_num_rows($checkGo_result);
    if(!$checkGo_result || !$checkGo_rows)
    {
        //if there is an error or if there are no matches,
        //set to retrieve error content
        $go=-1;
        $title_add = "content not found";
    }
    else
    {
        $go = @mysql_result($checkGo_result, 0, 'id');
        $title_add = @mysql_result($checkGo_result, 0, 'header');
    }
}
/** ----- results ------------------
- we have a valid value for $go that can now be
used to retrieve content from the database.
- it is targeting either the error content (-1) or existing content (>=1)
**/
?>
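The decision flow in the script can also be condensed into one function, which makes the three possible outcomes (default, found, not found) easier to test without a database. This is a sketch under some assumptions: resolve_go() and $lookup are hypothetical names, $lookup stands in for the database query by taking an id and returning the header text (or null when no row matches), and the page titles are invented.

```php
<?php
// Hypothetical condensed version of the flow above.
// $raw    : the raw 'go' value from the URL, or null if absent
// $lookup : stand-in for the database query; given an integer id,
//           returns the header string, or null when nothing matches
function resolve_go($raw, $lookup)
{
    // no usable 'go' value supplied: serve the default content
    if (empty($raw)) {
        return array('go' => 1, 'title_add' => 'welcome');
    }
    // reduce the raw value to an integer before it goes anywhere near a query
    $go = intval($raw);
    $header = call_user_func($lookup, $go);
    // nothing stored under that id: serve the error content
    if ($header === null) {
        return array('go' => -1, 'title_add' => 'content not found');
    }
    return array('go' => $go, 'title_add' => $header);
}

// usage with an in-memory array in place of the sitecontent table
$pages = array(1 => 'welcome', 4 => 'contact us');
$lookup = function ($id) use ($pages) {
    return isset($pages[$id]) ? $pages[$id] : null;
};

$found   = resolve_go('4', $lookup);   // existing content
$missing = resolve_go('99', $lookup);  // no such id
$default = resolve_go(null, $lookup);  // no 'go' in the URL
```

Keeping the lookup behind a callback is only a convenience for testing; in the script itself that role is played by the SQL query.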
Concept 4: Checking for Search Bots
I now have a valid variable for sourcing content from the database table, so it's time to check who or what is visiting the Website. Again, everything is fully commented. For this example, I've added a testing aid: a useragent variable in the query string that overrides the real user agent, so you can check from the URL that the script behaves as it should. In a live Website, you'd want to remove the if-else statement at the start of the code.
<?php
/** ------ botcheck.inc.php -------
- determine if the user agent is a known search bot.
**/
//for spoofing the system so that it can be checked
//and validated from the URL.
//remove this when using live.
if(!empty($_GET['useragent']))
{
    $user_agent = $_GET['useragent'];
}
//if $_GET['useragent'] is not available, treat it as a real request.
else
{
    //$_SERVER['HTTP_USER_AGENT'] is where
    //PHP holds the user agent string.
    //it is not set if the client sends no User-Agent header.
    $user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
}
//a list of terms found in some searchbot strings in my log files
//also some text only browsers thrown in. eg Lynx
//and some that should never be allowed near a Flash movie (web tv)
$searchbot_short_array = array("FAST-WebCrawler/", "Googlebot/",
    "Googlebot-Image/", "Ask Jeeves/Teoma", "Ask Jeeves",
    "Google WAP Proxy", "Slurp/", "Gigabot/", "Poodle predictor",
    "AlkalineBOT/", "Scooter-", "Scooter/", "ASPSeek/", "Sqworm/",
    "TurnitinBot/", "Lynx/", "Lycos_Spider", "appie", "walhello",
    "WebTV", "LinkWalker", "SurveyBot/", "suzaran", "polybot",
    "webcollage/", "Teleport Pro/", "search.ch", "LWP::Simple",
    "EasyDL", "Minerva", "RPT-HTTPClient", "IA_Archiver",
    "Spinne/", "Webster Pro", "MSProxy", "ZyBorg/",
    "Indy Library", "NPBot", "Girafabot",
    "Gulper Web Bot", "grub-client");
//start by assuming the visitor is not a bot
$bot = false;
//traverse the array and look at each element
//if a bot is found, set a variable for later
foreach($searchbot_short_array as $search_for)
{
    //attempt to match the array value against the user agent string.
    //stripos is a case insensitive plain text search (PHP 5 and later).
    //it replaces eregi, which was removed in PHP 7 and would also
    //treat characters like . in the search terms as regex wildcards.
    //in a live setting
    //replace $user_agent with $_SERVER['HTTP_USER_AGENT'];
    if(stripos($user_agent, $search_for) !== false)
    {
        $bot = true;
        //no need to check the remaining terms
        break;
    }
}
?>
If, for instance, the AltaVista searchbot, Scooter, hit the Website, the $bot variable would be set to true.
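The matching step can be tried in isolation with a sketch like the one below. It wraps the loop in a hypothetical helper, is_known_bot(), uses only a short subset of the term list for brevity, and relies on stripos() for the case-insensitive comparison (eregi() was removed in PHP 7). The user agent strings are invented examples, not entries from real log files.

```php
<?php
// Hypothetical helper wrapping the matching loop.
// returns true as soon as any term is found in the user agent string
function is_known_bot($user_agent, $terms)
{
    foreach ($terms as $term) {
        // stripos() does a case insensitive plain text search
        if (stripos($user_agent, $term) !== false) {
            return true;
        }
    }
    return false;
}

// a short subset of the full list, for illustration only
$terms = array('Googlebot/', 'Scooter/', 'Slurp/', 'Lynx/');

// invented user agent strings
$scooter_hit = is_known_bot('Scooter/3.3', $terms);
$browser_hit = is_known_bot(
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)', $terms);
```

Here $scooter_hit comes back true and $browser_hit comes back false, so only the first visitor would be served the non-Flash content.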