It's a Hit! Gauging Success through Traffic Analysis

IP Addresses and User Agents

The second thing you need to check is the IP addresses and user agents of your visitors. This information will tell you two things:

  1. When a search engine spiders your site.

  2. Whether someone is abusing your site.

The first point is important because, unless you know when your site was spidered, you cannot effectively troubleshoot your search engine listings (for instance, if they appear outdated or fail to appear at all). Many people remember when they submitted their site to the search engines, but ask them when it was spidered and they don't have a clue. Knowing when a search engine spiders your site, and when it updates its index, allows you to predict when your listings will change.
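As a rough illustration, here is a minimal Python sketch that reports the most recent visit from each spider found in an access log. It assumes your server writes the Apache "combined" log format, and the `SPIDERS` table and the function name `last_spider_visits` are made up for this example; check each engine's documentation for the user agent its spider really sends.

```python
import re
from datetime import datetime

# Assumes Apache "combined" log format; adjust the pattern for other servers.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical token -> engine map, for illustration only.
SPIDERS = {"googlebot": "Google", "bingbot": "Bing", "slurp": "Yahoo"}


def last_spider_visits(lines):
    """Return {engine: datetime of its most recent logged request}."""
    latest = {}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        seen = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        for token, engine in SPIDERS.items():
            if token in agent and (engine not in latest or seen > latest[engine]):
                latest[engine] = seen
    return latest
```

You could call it as `last_spider_visits(open("access.log"))`, where the file name is a placeholder for your own log. Keep in mind that anyone can fake a user agent, so treat the result as a hint rather than proof.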

The second point is important because there are a lot of people out there with little to do, and there are many ways they can abuse a Website. One way is to write a script that rips content off someone else's Website and displays it on the abuser's own.

For instance, there are scripts that rip news headlines off sites like CNN.com. The site owner then displays the headlines on their own site, along with a link back to CNN. While it is technically wrong to copy the headlines, the bigger players easily forgive it, because the headlines link back to them (effectively driving traffic to their site).

However, it is just as easy to write a script that steals whole articles from a site and displays them elsewhere. If you are the victim of either of these practices, you can usually tell through your logs. There will usually be a large number of requests from a single IP address (which should resolve to a Web server), as well as excessive hit counts from a user agent called "PHP," "Perl," or the name of another scripting language. Sometimes people will download your entire site and republish it on their own server. However, they sometimes forget to recode some links, which results in hits referred from their copy of your site to your original. One SitePoint Forum advisor recently discovered this exact thing happening by closely monitoring his referrers.
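To make the log check concrete, the sketch below counts requests per IP address and per user agent, and lists those at or above a threshold. It assumes the Apache "combined" log format; the function name `heavy_hitters` and the default threshold of 500 are arbitrary choices for this example, and a sensible threshold depends on how busy your site is.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format; adjust the pattern for other servers.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def heavy_hitters(lines, threshold=500):
    """Return (busy_ips, busy_agents) as lists of (value, hit_count)."""
    ip_hits, agent_hits = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        ip_hits[match.group("ip")] += 1
        agent_hits[match.group("agent")] += 1
    # most_common() sorts by hit count, highest first
    busy_ips = [(ip, n) for ip, n in ip_hits.most_common() if n >= threshold]
    busy_agents = [(a, n) for a, n in agent_hits.most_common() if n >= threshold]
    return busy_ips, busy_agents
```

A high count alone does not prove abuse (a proxy or a search engine spider can also generate many requests), so look up what each flagged address resolves to before acting on it.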

On the topic of downloading an entire site, there are also site rippers out there. Often benignly named "offline browsers" (much in the way some Trojans are named "remote administration tools"), these programs can download your entire site, which not only steals your site (design, content, etc.) but can also crash or severely slow down your server. Depending on the size of your site, you can detect these programs by looking at IP addresses -- if you see hundreds or thousands of requests from one address, chances are it's one of these programs. You can also look for their user agents -- some of the more popular ones are Wget, Teleport, HTTrack, and Web Reaper. I should mention that Wget is a legitimate program used on Unix servers to download files, such as patches or drivers. However, unless you provide such downloads on your site, anyone using this agent on your site is probably stealing.
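The user agent check can be sketched in a few lines of Python. This example assumes the user agent is the last quoted field on each log line (as in the Apache "combined" format), and the tokens in `RIPPER_TOKENS` are simply the program names listed above in lowercase with spaces removed; the exact strings these programs send may differ, so verify them against your own logs.

```python
import re

# Assumes the user agent is the last quoted field on the line.
AGENT_AT_END = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"$')

# Assumed tokens, derived from the program names; not verified strings.
RIPPER_TOKENS = ("wget", "teleport", "httrack", "webreaper")


def find_rippers(lines):
    """Return {token: sorted list of IPs whose user agent contains it}."""
    found = {}
    for line in lines:
        match = AGENT_AT_END.match(line.strip())
        if match is None:
            continue  # skip lines without a quoted field at the end
        # Drop spaces so "Web Reaper" and "WebReaper" both match.
        agent = match.group("agent").lower().replace(" ", "")
        for token in RIPPER_TOKENS:
            if token in agent:
                found.setdefault(token, set()).add(match.group("ip"))
    return {token: sorted(ips) for token, ips in found.items()}
```

Many of these programs let the user change the user agent, so a clean result here does not rule out a ripper; the IP address counts remain the more reliable signal.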

Yet another form of site abuse is harvesting email addresses from a site -- this is especially important if you run a community site, where users often post their email addresses. As with site rippers, you can often identify email harvesters via their user agent.

The final method of site abuse is to block a site's advertisements. Some consider this a right of the surfer; however, I feel that it is stealing. A Webmaster places advertisements expecting that users will view them in conjunction with the content they view for free. If visitors block the advertisements, then ethically I don't think they should visit the site at all. Some Webmasters will redirect people using ad blocking programs to a page that asks them to pay for site access, and that approach reflects how many Webmasters feel: you either pay with your wallet, or with your eyeballs. As with the previous examples, this can be detected by monitoring user agents.

Once you identify the IP addresses or user agents of those abusing your site, you can ban them (using .htaccess if you run Apache), but a full explanation of this is beyond the scope of this article.

Other Statistical Information

You can gather much more information from your statistics than has been mentioned so far. This information is most useful when you attempt to sell advertising or reassess your promotional efforts.

Demographics

Your server stats can provide limited demographic information that's helpful both for designing your site and for attracting advertisers. For instance, by researching the stats on operating systems or user agents, you can tell whether your visitors use a PC or a Mac, Internet Explorer or Netscape. Some software can also give you geographic statistics by resolving the IP addresses of your visitors. While these statistics are not the most accurate (it isn't always possible to identify a user's country of origin correctly), this information can still be valuable in the presentation of packages to potential advertisers, or even when you're deciding whether to make regional changes to your site -- add content in a second language, for example.
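If your stats package doesn't break visitors down this way, a crude tally is easy to sketch. The example below takes a list of user agent strings and reports the percentage share of each platform. The substring rules in `classify_platform` are assumptions made for illustration and will misclassify some agents (phones and spiders, for example), so treat the numbers as rough.

```python
from collections import Counter


def classify_platform(agent):
    """Guess a platform from a user agent string using simple substrings."""
    agent = agent.lower()
    if "mac" in agent:
        return "Mac"
    if any(token in agent for token in ("windows", "win9", "winnt")):
        return "Windows"
    if "linux" in agent or "x11" in agent:
        return "Linux/Unix"
    return "Other"


def platform_shares(agents):
    """Return {platform: percentage of agents}, rounded to one decimal."""
    counts = Counter(classify_platform(agent) for agent in agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}
```

The same approach works for browsers: swap the substring rules for ones that pick out the browser names you care about.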

Search Engine Statistics

In addition to glancing over your referrers to ensure that you're maintaining your search engine positions, you can occasionally do a more detailed analysis to compare the amount of traffic you get from various search engines. This can help you identify whether a particular engine is performing poorly for you, and therefore which referrers you need to work on to increase the amount of traffic they send you (though you should keep in mind that perceived 'lower traffic levels' could be the result of a search engine being less popular than the others you track).
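A comparison like this can be sketched by counting referrers per search engine. The example below takes a list of referrer URLs (for instance, the referrer field pulled from each log line) and tallies them by engine. The `ENGINES` table is a hypothetical, incomplete mapping from hostname fragments to engine names; extend it with the engines you track, and note that it will also count non-search pages hosted on the same domains.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical hostname fragment -> engine name map, for illustration only.
ENGINES = {
    "google.": "Google",
    "bing.": "Bing",
    "yahoo.": "Yahoo",
    "duckduckgo.": "DuckDuckGo",
}


def search_engine_referrals(referrers):
    """Return a Counter of referrals per search engine."""
    counts = Counter()
    for referrer in referrers:
        try:
            host = (urlparse(referrer).hostname or "").lower()
        except ValueError:
            continue  # malformed URL; ignore it
        for fragment, engine in ENGINES.items():
            if fragment in host:
                counts[engine] += 1
                break  # count each referrer once
    return counts
```

Calling `search_engine_referrals(refs).most_common()` lists the engines from most to least referrals, which makes a poorly performing engine easy to spot.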
