Beyond CAPTCHA: No Bots Allowed!

Non-interactive Solutions

We've now looked at a number of interactive solutions, and seen that none of them is perfect, either at protecting against robot attack or at reliably identifying humans without introducing accessibility barriers.

Perhaps the answer lies with non-interactive solutions. These analyze data as it's being submitted, rather than relying on users to authenticate themselves.

Honey Traps

The idea here is that you include a form field, which is hidden with CSS, and give it a name that encourages spam bots to fill it in, such as "email2." The human user will never fill it in because they don't know it's there, but the bot won't be able to tell the difference. Therefore, if that field contains any value when the form is submitted, the submission is rejected.
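As a rough illustration, the server-side half of a honey trap might look something like the sketch below. It assumes the submitted form has already been decoded into a dictionary of field names and values; the function name `is_probable_bot` is made up for this example, and "email2" is simply the decoy name suggested above.

```python
from typing import Mapping

# Name of the CSS-hidden decoy field ("email2" is the example from the text).
HONEYPOT_FIELD = "email2"


def is_probable_bot(form: Mapping[str, str], trap_field: str = HONEYPOT_FIELD) -> bool:
    """Return True if the hidden decoy field was submitted with any value.

    `form` is assumed to be the decoded POST data. A person using a
    regular browser never sees the field, so it arrives empty or missing.
    """
    return bool(form.get(trap_field))


human_post = {"name": "Ann", "email": "ann@example.com", "email2": ""}
bot_post = {"name": "x", "email": "x@example.com", "email2": "x@example.com"}

for post in (human_post, bot_post):
    print("rejected" if is_probable_bot(post) else "accepted")
```

Any submission for which the check returns true would simply be rejected.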

The problem is that assistive technologies may not be able to tell the difference either, and so their users may not know not to fill it in. That possibility could be reduced with descriptive text, such as "do not complete this field," but doing that may be very confusing, as well as being recognizable by a bot.

Another variant of this is a simple trap that asks human users to confirm they're not robots. This could take the form of a checkbox, like this one.

[Figure: Simple confirmation test]

In both these examples, however, bots could learn to recognize the trap and thereby circumvent it. It's one of those things that only works as long as not many people are using it -- as soon as it becomes prevalent on high-traffic sites like Digg or Facebook, spammers will simply adapt.

Session Keys

A partial solution for form submission is to generate a session key on the fly when building the original form, and then check that key when the form is submitted. This will block bots that bypass the form and post directly to its target, but it does nothing to stop bots that go through the regular web form.
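Here's a minimal sketch of how that could work. It assumes the server keeps a per-visitor `session` dictionary (a stand-in for whatever session storage your framework provides), and the function names and the "form_key" entry are invented for the example. Making the key single-use is an extra design choice, not something the technique strictly requires.

```python
import hmac
import secrets
from typing import Optional


def issue_form_key(session: dict) -> str:
    """Create a random key, remember it in the session, and return it
    so that it can be embedded in the form as a hidden field."""
    key = secrets.token_urlsafe(32)
    session["form_key"] = key
    return key


def check_form_key(session: dict, submitted: Optional[str]) -> bool:
    """Accept the submission only if it carries the key we issued."""
    expected = session.pop("form_key", None)  # pop: each key works once
    if not expected or not submitted:
        return False
    # Constant-time comparison, on bytes so that odd input can't raise.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A bot that posts straight to the form's target has no key in its session, so `check_form_key` returns false; a bot that loads the form first gets a valid key like anyone else.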

Spam Filtering and Heuristics

Systems that accept user-generated content (such as blog comments) can filter content based on specific keywords (like "Viagra"), or use Bayesian filters to recognize patterns that might indicate spam. Such filters are already used by the vast majority of email systems, and are highly effective in reducing spam.
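To give a feel for the Bayesian approach, here's a deliberately tiny naive Bayes classifier. It's a sketch under simplifying assumptions (word counts only, add-one smoothing, a class name of our own invention), not a description of how any particular email or comment filter is built.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy filter: learns word frequencies from labelled examples."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Smoothed prior, so an untrained class never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                # Add-one smoothing for words unseen in this class.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability of "spam".
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap viagra buy now", "spam")
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("great article thanks for sharing", "ham")
spam_filter.train("I disagree with your point about forms", "ham")
```

With that training data, `spam_filter.spam_probability("buy viagra now")` comes out above 0.5, while a comment like "thanks for the great article" comes out below it. A real filter would need far more training data than four comments.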

More sophisticated systems use a combination of filtering and heuristics that identify spam by additional factors, such as how quickly a comment was posted. One popular system is Spam Karma, which produces reports like this one.

[Figure: Spam Karma report]

The report shows how a number of factors contribute to an overall "karma" score: posts with a low enough score are automatically rejected (and the admin is sent an email like the above).
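The general idea of adding up several factors into one score can be sketched as follows. To be clear, this is not Spam Karma's actual algorithm: the factors, weights, threshold, and names here are all hypothetical, chosen only to show the shape of the approach.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical values, for illustration only.
BLOCKED_WORDS = {"viagra"}
REJECT_BELOW = -5.0


@dataclass
class Comment:
    body: str
    seconds_on_page: float  # time between loading the page and posting
    link_count: int


def karma_score(comment: Comment) -> Tuple[float, List[str]]:
    """Return an overall score plus the reasons that contributed to it."""
    score = 0.0
    reasons = []
    if comment.seconds_on_page < 5:
        score -= 4
        reasons.append("posted suspiciously quickly")
    if comment.link_count > 2:
        score -= 2 * (comment.link_count - 2)
        reasons.append("contains many links")
    if BLOCKED_WORDS & set(comment.body.lower().split()):
        score -= 3
        reasons.append("contains a blocked keyword")
    return score, reasons


def is_rejected(comment: Comment) -> bool:
    score, _ = karma_score(comment)
    return score < REJECT_BELOW
```

Keeping the list of reasons alongside the score is what would make a report for the admin possible.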

It's a misunderstanding of the nature of Karma to think that it can apply to individuals. Philosophical meanderings aside, this is a highly effective system that can make a huge difference to the spam overhead a site admin has to deal with.

There's also a third-party service called Akismet, which works on the same principle of content filtering using keywords and heuristics. Since the system is managed centrally, it has a much larger base of data to work from, which should make its assessments far more reliable -- with a lower chance of spam getting through or of a "false positive" (identifying as spam something which is legitimate).

Limited-use Accounts

One way for a system such as free email to limit abuse by robots is to deliberately throttle new accounts for a period of time; for example, by only allowing ten emails to be sent per day for the first month.
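A throttle like that could be sketched along these lines. The ten-per-day limit comes from the example above; treating "the first month" as 30 days, and the `Account` class itself, are assumptions made for the sake of illustration.

```python
from datetime import date, timedelta

PROBATION = timedelta(days=30)  # "the first month", assumed to be 30 days
DAILY_LIMIT = 10                # emails per day while on probation


class Account:
    def __init__(self, created: date) -> None:
        self.created = created
        self.sent_on = {}  # maps a date to the number of emails sent that day

    def may_send(self, today: date) -> bool:
        """New accounts are capped; established accounts are not."""
        if today - self.created >= PROBATION:
            return True
        return self.sent_on.get(today, 0) < DAILY_LIMIT

    def record_send(self, today: date) -> None:
        self.sent_on[today] = self.sent_on.get(today, 0) + 1
```

The mail system would call `may_send` before each message and `record_send` after each one that goes out.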

However, this approach may not ultimately help. It may reduce the incidence of abuse on a per-account basis, but it doesn't prevent abuse entirely: there's nothing to stop a spammer from simply signing up for thousands of accounts and sending ten spam emails from each one. And of course, such a limitation affects legitimate users as well, who, unlike spammers, aren't going to be inclined to sign up for multiple accounts to get around it.

Conclusion

The conclusion? Don't make users take responsibility for our problems.

Bots, and the damage they cause, are not the fault or responsibility of individual users, and it's totally unfair to expect them to take the responsibility. They're not the fault of site owners either, but like it or not they are our responsibility -- it's we who suffer from them, we who benefit from their eradication, and therefore we who should shoulder the burden. And using interactive authentication systems such as CAPTCHA effectively passes the buck from us to our users.

Moreover, the common theme with all interactive alternatives is that they fail users who have a cognitive disability, or don't understand the same cultural cues as the author, or use assistive technologies. The more stringent the system, the higher the bar is raised and therefore the greater the chance of failing to recognize or admit a real human.

In my view, the right way to address this problem is with non-interactive solutions that ordinary users don't even need to be aware of. Systems such as Spam Karma and Akismet are highly effective at reducing the amount of spam that site administrators have to deal with. In fact, we use Spam Karma here at SitePoint, and it does make a significant difference.

The Future

It's clear that both interactive and non-interactive tests will continue to be used by site owners for the foreseeable future. Developers will try to come up with new and better tests, and spammers will continue to find ways of cracking them; it's very much a vicious circle.

Perhaps, at some point in the future, somebody will come up with a test that is truly reliable and uncrackable -- something that identifies humans in a way that cannot be faked. Maybe biometric data such as fingerprints or retina scans could factor into that somewhere; perhaps we'll have direct neural interfaces that identify the presence of brain activity.

Personally, I'm still hoping for telepathic XML!
