Article
Beyond CAPTCHA: No Bots Allowed!
I'm sure you've seen them many times -- those wild squiggles that need to be deciphered and typed into a text box before you can buy concert tickets online or access a comment form.
CAPTCHAs are generally one or two words presented as graphics, overlaid with some kind of distortion, and they function as a test that relies on your human ability to recognize them. CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart." This is a misnomer, because a CAPTCHA isn't a Turing test -- but we'll come back to that later!
The CAPTCHA innovation was pioneered by developers at Carnegie Mellon University. The idea behind it was to develop a means of distinguishing between people and web robots, so that web sites could offer their resources to individual humans without being exploited by robots.

The Need for CAPTCHA (or Something)
Site owners face a number of unique challenges in protecting their resources from automated harvesting. These include:
- Resources may be expensive to provide, and machines can consume far more data far more quickly than humans. Therefore, services that are machine-accessible may prove prohibitively expensive to maintain.
- Allowing bots to post comments and user-generated content opens a floodgate for spammers, which inevitably results in massive volumes of spam -- often to the point where a service becomes unuseable.
- Data may be highly sensitive, such as personal medical or financial information, and needs to be sufficiently protected to prevent against attacks from data-mining robots.
- Interactions with a system may have fundamental implications for society as a whole; consider the issues that would arise in the case of electronic voting.
The Problem with CAPTCHA
CAPTCHA systems create a significant accessibility barrier, since they require the user to be able to see and understand shapes that may be very distorted and difficult to read. A CAPTCHA is therefore difficult or impossible for people who are blind or partially sighted, or have a cognitive disability such as dyslexia, to translate into the plain text box.
And of course there can be no plain-text equivalent for such an image, because that alternative would be readable by machines and therefore undermine the original purpose.
Since users with these disabilities are unable to perform critical tasks, such as creating accounts or making purchases, the CAPTCHA system can clearly be seen to fail this group.
Such a system is also eminently crackable. A CAPTCHA can be understood by suitably sophisticated scanning and character recognition software, such as that employed by postal systems the world over to recognize handwritten zip or postal codes. Or images can be aggregated and fed to a human, who can manually process thousands of such images in a day to create a database of known images -- which can then be easily identified.
Recent high-profile cases of bots cracking the CAPTCHA system on Windows Live Hotmail and Gmail have highlighted the issue, as spammers created thousands of bogus accounts and flooded the systems with junk. Even more recently, security firm Websense Security Labs have reported that the Windows Live CAPTCHA can be cracked in as little as 60 seconds.
One CAPTCHA-cracking project, called PWNtcha ("Pretend We're Not a Turing Computer but a Human Antagonist"), reports success rates between 49% and 100% at cracking some of the most popular systems, including 99% for the system used by LiveJournal, and 88% for that employed by PayPal.
Thus, the growth and proliferation of CAPTCHA systems should be taken less as evidence of their success than as evidence of the human propensity to be comforted by things that provide a false sense of security.
It's ironic that CAPTCHA can be defeated by those who are sufficiently motivated, when they're the very same people the test is designed to protect against. Just like DRM, CAPTCHA systems ultimately fail to protect against the original threat, while simultaneously inconveniencing ordinary users.
Heck, I find them difficult to read, and I have perfect 20/20 vision. When I signed up for my Facebook account I had to try five different images and two different browsers until I got my application through. Tedious in the extreme.
Just try this example.

What does that first word say? "One"? "Dime"? I have no idea. To take it even further, have a look at this illustration from Matt May's presentation Escape from CAPTCHA, and see how well you'd be able to deal with it!

The problem could be seen more philosophically, as a question of "how does a machine recognize a human." So in that sense, a CAPTCHA is more like a reverse Turing test, since a Turing test is about computers fooling humans, rather than humans fooling computers. But a true Turing test is about machine intelligence, whereas all a CAPTCHA tests is perceptual comprehension (which a real human can fail as easily as a machine can pass); really, a CAPTCHA isn't a Turing test at all.
James (aka