Why does Google use CAPTCHA?

Recently I noticed that Google will be supporting audio as well as visual CAPTCHA tests. This is so that those who can’t see the screen can listen to a sound clip and fill in a code to prove they are human.

Matt May provided a great article on why you shouldn’t use CAPTCHA in 2004. I won’t repeat that, but the most relevant issues for me are that:

  • It places an undue burden on all users, but especially those with accessibility issues.
  • It is not effective. People have created systems to break it in 92% of cases.
  • There are better methods.

What actually prompted me to post wasn’t the audio aspect, but the fact that it’s Google. Surely Google is in an ideal position to accomplish something more effective?

On this blog, Akismet has blocked over 10,000 spam comments by submitting each one to a central service that checks it against a database of spam comments. If it matches, it’s caught; if it doesn’t, it might be published, depending on my other criteria. Only two spam comments have gotten past Akismet, and both were caught by the secondary measures.

Surely that kind of centralised checking mechanism is Google’s bread and butter?
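
As a rough illustration of the centralised check described above, here is a minimal Python sketch. It is not Akismet's API or its real matching logic, neither of which is described in this post. The names fingerprint, CentralSpamDatabase and moderate are hypothetical, and exact matching on a hash of normalised text is an assumption standing in for whatever a real service does.

```python
import hashlib
import re


def fingerprint(comment):
    """Normalise case and whitespace, then hash, so trivially altered copies still match."""
    normalised = re.sub(r"\s+", " ", comment.strip().lower())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class CentralSpamDatabase:
    """Hypothetical shared service that many sites report to and query."""

    def __init__(self):
        self._known_spam = set()

    def report_spam(self, comment):
        # A site flags a comment as spam; every other site benefits from it.
        self._known_spam.add(fingerprint(comment))

    def is_spam(self, comment):
        return fingerprint(comment) in self._known_spam


def moderate(comment, database):
    """Return 'caught' on a match; otherwise defer to the site's own criteria."""
    if database.is_spam(comment):
        return "caught"
    return "apply other criteria"


if __name__ == "__main__":
    database = CentralSpamDatabase()
    database.report_spam("Buy cheap pills now!!!")
    print(moderate("buy   CHEAP pills now!!!", database))
    print(moderate("Nice post, thanks.", database))
```

The point of the sketch is only that the database is shared: one site's report protects all the others, which is why scale works in favour of whoever runs the service.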

Another anti-bot tool is Bad Behavior, which blocks access to the site depending on the properties and behaviour of the access attempt. (I don’t know all the details, but IP addresses, user agents and other aspects are all used.) Google already makes use of databases of IPs (at the very least for identifying country of origin), so surely this would be a useful extension of that?
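
To make the idea of screening by request properties concrete, here is a small Python sketch. It is only a guess at the general shape of such a filter: the post does not list the tool's actual rules, so the blocked network, the user-agent fragments and the function name should_block are all hypothetical.

```python
import ipaddress

# Illustrative values only (assumptions, not any real tool's rules).
# 203.0.113.0/24 is a range reserved for documentation examples.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
SUSPICIOUS_AGENT_FRAGMENTS = ("examplespambot", "scraper")


def should_block(ip, user_agent):
    """Decide from the request's properties alone whether to refuse it."""
    address = ipaddress.ip_address(ip)
    # Rule 1: the request comes from a network already known to be abusive.
    if any(address in network for network in BLOCKED_NETWORKS):
        return True
    # Rule 2 (an assumption): a missing user agent is treated as bot-like.
    agent = user_agent.strip().lower()
    if not agent:
        return True
    # Rule 3: the user agent contains a fragment seen in known bots.
    return any(fragment in agent for fragment in SUSPICIOUS_AGENT_FRAGMENTS)


if __name__ == "__main__":
    print(should_block("203.0.113.7", "Mozilla/5.0"))
    print(should_block("198.51.100.4", "Mozilla/5.0"))
```

Rules like these never ask the visitor to do anything, which is the appeal, but each one can also refuse a legitimate visitor, so they need tuning.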

Another option would be a social networking service, which uses a web of recommended people. Gez Lemon concluded that:

There would be a lot of work required to make this foolproof, and it would also take time to establish a trustworthy community, but I think using a web service based upon social networking is a far more reasonable approach than testing for a person’s ability… which will always cause insurmountable problems to some users; that cannot be ignored.

Perhaps that’s more a Yahoo style service though? (Google tends to use computers and algorithms, Yahoo tends to use more socially oriented services.)

Google has just the kind of resources and know-how to take an approach other than CAPTCHA. I’m somewhat disappointed, although not surprised.


27 contributions to “Why does Google use CAPTCHA?”

  1. Akismet, Bad Behavior and other tools like them (WordPress’s Spam Karma, for example) may be good enough to block 10,000 spam comments on a single blog.

    Now, think about the resources needed to run them at a Google-like scale. Scary, isn’t it?

    That may be one of the reasons for CAPTCHAs, even though they suck for many reasons (accessibility coming first).

  2. I’m not sure about audio CAPTCHA, but I have just closed my phpBB forum because its visual CAPTCHA algorithm has been broken by spammers. Akismet, on the other hand, has not failed me yet.

  3. Javier wrote “I don’t trust on akismet (7% of all comments aren’t spam).”
    That is true. On my blog it’s way past 90%. I have a total of 102 valid comments on my blog since March 2004 (I know, not popular, but who cares). In the past 10 days Akismet (and WP-HashCash) has blocked over 500 spam comments.
    I used Spam Karma before. Now I think Akismet and WP-HashCash are way better.

  4. There’s a good review of (WordPress-compatible) anti-spam plugins on the WordPress podcast:

    Mark Jaquith reviews and compares the three leading anti-comment spam plug-ins, Akismet, Bad Behavior 2 and Spam Karma 2.

    Still, this is kind of moving away from the point…

  5. One word on Bad Behavior. I’ve used it on some of my WordPress installs (I use WP as a CMS) and I’ve had it block me from home. What is my crime? From BB’s website:
    “In most cases, this is caused by over-aggressive personal firewall/browser privacy software.”
    Um… I run the Windows XP default firewall. I refuse to block out users who simply run the Windows default firewall. To me that is too restrictive. I’m only running Akismet on most of my sites, atm. I might give HashCash another look.

  6. I used WordPress from early 2004 until last September, before moving to something more… rubyish actually.

    I’ve always used Spam Karma (0.x, 1 and 2), first because the developer is a friend and I’ve been used as a guinea pig numerous times, and then because it was the best: only a handful of false positives in about two and a half years, and maybe half a dozen comments getting through.

  7. Personally, I like CAPTCHAs. My site is more of an online “journal” than a blog, with a very limited readership. I feel that if a user is going to take the time to write a post, they can take a bit more time and type in the CAPTCHA. I don’t like the idea of someone else deciding what should and shouldn’t go on my site. I just have to remember to make the CAPTCHA readable to the human element.

  8. Hi Rescue9,

    You are assuming that everyone can get past your CAPTCHA. You cannot assume that they can. My pass rate is about 50%, and I can see the blooming things.

    Personally, I have no interest in dictating what people have on their personal sites. However, for organisations that provide services or products, CAPTCHA is an accessibility blooper.

  9. Hello everyone.

    Doesn’t Google have a usability team? I guess they should have done some kind of study of the CAPTCHA they use.

    Regards!
    Viola

  10. That is true. On my blog it’s way past 90%. I have a total of 102 valid comments on my blog since March 2004 (I know, not popular, but who cares). In the past 10 days Akismet (and WP-HashCash) has blocked over 500 spam comments.
    I used Spam Karma before. Now I think Akismet and WP-HashCash are way better.

  11. Whatever the other aspects may be, a CAPTCHA system is not a user-friendly measure. There are many who don’t like to fill in CAPTCHAs.

    Then comes the security issue. As you pointed out, there are systems that can break 92% of CAPTCHAs… Strange that all major websites use CAPTCHA.

    Currently I am trying to find the weaknesses of the CAPTCHAs used on some popular websites.

  12. > It is not effective. People have created systems to break it in 92% of cases.

    LIE!!!! It only breaks the Gimpy captcha. At least read the damn article that you link to.

    > There are better methods.

    WRONG!!!! Your “better methods” lead to false positives, which is the only thing that’s worse than spam.

  13. A CAPTCHA is a tool to distinguish between humans and automated programs. Unfortunately, most CAPTCHAs are a burden to users. An ideal CAPTCHA should be easily readable by humans and hard for hacker applications to understand. I have created such a CAPTCHA and published it on my blog. I plan to establish it as a service, i.e. you can include it as a plug-in on your web site. Please give your valuable comments regarding this.
    http://www.codegeeks.net/captcha-using-aspnet

  14. Hi Aby,

    You’ve pretty much missed the point of the article really. There isn’t such a thing as a good CAPTCHA, and looking at yours hasn’t changed my mind.

    Apart from just having one modality, the images aren’t that hard to read, meaning they will be easily crackable.

    If I were to use a CAPTCHA service, it would be recaptcha.net, as it offers an audio equivalent and works for a good cause.

  15. Instead of captcha, what about something like the following, which is quick and easy, and hard for a bot to get past…

    To Continue Answer the following:

    Bob married Sally. Sally’s mom is Sarah. Sarah is Bob’s ______.

    A. brother
    B. father in law
    C. step son
    D. mother in law
    E. grandfather
    F. dog
    G. lover

    Obviously the correct answer is either D or G, but how would the bot figure that out?

  16. I don’t think that using an Akismet-like system on the Google Search back end would be a better option.

    Let me explain.
    Just have a look around, even at your own searching pattern and what you search for: most of the searches on Google appear to be spam. For example, a newbie may search for ‘Hello World’, or his website’s name, or his own name, etc.

    Similar is the case when a person searches for a phrase that looks just like a spam phrase but in fact is neither spam nor from a bot. He just needs it, and if Google blocks him, Google will lose one user.

  17. Hi Hamid,

    I don’t think you quite understood; I wasn’t saying anything about their search.

    Google actually does block robot style behaviour on their search, but with an error page based on your IP address rather than a CAPTCHA.

  18. In my experience with blocking spam, the other methods you mention are too restrictive and flag false positives (positives being spam). Web presences like Microsoft, Google, Yahoo and AltaVista cannot risk blocking out a SINGLE valid user in the name of blocking out ALL spam.

    To borrow terms from programming languages and type inference, the solution needs to be complete but need not be sound. It has been proven that you CANNOT have (or at least that it is very difficult to have) a sound solution that is ALSO complete for large-scale problems like this.

Comments are closed.