Trying out reCAPTCHA

by Steve

In the last week or so, the amount of comment spam being fired at this blog has been getting out of hand - it’s now averaging about 100 an hour (or one every 36 seconds). Akismet does a pretty good job and automatically junks all but about 2% of these, but the remainder end up in my manual moderation queue and are starting to get really annoying (25 in the last 10 hours).

So, in an attempt to cut it out entirely, I’ve added reCAPTCHA to the comments page. You know the drill with CAPTCHAs by now: you type in a word (or two in this case) from a somewhat distorted image, on the basis that machines find pattern recognition hard but humans don’t. The extra twist in this case is that the words you type are actually used for something - all the images that come from reCAPTCHA are from real scanned books, where the words couldn’t be identified automatically. At the moment, all the books are coming from the Internet Archive.

Wait, so how does it validate that you typed the right word? Well, this is why you get two words in the reCAPTCHA box - only one of them is actually unknown; the other is known because it’s been identified already by other people. When you provide an answer for the one that’s not already known, you’re actually being a human OCR machine for the Internet Archive 😀 Of course, the order is random, so you don’t know which word is the real test and which is the one you’ll be contributing an answer for. The system also gives the same image to multiple people, so that a slip-up from one person doesn’t mean the image is tagged with an incorrect identification for future validation. I think this is a rather nice idea - if I have to put something in to stop spammers and add a little bit of irritation for commenters, it might as well be doing something useful at the same time.
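For the curious, here is roughly how I picture the two-word check working. This is only a minimal sketch in Python based on the description above, not reCAPTCHA’s actual code: the names `KNOWN_WORDS`, `UNKNOWN_VOTES`, `check_challenge` and `agreed_reading`, the image IDs, and the threshold of three matching answers are all made up for illustration.

```python
from collections import Counter

# Hypothetical in-memory stores; the real service's data model is not public here.
KNOWN_WORDS = {"img-known-1": "morning"}      # image id -> already verified text
UNKNOWN_VOTES = {"img-unknown-1": Counter()}  # image id -> answers collected so far
VOTES_NEEDED = 3                              # assumed agreement threshold


def normalise(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def check_challenge(known_id, known_answer, unknown_id, unknown_answer):
    """Pass or fail on the known word only.

    The answer for the unknown word is recorded as a vote, but only
    when the known word was typed correctly.
    """
    if normalise(known_answer) != KNOWN_WORDS[known_id]:
        return False
    UNKNOWN_VOTES[unknown_id][normalise(unknown_answer)] += 1
    return True


def agreed_reading(unknown_id):
    """Return the most common answer once enough people agree, else None."""
    votes = UNKNOWN_VOTES[unknown_id]
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= VOTES_NEEDED else None
```

The point of the vote counter is the one made above: a single typo only adds one stray vote, so it can’t by itself decide what the unknown word gets tagged as.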

Hopefully this won’t be too annoying to people; let me know what you think of it. By the nature of the technique, you’ll occasionally get something odd - if you find it hard to recognise, just hit the little refresh icon in the reCAPTCHA box. If you get the answer wrong, it won’t lose your comment, so don’t worry; you’ll just get rechallenged.

Edit: Oh yes, and if the reCAPTCHA block looks misaligned, just hit your browser refresh to update your cached copy of the site’s CSS.