Link of the week - reCAPTCHA: stop spam, read books, beat the bots
reCAPTCHA is a free program that protects you from spam while furthering the fine goal of digitizing physical libraries, one word at a time.
CAPTCHAs (for Completely Automated Public Turing tests to tell Computers and Humans Apart) are used on many Web sites to distinguish between legitimate human users and automated "bots" that trawl the web to generate spam.
Using the ability of human users to decipher distorted text, CAPTCHAs prevent bots from navigating to protected Web sites.
Turning anti-spam into a force for good
About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent, which means more than 150,000 hours are spent every day on this task.
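The arithmetic behind that estimate can be checked in a few lines. This is only a sketch using the two figures quoted above; the constant and function names are made up for illustration.

```python
# Figures quoted in the article; both are rough estimates.
CAPTCHAS_PER_DAY = 60_000_000
SECONDS_PER_CAPTCHA = 10


def hours_spent_per_day(captchas: int, seconds_each: float) -> float:
    """Convert a daily CAPTCHA count into total human hours spent."""
    return captchas * seconds_each / 3600  # 3600 seconds in an hour


hours = hours_spent_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "166,667 hours per day"
```

The result, about 166,667 hours, is consistent with the "more than 150,000 hours" figure.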
reCAPTCHA harnesses this power by channeling the effort spent solving CAPTCHAs into effort spent "reading" books.
To the rescue of Optical Character Recognition
Many books from the pre-computer age are being digitized to archive human knowledge and improve accessibility. Pages from such books are photographically scanned, and these images are transformed into text using Optical Character Recognition (OCR).
The problem is that OCR is not perfect: when it bumps into a difficult word, reCAPTCHA is called to the rescue.
reCAPTCHA transforms words that cannot be read by OCR into CAPTCHAs for humans to decipher on the Web.
Each new word that cannot be read correctly by OCR is given to a human user along with another word for which the answer is already known.
The user is then asked to read and enter both words.
If they correctly enter the word that is already known, the system assumes their answer for the new word is also correct. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
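The two-word check described above can be sketched roughly as follows. This is not reCAPTCHA's actual code: the class name, the case-insensitive comparison, and the thresholds (`min_votes`, `min_agreement`) are all assumptions chosen for illustration.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Compare answers case-insensitively, ignoring surrounding spaces (an assumption)."""
    return text.strip().lower()


class UnknownWord:
    """Collects human guesses for one word that OCR could not read."""

    def __init__(self, min_votes: int = 3, min_agreement: float = 0.75):
        # Hypothetical thresholds; the article does not state the real ones.
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        self.votes = Counter()

    def record(self, control_answer: str, known_answer: str, guess: str) -> bool:
        """Count the guess only if the user got the already-known word right."""
        if normalize(control_answer) != normalize(known_answer):
            return False  # failed the control word: reject, ignore the guess
        self.votes[normalize(guess)] += 1
        return True

    def resolved(self):
        """Return the agreed reading, or None if confidence is still too low."""
        total = sum(self.votes.values())
        if total < self.min_votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count / total >= self.min_agreement else None


word = UnknownWord()
word.record("overlooks", "overlooks", "morning")
word.record("Overlooks", "overlooks", "morning")
word.record("overlords", "overlooks", "mourning")  # control word wrong: not counted
word.record("overlooks", "overlooks", "Morning")
print(word.resolved())  # prints "morning"
```

Requiring agreement from several users before accepting a reading is what lets the system trust an answer it has no other way to verify.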