ReCAPTCHA

From InstallGentoo Wiki
A properly filled-out ReCAPTCHA.

reCAPTCHA is a service run by Google that was originally created both to help digitise books and to prevent bots from spamming. Since the introduction of reCAPTCHA v2, it has been a service designed to train AIs to recognize objects as what they are (and as such aid the literal Skynet-like doom of humanity). It is used on 4chan to prevent bots from spamming posts or reports.

reCAPTCHA is not a silver bullet. Any sufficiently dedicated spammer can just hire some people from a poor country to fill out CAPTCHA problems for a few cents each. That being said, it's reasonably effective on most websites.

How it works

reCAPTCHA v1

Diagram of how reCAPTCHA works
  1. The user loads the web page with the reCAPTCHA challenge JavaScript embedded.
  2. The user's browser requests a challenge (an image with distorted text) from reCAPTCHA. reCAPTCHA gives the user a challenge and a token that identifies the challenge.
  3. The user fills out the web page form, and submits the result to your application server, along with the challenge token.
  4. Your application server forwards the answer and the challenge token to reCAPTCHA, which checks the user's answer and gives you back a response (true or false).
  5. If true, generally you will allow the user access to some service or information. E.g. allow them to comment on a forum, register for a wiki, or get access to an email address. If false, you can allow the user to try again.
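
Steps 4 and 5 can be sketched in Python. This is only a minimal sketch, assuming the verification reply is plain text with "true" or "false" on the first line and, on failure, an error code on the second line. The function name parse_v1_reply is made up for illustration, and no network request is made.

```python
def parse_v1_reply(body):
    """Split a reCAPTCHA v1 style verification reply into (passed, error_code).

    Assumption: the reply is plain text, "true" or "false" on the first line,
    with an error code on the second line when the answer was rejected.
    """
    lines = body.strip().splitlines()
    passed = bool(lines) and lines[0].strip() == "true"
    error_code = None
    if not passed and len(lines) > 1:
        error_code = lines[1].strip()
    return passed, error_code


# Example: a rejected answer, after which the user would be allowed to retry.
print(parse_v1_reply("false\nincorrect-captcha-sol"))
```

Anything other than a literal "true" on the first line is treated as a failure, so an empty or malformed reply never lets a post through.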

reCAPTCHA v2

  1. The user loads the web page with the reCAPTCHA challenge embedded as an iframe, which requires JavaScript.
  2. The user's browser requests a challenge from reCAPTCHA. reCAPTCHA then displays the challenge, which asks the user to pick out every picture of a given type of object from a set of randomly gathered pictures. Usually there are three or more matching pictures the first time. If no match exists, the user is asked to select "skip". If the user fails, a second type of test follows, in which a single image is presented and the user must select the specific quadrants of the image that contain the desired object. After success, reCAPTCHA gives the website an authentication token via JavaScript.
  3. After this, generally you will allow the user access to some service or information. E.g. allow them to comment on a forum, register for a wiki, or get access to an email address. If the user fails instead, you can allow them to try again.
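
The list above only says the website receives a token. In practice the site's server still has to check that token with Google before trusting it. The following is a rough sketch of that check, assuming the documented siteverify endpoint with its secret, response and remoteip fields and a JSON reply containing a success flag. The helper names build_verify_body and token_accepted are hypothetical, and nothing is actually sent over the network.

```python
import json
from urllib.parse import urlencode

# Endpoint the application server would POST to (not contacted in this sketch).
SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_body(secret, token, remote_ip=None):
    """Encode the POST body the application server would send to SITEVERIFY_URL."""
    fields = {"secret": secret, "response": token}
    if remote_ip is not None:
        fields["remoteip"] = remote_ip
    return urlencode(fields).encode("ascii")


def token_accepted(reply_text):
    """Return True only if the JSON reply explicitly reports success."""
    try:
        reply = json.loads(reply_text)
    except ValueError:
        # Not valid JSON: treat it as a failed check.
        return False
    return isinstance(reply, dict) and reply.get("success") is True


# Example with placeholder values for the secret key and the token.
print(build_verify_body("my-secret", "token-from-browser"))
print(token_accepted('{"success": false, "error-codes": ["invalid-input-response"]}'))
```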

Controversy

As part of their new spam detection algorithms, Google will serve considerably more difficult CAPTCHAs to users who aren't logged in to a Google account. These harder CAPTCHAs offer zero tolerance on typing mistakes, forcing you to type both test words correctly, much to the bane of most 4chan users, who tend to enter gibberish for the OCR word.

This happens when too many of the CAPTCHAs requested by a given API key in a given time frame go unsolved, as is the case with 4chan, since every pageload requests a new CAPTCHA from Google's servers. Moot has since fixed this behavior to only request the CAPTCHA when you start typing in the comment box, but he was quickly crucified for it, and people quickly pushed dirty JavaScript hacks to change it back.

New API

In December 2014, Google introduced a revised version of reCAPTCHA that goes even further in serving captchas of different difficulty to different users, which was quickly adopted by moot. Users are invited to tick a checkbox, which can be done either by a mouse click or by tabbing to the checkbox and pressing the spacebar. Depending on the user's IP address, HTTP headers (including cookies, user-agent, and referer), mouse movements/keystrokes, and the outcome of various obfuscated JavaScript tests, Google can either approve the user immediately or ask them to solve one of various types of captcha:

  • A single-"word" (typically not an actual English word) captcha with minimal distortion.
  • A house number.
  • An image recognition test where the user is asked to pick images like the sample image.
  • Two words, only one of which must be solved correctly, similar to classic reCAPTCHA.
  • Two highly distorted words with added "ink blots" and many easily confusable m's, n's, and r's, both of which must be typed correctly.
  • Starting in February 2015, two highly distorted words with the letters drawn outlined, both of which must be typed correctly.
  • A form instructing the user to solve one of the above captchas, then copy a code from one text box to another before submitting their post. It is similar to the fallback version of the captcha that appears when JavaScript is disabled, but sometimes appears even when JavaScript is enabled. It has been reported to appear frequently for users of the "Disconnect Me" extension in Chrome unless they whitelist Google.

When the new API was first introduced, some users were able to reduce the difficulty of the captchas Google serves them by setting their User-Agent header to that of an Android browser, and by forging rather than blocking the Referer header. Cookies passed to the captcha as a result of being logged in to Google services also affected its behavior, although not always for the better.

In February 2015, reCAPTCHA was updated, and changing your User-Agent header no longer has any effect. It currently appears that reCAPTCHA requires a referer and a login cookie before it serves an easy captcha. However, both can be forged, and

Referer: https://www.google.com/recaptcha/api/fallback?k=6Ldp2bsSAAAAAAJ5uyx_lx34lJeEpTLVkP5k04qc
Cookie: NID=67

is enough for some users to get a classic-style reCAPTCHA.
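
To show what sending those two headers looks like outside a browser, here is a small sketch using Python's standard urllib. It only prepares the request and never sends it. Requesting the fallback URL itself is an assumption made for illustration, and whether Google still reacts to these headers is not something this sketch can confirm.

```python
from urllib.request import Request

# URL and header values quoted in the text above.
FALLBACK_URL = "https://www.google.com/recaptcha/api/fallback?k=6Ldp2bsSAAAAAAJ5uyx_lx34lJeEpTLVkP5k04qc"
FORGED_HEADERS = {
    "Referer": FALLBACK_URL,
    "Cookie": "NID=67",
}


def build_request(url):
    """Prepare (but do not send) a request carrying the forged headers."""
    return Request(url, headers=FORGED_HEADERS)


# Assumption: the fallback page is the thing being requested.
req = build_request(FALLBACK_URL)
print(req.get_header("Referer"))
print(req.get_header("Cookie"))
```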

In Firefox, the Header Tool add-on is very useful for tweaking these HTTP headers on a per-site/per-page basis. To use it, install the add-on in Firefox, open its settings via Tools > Header Tool > Header Tool, and enter in the sidebar a regexp to select the applicable sites (preceded by an @), such as

@^https?://www\.(google|gstatic)\.com/recaptcha/

followed by the headers you want to send to that site.
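
To check which URLs such a selector would catch, the same pattern can be tried in Python. This is a sketch under the assumption that Header Tool interprets this particular regexp the same way Python's re module does (the leading @ is Header Tool syntax and is left out). The sample URLs are made up for illustration.

```python
import re

# The site selector from the text, without the leading "@" used by Header Tool.
SELECTOR = re.compile(r"^https?://www\.(google|gstatic)\.com/recaptcha/")


def selector_applies(url):
    """Return True if the URL begins with a match for the selector."""
    return SELECTOR.match(url) is not None


# Illustrative URLs: the first two should be selected, the third should not.
for url in (
    "https://www.google.com/recaptcha/api/fallback",
    "http://www.gstatic.com/recaptcha/api2/logo.png",
    "https://www.google.com/search?q=recaptcha",
):
    print(url, selector_applies(url))
```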

One /g/ anon worked on reverse engineering the obfuscated JavaScript that reCAPTCHA serves to users, and posted the findings on GitHub. The anon took down the GitHub repository and stopped publishing reversing work upon a request from Google, but many copies remain available via forks of the repository, such as [1].

It is possible to control the data you send to Google by disabling all JavaScript from Google and using the JavaScript-free fallback version of reCAPTCHA. When done manually, this requires the user to copy a code from the reCAPTCHA iframe to the posting form. The 4chan X userscript (ccd0 and Appchan forks) can automate the code-copying part of the process. Using the JavaScript-free interface may result in harder captchas, although this can often be alleviated by header tweaking as described above.