Breaking CAPTCHAs with Selenium or Playwright

Visiting the page

I made a page that generates CAPTCHAs, in case you want some to play around with.

Site with example captcha

Visit the page with Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://jsoma.github.io/captcha-breaker-tester/")

Visit the page with Playwright:

# Top-level await like this works in Jupyter; in a plain script,
# put it inside an async function and run it with asyncio.run()
import asyncio
from playwright.async_api import async_playwright

playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless=False)
page = await browser.new_page()
await page.goto('https://jsoma.github.io/captcha-breaker-tester')

You probably won't be doing this with BeautifulSoup, since CAPTCHAs are usually generated by JavaScript, which BeautifulSoup can't run.

Saving the CAPTCHA image

Both Selenium and Playwright can grab an element from the page and save a screenshot of it for you. In theory you could also keep the image in memory as a bytes object instead of saving it to a file, but pytesseract doesn't seem to like that without a lot of fiddling around.

Save the CAPTCHA image with Selenium:

image = driver.find_element(By.CSS_SELECTOR, "#captcha-holder > img")
image.screenshot('captcha.png')

Save the CAPTCHA image with Playwright:

await page.locator('#captcha-holder > img').screenshot(path='captcha.png')

One thing to note: in the next step we remove 2 pixels from each edge of the image (top, left, bottom and right). There's a thin border that bleeds into the screenshot and makes the CAPTCHA harder to read.
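If you ever want to do that crop on a plain array instead of through Wand, it's just a slice. Here's a minimal sketch with NumPy, assuming the image is already loaded as an array shaped (height, width, channels); the `remove_border` name and its 2-pixel default are only for illustration.

```python
import numpy as np

def remove_border(pixels: np.ndarray, border: int = 2) -> np.ndarray:
    """Drop `border` pixels from the top, bottom, left and right edges."""
    height, width = pixels.shape[:2]
    # Rows come first, then columns, so trim both axes from each end
    return pixels[border:height - border, border:width - border]

# A fake 10x12 RGB "screenshot": a 2-pixel black frame around a white middle
fake = np.zeros((10, 12, 3), dtype=np.uint8)
fake[2:-2, 2:-2] = 255

inner = remove_border(fake)
print(inner.shape)  # (6, 8, 3)
```

Computing the end index from the shape (rather than slicing with `-border`) keeps `border=0` from producing an empty array.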

Your downloaded image might look like this:

Before cleaning

Deskew and clean

We'll use Wand, a Python binding for ImageMagick, along with the deskew library to turn the image into something that's easier to run text recognition on.

I like to use Wand because it's... well, it's a pain, but it's a little nicer than the other libraries the deskew documentation covers. If you can't get it working on your machine, though, the deskew documentation shows how to use it with those other libraries.

# macOS only
brew install imagemagick

pip install wand
pip install deskew

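from deskew import determine_skew
from wand.image import Image
import numpy as np

with Image(filename='captcha.png') as image:
    with image.clone() as cleaned:
        # Pull 2 pixels off each edge to remove the border noise
        cleaned.crop(2, 2, image.width - 2, image.height - 2)

        # Trim away the plain background around the text
        cleaned.trim()

        # Measure the skew on a single-channel (grayscale) copy of the pixels
        pixels = np.array(cleaned)
        if pixels.ndim == 3:
            pixels = pixels[..., :3].mean(axis=2)
        angle = determine_skew(pixels)

        # determine_skew returns None when it can't find an angle
        if angle is not None:
            print("Rotating", angle, "degrees")
            # Undo the rotation, filling the exposed corners with white
            cleaned.rotate(-angle, 'white', True)

        # Save
        cleaned.save(filename='captcha-cleaned.png')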
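The `trim()` call does a lot of the work of cutting away empty background. ImageMagick's trim removes edges that are the same color as the corner pixels; here is a simplified NumPy sketch of that idea (not Wand's actual implementation), assuming a 2D grayscale array and treating the top-left pixel as the background color. `trim_background` and its `tolerance` parameter are hypothetical.

```python
import numpy as np

def trim_background(gray: np.ndarray, tolerance: int = 0) -> np.ndarray:
    """Crop a 2D grayscale image to the bounding box of non-background pixels.

    Simplified sketch: the top-left pixel is treated as the background color.
    """
    background = int(gray[0, 0])
    # True wherever a pixel differs from the background by more than `tolerance`
    mask = np.abs(gray.astype(int) - background) > tolerance
    if not mask.any():
        return gray  # nothing but background, so leave it alone
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return gray[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# White 8x10 image with a dark 2x3 "letter" in the middle
img = np.full((8, 10), 255, dtype=np.uint8)
img[3:5, 4:7] = 0
print(trim_background(img).shape)  # (2, 3)
```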

Before cleaning

After cleaning

Breaking the CAPTCHA

Pytesseract

Now we'll use pytesseract, a Python wrapper around the Tesseract OCR engine, to read the CAPTCHA. It's the best balance of accuracy and ease of use that I've found.

First, we'll install tesseract and pytesseract.

# macOS only
brew install tesseract

pip install pytesseract

Then we'll use them.

import pytesseract

guess = pytesseract.image_to_string('captcha-cleaned.png').strip()
print("The guess is", guess)
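Tesseract often hands back stray spaces, newlines or punctuation around the characters it recognizes. Assuming the CAPTCHA only ever contains letters and digits (true of many, but check yours), a small hypothetical helper like `clean_guess` can strip everything else before you type the answer in. If the raw guesses are consistently off, Tesseract's `--psm 7` mode, which treats the image as a single line of text, may also be worth trying with `pytesseract.image_to_string('captcha-cleaned.png', config='--psm 7')`.

```python
import re

def clean_guess(raw: str) -> str:
    """Keep only ASCII letters and digits from an OCR guess (hypothetical helper)."""
    # Assumes the CAPTCHA alphabet is A-Z, a-z and 0-9
    return re.sub(r"[^A-Za-z0-9]", "", raw)

print(clean_guess(" x7K 2q.\n"))  # x7K2q
```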

Keras OCR

Keras OCR is a lot fancier and heavier to install, since it pulls in TensorFlow, but it works a lot better for edge cases.

First, we'll do the installation.

# macOS (at least on M1 Macs)
pip install tensorflow-macos keras-ocr

# Windows or Linux
pip install tensorflow keras-ocr

Now we'll use them.

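import keras_ocr

# Downloads the pretrained detector and recognizer weights the first time
pipeline = keras_ocr.pipeline.Pipeline()

# recognize() takes a list of images (or file paths) and returns,
# for each image, a list of (word, box) pairs
prediction_groups = pipeline.recognize(['captcha-cleaned.png'])

# First image, first word found, then the word's text
guess = prediction_groups[0][0][0]
print("The guess is", guess)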
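`prediction_groups[0][0][0]` only takes the first word Keras OCR found. If it splits the CAPTCHA into several pieces, you can sort them left to right by their bounding boxes and glue them together. This is a sketch: each prediction is assumed to be a `(word, box)` pair where `box` holds four `(x, y)` corner points, `join_words` is a hypothetical helper, and the sample predictions are made up to show the shape of the data.

```python
def join_words(predictions):
    """Join (word, box) predictions into one string, ordered left to right.

    `box` is a sequence of four (x, y) corner points.
    """
    # Sort by the leftmost x coordinate of each word's box
    ordered = sorted(predictions, key=lambda pair: min(point[0] for point in pair[1]))
    return "".join(word for word, _box in ordered)

# Made-up predictions: the right-hand chunk happens to be listed first
sample = [
    ("k2q", [(60, 5), (110, 5), (110, 30), (60, 30)]),
    ("x7", [(10, 6), (55, 6), (55, 31), (10, 31)]),
]
print(join_words(sample))  # x7k2q
```

With the real pipeline output, that would be `guess = join_words(prediction_groups[0])`, which also returns an empty string instead of raising an `IndexError` when nothing is detected.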