Retrying CAPTCHAs until you succeed
If you're trying to automatically break a CAPTCHA, you aren't always going to get it right on the first try! Let's use my CAPTCHA breaker test site to see how that plays out in practice.
If you haven't gone through the CAPTCHA breaking tutorial yet, I recommend you start there!
Prepare our pieces
Get to the CAPTCHA
import time
import asyncio
from playwright.async_api import async_playwright
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless = False)
page = await browser.new_page()
We'll visit the page. I'm going to click the "Generate" button just so we can pretend we did some sort of setup (entering text, etc).
await page.goto('https://jsoma.github.io/captcha-breaker-tester')
await page.locator("[value='Generate']").click()
# Wait for the CAPTCHA to generate
# (asyncio.sleep pauses without blocking the event loop)
await asyncio.sleep(1)
Condense the breaking code
If your CAPTCHA-breaking code is split across multiple cells, move it into a single cell so the whole thing can be re-run on each attempt.
from deskew import determine_skew
from wand.image import Image
import numpy as np
import pytesseract
# Save the CAPTCHA
await page.locator('#captcha-holder > img').screenshot(path='captcha.png')
guess = pytesseract.image_to_string('captcha.png').strip()
print("The guess is", guess)
The guess is 65kRtH
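OCR output sometimes includes stray whitespace or is obviously the wrong length, and submitting such a guess only wastes an attempt. Below is a minimal sketch of a sanity check. The name clean_guess is hypothetical, and the six-character, letters-and-digits rule is an assumption based on the guesses shown in this tutorial, not something the test site documents.

```python
import re


def clean_guess(raw, expected_length=6):
    """Remove whitespace from OCR output; return None if it can't be a valid answer.

    expected_length=6 and the letters-and-digits rule are assumptions
    based on the guesses shown in this tutorial.
    """
    guess = re.sub(r"\s+", "", raw)
    if len(guess) != expected_length:
        return None
    if not re.fullmatch(r"[A-Za-z0-9]+", guess):
        return None
    return guess
```

If clean_guess returns None, you could skip typing the answer and ask for a new CAPTCHA right away.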
Add the testing code
You need some way to check whether the CAPTCHA worked or not. In this case, we check whether the #result div says "Correct!".
await page.locator("#answer").fill(guess)
await page.locator("#test-answer").click()
# There are better ways to do this, but
# a fixed sleep is the most flexible!
await asyncio.sleep(1)
# Something to test if the CAPTCHA was correct or not
correct = await page.locator("#result").inner_text() == "Correct!"
if correct:
    print("It was right!")
else:
    print("It was wrong!")
It was right!
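A fixed one-second pause is simple, but it always waits the full second and can still be too short on a slow connection. One alternative is to poll until the result shows up. The following is a minimal sketch, not part of Playwright: wait_until is a hypothetical helper, and it assumes the #result element is empty until the answer has been checked.

```python
import asyncio
import time


async def wait_until(check, timeout=5.0, interval=0.1):
    """Poll an async `check` until it returns something truthy or time runs out.

    Returns the truthy value, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = await check()
        if value:
            return value
        if time.monotonic() >= deadline:
            return None
        # Pause briefly without blocking the event loop
        await asyncio.sleep(interval)


# In the notebook, usage might look like this:
# text = await wait_until(page.locator("#result").inner_text)
# correct = text == "Correct!"
```

If the page leaves an old message in #result between attempts, the check would need to compare against the previous text rather than just look for any text.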
Combine the code
This is a rough breakdown of how you might implement the CAPTCHA breaker. The important part is a test for success that allows you to break out of the loop.
We use an attempts variable instead of while True because if you don't get it done in 30 tries you probably need to adjust your CAPTCHA-breaking code!
# Visit the page and do any setup you need
await page.goto('https://jsoma.github.io/captcha-breaker-tester')
attempts = 30
# Keep trying to break the CAPTCHA until attempts run out
while attempts > 0:
    # Convert the CAPTCHA to text
    await page.locator('#captcha-holder > img').screenshot(path='captcha.png')
    guess = pytesseract.image_to_string('captcha.png').strip()
    print("Guessing", guess)
    # Input the answer
    await page.locator("#answer").fill(guess)
    # Test for success
    await page.locator("#test-answer").click()
    await asyncio.sleep(1)
    correct = await page.locator("#result").inner_text() == "Correct!"
    # If correct, exit
    # If not, try again
    if correct:
        print("Successful!")
        break
    else:
        # Can't figure out this captcha
        # generate a new one by clicking the button
        print(f"Not successful, {attempts} left")
        attempts = attempts - 1
        await page.locator("[value='Generate']").click()
        await asyncio.sleep(1)
# We're out of the loop, but maybe we failed to break it
if attempts == 0:
    raise Exception("Failed to break the CAPTCHA")
print("We finished successfully")
Guessing 2TalmE
Not successful, 30 left
Guessing 29399
Not successful, 29 left
Guessing IFg3bR
Not successful, 28 left
Guessing z6tAnX
Successful!
We finished successfully
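If you end up retrying CAPTCHAs on more than one site, the loop can be pulled into a reusable function. This is only a sketch of one way to do it: retry_until_success, attempt, and reset are hypothetical names, and you would supply the two async functions yourself. Here attempt would do the screenshot, OCR, fill, click and check steps and return True on success, while reset would click the "Generate" button.

```python
import asyncio


async def retry_until_success(attempt, reset, max_attempts=30, pause=1.0):
    """Call `attempt` up to `max_attempts` times, calling `reset` after a failure.

    `attempt` is an async function returning True when the CAPTCHA was accepted.
    `reset` is an async function that requests a fresh CAPTCHA.
    Returns the number of tries used, or raises RuntimeError if all of them fail.
    """
    for tries in range(1, max_attempts + 1):
        if await attempt():
            return tries
        print(f"Not successful, {max_attempts - tries} left")
        if tries < max_attempts:
            # Get a new CAPTCHA and give it a moment to appear
            await reset()
            await asyncio.sleep(pause)
    raise RuntimeError(f"Failed to break the CAPTCHA in {max_attempts} tries")
```

Using a for loop with a fixed range keeps the "give up eventually" behavior of the attempts counter, and raising inside the function means there is no separate check to forget after the loop.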