
Selenium-Playwright scraping command conversion reference

I've been using Selenium for automated scraping of interactive websites for hundreds if not thousands of years, but Playwright seems pretty good. Let's build a quick reference to compare the two.

Is Playwright better than Selenium?

Playwright is newer than Selenium, and oftentimes has better documentation. On the other hand, it's built for JavaScript and its Python usage is a little awkward compared to "normal" Python code.

Installation

Installation is slightly more difficult for Selenium, in that you need to install Selenium, a browser, and a webdriver, which is what talks to the browser. Playwright doesn't need a separate webdriver, but it does need a browser: after installing the package, run playwright install to download the browsers it controls.

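pip install selenium
pip install webdriver-manager
pip install playwright
# Download the Chromium build that Playwright drives
playwright install chromium
# For parsing page HTML (used in the tables below)
pip install beautifulsoup4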

Basic imports

Selenium has approximately ten million imports. You don't necessarily need all of them, but if you're using dropdowns and waiting for the page to load and blah blah all sorts of particular things, they add up quickly.

Playwright has maybe two imports.

# Selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from webdriver_manager.chrome import ChromeDriverManager

# Playwright
# There are two versions of Playwright:
# the synchronous version and the async version.
# You can't use the synchronous version in Jupyter,
# so we'll import the async one.
import asyncio
from playwright.async_api import async_playwright

# Both: parse page HTML with BeautifulSoup
from bs4 import BeautifulSoup
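Why can't the sync version run in Jupyter? Jupyter already has an asyncio event loop running, and Playwright's sync API refuses to start inside one. Here's a small sketch, using only the standard library (the function names are my own, not part of Playwright), for checking whether you're in that situation:

```python
import asyncio


def event_loop_is_running() -> bool:
    """True if an asyncio event loop is already running in this thread (as in Jupyter)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running
        return False
    return True


async def check_from_coroutine() -> bool:
    # Code inside a coroutine always runs on a live event loop
    return event_loop_is_running()


if __name__ == "__main__":
    print(event_loop_is_running())               # plain script: False
    print(asyncio.run(check_from_coroutine()))   # inside asyncio.run(): True
```

In a plain Python script the first check is False, so from playwright.sync_api import sync_playwright is an option there. In a notebook it's True, so stick with the async API.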

Basic setup and usage

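In the Selenium example, we're going to use Webdriver Manager to automatically download a copy of ChromeDriver that matches your installed Chrome. This is a great way to avoid having to manually download and install the driver. (Selenium 4.6 and newer can also do this by itself through the built-in Selenium Manager, so a plain webdriver.Chrome() often works with no extra setup.)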

Showing the browser window

Sometimes you want to see the browser while you're scraping. It's useful for debugging, and it's also useful for seeing if you're getting blocked by a CAPTCHA or something. It also feels pretty cool to watch the browser do its thing.

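# Selenium: visit a page
# Selenium 4.10+ no longer accepts the driver path directly,
# so wrap it in a Service object
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://www.nytimes.com")
# Playwright: visit a page using Chromium (could also do .firefox or .webkit)
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless=False)
page = await browser.new_page()
await page.goto('https://www.nytimes.com')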

Hiding the browser window (headless)

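Running "headless" (hiding the browser window) skips drawing anything on screen, which usually makes scraping a bit faster and lighter on resources, and lets it run on servers that don't have a display at all. Playwright runs headless unless you tell it otherwise.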

# Selenium: visit a page (hiding the browser window, aka headless)
from selenium.webdriver.chrome.service import Service
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://www.nytimes.com")
# Playwright: visit a page (hiding the browser window, aka headless)
# headless=True is the default, so plain launch() does the same thing
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless=True)
page = await browser.new_page()
await page.goto('https://www.nytimes.com')

Why does Playwright have await all over the place, while Selenium doesn't?

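We're using Playwright's async API, where anything that talks to the browser is a coroutine. await tells Python to pause on that line until the browser is done (the page has loaded, the button has been clicked) before moving on, and while it's paused, your program is free to run other coroutines, like loading a second page at the same time. Selenium's Python API is plain synchronous code: every call blocks until it's finished, so there's nothing to await. Playwright's sync API works the same way, but as mentioned above, it won't run inside Jupyter.

Basically: put await in front of anything that makes the browser do something (goto, click, fill, content, wait_for) and it will probably work. The main exception is page.locator(), which just describes an element and doesn't need await.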
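To see what await actually buys you, here's a sketch that swaps the browser for asyncio.sleep(). fake_goto is a hypothetical stand-in for await page.goto(...) followed by await page.content(), not a Playwright function. Loading three "pages" one after another takes about three times as long as starting all three and letting asyncio.gather() wait on them together:

```python
import asyncio
import time


async def fake_goto(url: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for `await page.goto(url)` + `await page.content()`."""
    await asyncio.sleep(delay)  # pretend we're waiting on the network
    return f"<html><title>{url}</title></html>"


async def scrape_one_by_one(urls: list[str]) -> list[str]:
    # Each page finishes loading before the next one starts
    return [await fake_goto(url) for url in urls]


async def scrape_all_at_once(urls: list[str]) -> list[str]:
    # gather() starts every coroutine, then awaits them all together
    return list(await asyncio.gather(*(fake_goto(url) for url in urls)))


if __name__ == "__main__":
    urls = ["https://www.nytimes.com", "https://www.washingtonpost.com", "https://www.example.com"]
    for scraper in (scrape_one_by_one, scrape_all_at_once):
        start = time.perf_counter()
        pages = asyncio.run(scraper(urls))
        print(f"{scraper.__name__}: {len(pages)} pages in {time.perf_counter() - start:.1f}s")
```

With real Playwright, each URL would need its own tab from await browser.new_page(), since one page can only be at one URL at a time. In Jupyter, write await scrape_all_at_once(urls) instead of asyncio.run(...), because the notebook's loop is already running.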

Visiting pages

Description | Selenium | Playwright
--- | --- | ---
Open a browser | driver = webdriver.Chrome() | browser = await playwright.chromium.launch(headless=False)
Open a headless browser | driver = webdriver.Chrome(options=options) | browser = await playwright.chromium.launch(headless=True)
Visit a URL | driver.get('https://www.washingtonpost.com') | await page.goto('https://www.washingtonpost.com')
Wait for page to fully load | n/a (driver.get() already waits for the page's load event) | await page.goto('https://www.washingtonpost.com', wait_until='networkidle')
Wait for element to show up on page | WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'table.results'))) | await page.locator('table.results').wait_for()
Wait for element to show up (10-second timeout) | WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'table.results'))) | await page.locator('table.results').wait_for(timeout=10000)
Give the page HTML to BeautifulSoup | doc = BeautifulSoup(driver.page_source, 'html.parser') | doc = BeautifulSoup(await page.content(), 'html.parser')

Selenium's timeout is in seconds and Playwright's is in milliseconds (Playwright waits up to 30 seconds by default). Also, wait_for() waits for the element to be visible; pass state='attached' if you only care that it exists, like presence_of_element_located.
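Under the hood, an explicit wait is basically a loop: check for the element, sleep, check again, and give up after the timeout. Here's a simplified sketch of that loop in plain Python. wait_until and seconds_to_ms are hypothetical helpers, not part of either library; the half-second default mirrors WebDriverWait's default poll frequency, while Playwright handles its waiting internally.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout_s: float = 10, poll_s: float = 0.5) -> T:
    """Call condition() until it returns something truthy; raise TimeoutError after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = condition()
        if result:  # e.g. a non-empty list of matching elements
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} seconds")
        time.sleep(poll_s)  # wait a bit before checking again


def seconds_to_ms(seconds: float) -> int:
    """Selenium's WebDriverWait takes seconds; Playwright's timeout= takes milliseconds."""
    return round(seconds * 1000)
```

With Selenium, the condition could be something like lambda: driver.find_elements(By.CSS_SELECTOR, 'table.results'), which returns an empty (falsy) list until the table shows up.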

Selecting elements on the page

Description | Selenium | Playwright
--- | --- | ---
Find element by CSS selector | driver.find_element(By.CSS_SELECTOR, 'button.submit') | page.locator('button.submit')
Select multiple elements by CSS selector | driver.find_elements(By.CSS_SELECTOR, '.row') | await page.locator('.row').all()
Find element by XPath | driver.find_element(By.XPATH, '//button') | page.locator('//button')
Find link by complete text | driver.find_element(By.LINK_TEXT, 'Click me') | page.locator('a:text-is("Click me")')
Find link by partial text | driver.find_element(By.PARTIAL_LINK_TEXT, 'Click me') | page.locator('a:has-text("Click me")')
Find link by part of its href | driver.find_element(By.CSS_SELECTOR, 'a[href*="url-to-somewhere"]') | page.locator('a[href*="url-to-somewhere"]')

page.locator() has no await: it only describes how to find the element, and Playwright doesn't look for it until you do something with it (click, fill, wait_for, and so on). Selenium's find_element(), by contrast, searches immediately and raises NoSuchElementException if nothing matches.
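Selenium's By constants are just strings under the hood (By.CSS_SELECTOR is "css selector", By.XPATH is "xpath", and so on), so you can sketch a converter that turns an existing find_element(By..., value) call into a Playwright selector. This is a hypothetical helper of my own that only covers the common By types. One difference to keep in mind: :has-text() ignores case, while Selenium's PARTIAL_LINK_TEXT doesn't.

```python
def css_string(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and quote characters (CSS string syntax)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


# Keys are the plain strings behind Selenium's By constants,
# e.g. By.CSS_SELECTOR == "css selector", By.LINK_TEXT == "link text"
CONVERTERS = {
    "css selector": lambda v: v,
    "tag name": lambda v: v,
    "xpath": lambda v: f"xpath={v}",
    "id": lambda v: f"[id={css_string(v)}]",
    "name": lambda v: f"[name={css_string(v)}]",
    # ~= matches one whitespace-separated word, i.e. a single class
    "class name": lambda v: f"[class~={css_string(v)}]",
    # :text-is() matches the whole text; :has-text() is a case-insensitive substring match
    "link text": lambda v: f"a:text-is({css_string(v)})",
    "partial link text": lambda v: f"a:has-text({css_string(v)})",
}


def to_playwright_selector(by: str, value: str) -> str:
    """Translate a Selenium (by, value) pair into a Playwright selector string."""
    if by not in CONVERTERS:
        raise ValueError(f"no conversion for {by!r}")
    return CONVERTERS[by](value)
```

For example, page.locator(to_playwright_selector(By.LINK_TEXT, 'Click me')) should find the same link as driver.find_element(By.LINK_TEXT, 'Click me').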

Interacting with the page

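Description | Selenium | Playwright
--- | --- | ---
Click a button | driver.find_element(By.CSS_SELECTOR, 'button').click() | await page.locator('button').click() or await page.click('button')
Fill a form | driver.find_element(By.CSS_SELECTOR, 'input.name').send_keys('My name') | await page.fill('input.name', 'My name')
Select an option | Select(driver.find_element(By.CSS_SELECTOR, 'select#company')).select_by_value('Pigeons LLC') | await page.select_option('select#company', 'Pigeons LLC')
Switch to a newly opened tab | driver.switch_to.window(driver.window_handles[-1]) | it depends

Locator actions are strict: if 'button' matches more than one element, locator.click() raises an error instead of quietly clicking the first match like Selenium's find_element() does. Use page.locator('button').first.click() when you really do want the first one. Also note that fill() replaces whatever is already in the input, while send_keys() types after it.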

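The Playwright docs have a great page on "multi-page scenarios," including handling new pages and popups. The usual pattern is to wrap the click that opens the tab in async with page.context.expect_page() as new_page_info:, then grab the tab with new_page = await new_page_info.value.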

Closing the browser

Description | Selenium | Playwright
--- | --- | ---
Close the browser | driver.quit() | await browser.close(), then await playwright.stop()