Selenium-Playwright scraping command conversion reference¶

I've been using Selenium for automated scraping of interactive websites for hundreds if not thousands of years, but Playwright seems pretty good. Let's build a quick reference to compare the two.

Is Playwright better than Selenium?¶

Playwright is newer than Selenium, and oftentimes has better documentation. On the other hand, it's built for JavaScript and its Python usage is a little awkward compared to "normal" Python code.

Installation¶

Installation is slightly more difficult for Selenium, in that you need to install Selenium, a browser, and a webdriver, which is what talks to the browser. Playwright doesn't need a separate webdriver, but it does need a browser.

Selenium

pip install selenium
pip install webdriver-manager

Playwright

pip install playwright
# Download the browser binaries that Playwright drives
playwright install

Basic imports¶

Selenium has approximately ten million imports. You don't necessarily need all of them, but if you're using dropdowns and waiting for the page to load and blah blah all sorts of particular things, they add up quickly.

Playwright has maybe two imports.

Selenium

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from webdriver_manager.chrome import ChromeDriverManager

Playwright

# There are two versions of Playwright:
# the synchronous version and the async version.
# You can't use the synchronous version in Jupyter,
# so we'll import the async one.

import asyncio
from playwright.async_api import async_playwright

Basic setup and usage¶

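In the Selenium example, we're going to use Webdriver Manager to automatically download a version of ChromeDriver that matches the installed Chrome. This is a great way to avoid having to manually download and install the driver. (Selenium 4.6 and later also ship with Selenium Manager, which can find a driver on its own, so a bare `webdriver.Chrome()` often works too.)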

Showing the browser window¶

Sometimes you want to see the browser while you're scraping. It's useful for debugging, and it's also useful for seeing if you're getting blocked by a CAPTCHA or something. It also feels pretty cool to watch the browser do its thing.

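Selenium

# Visit a page
# Selenium 4.10+ no longer accepts the driver path as a positional argument;
# it goes in through a Service object instead
driver = webdriver.Chrome(service=webdriver.ChromeService(ChromeDriverManager().install()))
driver.get("https://www.nytimes.com")

Playwright

# Visit a page using chromium (could also do .firefox or .webkit)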
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless = False)
page = await browser.new_page()
await page.goto('https://www.nytimes.com')

Hiding the browser window (headless)¶

Running "headless" (hiding the browser window) usually makes your scraping a bit faster and lighter on resources, and it's the only option on a server that has no display.

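Selenium

# Visit a page (hiding the browser window, aka headless)
options = webdriver.ChromeOptions()
options.add_argument('--headless')
# Selenium 4.10+ no longer accepts the driver path as a positional argument;
# it goes in through a Service object instead
driver = webdriver.Chrome(service=webdriver.ChromeService(ChromeDriverManager().install()), options=options)
driver.get("https://www.nytimes.com")

Playwright

# Visit a page (hiding the browser window, aka headless)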
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless = True)
page = await browser.new_page()
await page.goto('https://www.nytimes.com')

Why does Playwright have await all over the place, while Selenium doesn't?

We're using Playwright's async API. Anything that talks to the browser (launching it, visiting a page, clicking, reading the HTML) returns a coroutine instead of running right away. Putting await in front runs it and pauses your code until the browser has finished, so the next line can rely on the result. Selenium's calls simply block until they're done, so there's nothing to await.

Basically: put await in front of anything that talks to the browser and it will probably work. The main exception is page.locator(), which only describes what to look for and returns immediately, so it doesn't take an await.
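To see what await actually does without opening a browser, here's a tiny sketch that only uses Python's built-in asyncio. The `fetch_title` function is made up for illustration; it stands in for any slow Playwright call.

```python
import asyncio


async def fetch_title():
    # Stand-in for a slow browser call such as page.goto() or page.content().
    await asyncio.sleep(0.01)
    return "Example title"


async def main():
    # Calling an async function without await only creates a coroutine object;
    # none of its body has run yet.
    pending = fetch_title()
    created_coroutine = asyncio.iscoroutine(pending)

    # await runs it and pauses here until the result is ready.
    title = await pending
    return created_coroutine, title


# In a plain script, asyncio.run() starts the event loop.
# In Jupyter a loop is already running, so you would write `await main()` instead.
result = asyncio.run(main())
print(result)
```

Forgetting the await is the classic mistake: you end up holding a coroutine object instead of the value you wanted.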

Visiting pages¶

Description	Selenium	Playwright
Open a browser	`driver = webdriver.Chrome()`	`browser = await playwright.chromium.launch()`
Open a headless browser	`driver = webdriver.Chrome(options=options)`	`browser = await playwright.chromium.launch(headless=True)`
Visit a URL	`driver.get('https://www.washingtonpost.com')`	`await page.goto('https://www.washingtonpost.com')`
Wait for page to fully load	n/a (`driver.get()` already waits for the `load` event)	`await page.goto("https://www.washingtonpost.com", wait_until="networkidle")`
Wait for element to show up on page	`WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'table.results')))`	`await page.locator('table.results').wait_for()`
Wait for element to show up on page (with timeout)	`WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'table.results')))`	`await page.locator('table.results').wait_for(timeout=10000)`
Give the page HTML to BeautifulSoup	`doc = BeautifulSoup(driver.page_source, 'html.parser')`	`doc = BeautifulSoup(await page.content(), 'html.parser')`
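
Several rows of the table above usually get chained together, so here's a rough sketch of how that might look in Playwright. The helper name `fetch_html` and its parameters are my own, not part of Playwright, and it assumes `page` was already created as in the setup section.

```python
async def fetch_html(page, url, selector, timeout_seconds=10):
    """Visit url, wait for selector to appear, and return the page HTML."""
    await page.goto(url)
    # Selenium's WebDriverWait takes seconds; Playwright timeouts are milliseconds.
    await page.locator(selector).wait_for(timeout=timeout_seconds * 1000)
    return await page.content()
```

You'd call it with something like `html = await fetch_html(page, 'https://www.washingtonpost.com', 'table.results')` and then hand `html` to BeautifulSoup as usual.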

Selecting elements on the page¶

Description	Selenium	Playwright
Find element by CSS selector	`driver.find_element(By.CSS_SELECTOR, 'button.submit')`	`page.locator('button.submit')`
Selecting multiple elements by CSS selector	`driver.find_elements(By.CSS_SELECTOR, '.row')`	`await page.locator('.row').all()`
Find element by XPath	`driver.find_element(By.XPATH, '//button')`	`page.locator('//button')`
Find element by complete text	`driver.find_element(By.LINK_TEXT, 'Click me')`	`page.locator('text="Click me"')`
Find element by partial text	`driver.find_element(By.PARTIAL_LINK_TEXT, 'Click me')`	`page.locator('a:has-text("Click me")')`
Find element by partial text in `href` attribute	`driver.find_element(By.CSS_SELECTOR, 'a[href*="url-to-somewhere"]')`	`page.locator('a[href*="url-to-somewhere"]')`
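
One difference worth spelling out: a Selenium `find_element` call hits the page immediately, while a Playwright locator is only a description of what to look for. Creating one needs no await; the await comes when you ask it for something. Here's a small sketch, assuming a reasonably recent Playwright (one that has `Locator.all()`); `row_texts` is a hypothetical helper name.

```python
async def row_texts(page, selector):
    """Return the stripped text of every element matching selector."""
    rows = page.locator(selector)  # no await: this only builds a Locator
    texts = []
    for row in await rows.all():  # await: this actually queries the page
        texts.append((await row.inner_text()).strip())
    return texts
```

Something like `await row_texts(page, '.row')` would then give you a plain list of strings.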

Interacting with the page¶

Description	Selenium	Playwright
Click a button	`driver.find_element(By.CSS_SELECTOR, 'button').click()`	`await page.click('button')` or `await page.locator('button').click()`
Fill a form	`driver.find_element(By.CSS_SELECTOR, 'input.name').send_keys('My name')`	`await page.fill('input.name', 'My name')`
Select an option	`Select(driver.find_element(By.CSS_SELECTOR, 'select#company')).select_by_value('Pigeons LLC')`	`await page.select_option('select#company', 'Pigeons LLC')`
Switching to a newly opened tab	`driver.switch_to.window(driver.window_handles[-1])`	it depends

The Playwright docs have a great page on "multi-page scenarios," including how to handle new pages and popups.
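
For the common case where clicking a link opens a new tab, the pattern looks roughly like the sketch below. It assumes the click really does open a new page in the same browser context; `click_and_get_new_tab` is a made-up helper name.

```python
async def click_and_get_new_tab(page, selector):
    """Click selector and return the page that opens in a new tab."""
    # Start listening for the new page *before* clicking, so it can't be missed.
    async with page.context.expect_page() as new_page_info:
        await page.click(selector)
    new_page = await new_page_info.value
    # Give the new tab a chance to finish loading before scraping it.
    await new_page.wait_for_load_state()
    return new_page
```

Unlike Selenium, there's no "switching": the old `page` and the new one are separate objects, and you can keep using both.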

Closing the browser¶

Description	Selenium	Playwright
Close the browser	`driver.quit()`	`await browser.close()`, then `await playwright.stop()`
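
If a scrape crashes halfway through, the browser can be left running in the background. One way to guard against that is a try/finally, sketched below. The helper name `get_title_then_close` is hypothetical, and it assumes `playwright` and `browser` were created by hand with `async_playwright().start()` and `launch()` as in the setup section.

```python
async def get_title_then_close(playwright, browser, url):
    """Visit url, return its title, and always shut the browser down."""
    try:
        page = await browser.new_page()
        await page.goto(url)
        return await page.title()
    finally:
        # Runs even if goto() raised, so no stray browser process is left behind.
        await browser.close()
        # Needed because the setup called async_playwright().start() by hand.
        await playwright.stop()
```

The same idea works in Selenium by putting `driver.quit()` in the finally block.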