
Useful selectors for scraping

CSS selectors

| CSS selector example | What it means |
| --- | --- |
| `div` | Select all `<div>` elements |
| `div.story-wrapper` | Select all `<div>` elements with the class story-wrapper |
| `.story-wrapper` | Select all elements with the class story-wrapper |
| `#content .story-wrapper` | Select all elements with the class story-wrapper that are anywhere inside the element with the id content |
| `div.story-wrapper > p` | Select all `<p>` elements that are direct children of a `<div>` with the class story-wrapper |
| `table.minutes-table tbody tr` | Select all `<tr>` elements that are inside of a `<tbody>` that is inside of a `<table>` with the class minutes-table |
| `h3` | Select all `<h3>` elements |
| `h3.story-title` | Select all `<h3>` elements with the class story-title |
| `a[href]` | Select all `<a>` elements that have an href attribute |
| `a[href$="pdf"]` | Select all `<a>` elements that have an href attribute that ends with pdf |
| `a[href*="2002"]` | Select all `<a>` elements that have an href attribute that includes the text 2002 |
| `td:nth-child(3)` | Select every `<td>` that is the third child of its parent (the third cell in each row) |

Tip: When using the nth-child selector, the first element is nth-child(1), not nth-child(0).
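
To make the difference between a few of these selectors concrete, here is a minimal sketch using BeautifulSoup. The HTML snippet, the variable names, and the `texts` helper are invented for illustration; only `BeautifulSoup`, `select`, and `get_text` come from the library.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, made up for this example.
HTML = """
<div id="content">
  <div class="story-wrapper featured">
    <h3 class="story-title">Budget vote</h3>
    <p>Direct child paragraph</p>
    <blockquote><p>Nested paragraph</p></blockquote>
    <a href="/files/minutes-2002.pdf">Minutes</a>
    <a href="/about">About</a>
    <a name="top">No link</a>
  </div>
  <table class="minutes-table">
    <tbody>
      <tr><td>Jan</td><td>Smith</td><td>Approved</td></tr>
      <tr><td>Feb</td><td>Jones</td><td>Tabled</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def texts(selector):
    """Return the stripped text of every element matching a CSS selector."""
    return [el.get_text(strip=True) for el in soup.select(selector)]


# ">" matches direct children only; a space matches any descendant.
direct_paragraphs = texts("div.story-wrapper > p")
all_paragraphs = texts("div.story-wrapper p")

# Attribute selectors: present, ends with, contains.
links_with_href = len(soup.select("a[href]"))
pdf_links = [a["href"] for a in soup.select('a[href$="pdf"]')]
links_2002 = [a["href"] for a in soup.select('a[href*="2002"]')]

# nth-child counts from 1, so this is the third cell of each row.
third_cells = texts("td:nth-child(3)")
```

Note that `div.story-wrapper` still matches the `<div>` even though its class attribute is `story-wrapper featured`: class selectors match one class among several.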

Using CSS selectors

| Tool | Selecting one or many? | Example code |
| --- | --- | --- |
| BeautifulSoup | One | `soup.select_one('h3.title')` |
| BeautifulSoup | Many | `soup.select('.story-wrapper')` |
| Selenium | One | `driver.find_element(By.CSS_SELECTOR, 'h3.title')` |
| Selenium | Many | `driver.find_elements(By.CSS_SELECTOR, '.story-wrapper')` |
| Playwright | One | `page.locator('h3.title').first` |
| Playwright | Many | `page.locator('.story-wrapper')` |
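
The "one" and "many" calls also behave differently when nothing matches. The sketch below shows this for BeautifulSoup, where `select_one` returns `None` and `select` returns an empty list. The HTML and variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the third story has no headline.
HTML = """
<div class="story-wrapper"><h3 class="title">First</h3></div>
<div class="story-wrapper"><h3 class="title">Second</h3></div>
<div class="story-wrapper"><p>No headline here</p></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select_one returns the first match, or None if nothing matches.
first_title = soup.select_one("h3.title")
missing = soup.select_one("h3.subtitle")

# select returns a list (possibly empty), so it is safe to loop over.
titles = []
for story in soup.select(".story-wrapper"):
    heading = story.select_one("h3.title")  # search inside this story only
    titles.append(heading.get_text(strip=True) if heading else None)
```

Checking for `None` before calling `get_text` avoids an `AttributeError` on stories that lack a headline. In Selenium the equivalent situation looks different: `find_element` raises `NoSuchElementException` when nothing matches, while `find_elements` returns an empty list.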

XPath selectors

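| XPath selector example | What it means |
| --- | --- |
| `//div` | Select all `<div>` elements |
| `//div[@class="story-wrapper"]` | Select all `<div>` elements whose class attribute is exactly story-wrapper (unlike CSS, this does not match class="story-wrapper featured") |
| `//div[@class="story-wrapper"]/p` | Select all `<p>` elements that are direct children of a `<div>` whose class attribute is exactly story-wrapper |
| `//table[@class="minutes-table"]/tbody/tr` | Select all `<tr>` elements that are direct children of a `<tbody>` that is a direct child of a `<table>` whose class attribute is exactly minutes-table |
| `//h3` | Select all `<h3>` elements |
| `//h3[@class="story-title"]` | Select all `<h3>` elements whose class attribute is exactly story-title |
| `//a[@href]` | Select all `<a>` elements that have an href attribute |
| `//a[substring(@href, string-length(@href) - 2) = "pdf"]` | Select all `<a>` elements that have an href attribute that ends with the text pdf (browsers implement XPath 1.0, which has no `ends-with()` function) |
| `//a[contains(@href, "2002")]` | Select all `<a>` elements that have an href attribute that includes the text 2002 |
| `//td[3]` | Select every `<td>` that is the third `<td>` child of its parent (the third cell in each row) |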
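
Two XPath patterns are awkward to write by hand in XPath 1.0, the version browsers (and therefore Selenium and Playwright) evaluate: "attribute ends with" and "has this class among several". The sketch below builds both expressions as plain strings. The helper names `xpath_literal`, `ends_with`, and `has_class` are hypothetical, not part of any library, and the sketch assumes the values contain no double quotes.

```python
def xpath_literal(value: str) -> str:
    """Wrap a value in double quotes for use inside an XPath expression."""
    if '"' in value:
        # XPath 1.0 has no escape syntax; this sketch simply refuses.
        raise ValueError("this sketch does not handle double quotes")
    return '"' + value + '"'


def ends_with(attr: str, suffix: str) -> str:
    """XPath 1.0 stand-in for ends-with(@attr, suffix)."""
    if not suffix:
        raise ValueError("suffix must not be empty")
    # substring() is 1-based: start at the position where the suffix would begin.
    offset = len(suffix) - 1
    return (
        f"substring(@{attr}, string-length(@{attr}) - {offset}) = "
        + xpath_literal(suffix)
    )


def has_class(name: str) -> str:
    """Match one class token, like the CSS selector .name does."""
    padded = xpath_literal(" " + name + " ")
    return 'contains(concat(" ", normalize-space(@class), " "), ' + padded + ")"


pdf_links_xpath = "//a[" + ends_with("href", "pdf") + "]"
story_divs_xpath = "//div[" + has_class("story-wrapper") + "]"
```

The resulting strings can be passed wherever an XPath selector is accepted, for example as the second argument to `driver.find_element(By.XPATH, ...)`. Padding the class attribute with spaces keeps `story-wrapper` from matching inside a longer class name such as `story-wrapper-large`.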

Using XPath selectors

| Tool | Selecting one or many? | Example code |
| --- | --- | --- |
| Selenium | One | `driver.find_element(By.XPATH, '//h3[@class="title"]')` |
| Selenium | Many | `driver.find_elements(By.XPATH, '//div[@class="story-wrapper"]')` |
| Playwright | One | `page.locator('//h3[@class="title"]').first` |
| Playwright | Many | `page.locator('//div[@class="story-wrapper"]')` |