Useful selectors for scraping
CSS selectors
CSS selector example | What it means |
div | Select all <div> elements |
div.story-wrapper | Select all <div> elements with the class story-wrapper |
.story-wrapper | Select all elements with the class story-wrapper |
#content .story-wrapper | Select all elements with the class story-wrapper that are inside an element with the id content |
div.story-wrapper > p | Select all <p> elements that are direct children of a <div> with the class story-wrapper |
table.minutes-table tbody tr | Select all <tr> elements that are inside a <tbody> that is inside a <table> with the class minutes-table |
h3 | Select all <h3> elements |
h3.story-title | Select all <h3> elements with the class story-title |
a[href] | Select all <a> elements that have an href attribute |
a[href$="pdf"] | Select all <a> elements that have an href attribute that ends with pdf |
a[href*="2002"] | Select all <a> elements that have an href attribute that includes the text 2002 |
td:nth-child(3) | Select every <td> that is the third child of its parent (the third cell in each row) |
Tip: When using the nth-child selector, the first element is nth-child(1), not nth-child(0).
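A few of the selectors above can be tried against a small page. This is a minimal sketch using BeautifulSoup with the built-in html.parser; the HTML snippet and the variable names are made up for illustration. It contrasts the child combinator (>) with the descendant combinator (a space) and shows that nth-child counts from 1.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, invented for this example.
PAGE = """
<div id="content">
  <div class="story-wrapper">
    <p>Direct child</p>
    <blockquote><p>Nested paragraph</p></blockquote>
  </div>
  <a href="/files/minutes-2002.pdf">Minutes</a>
  <a href="/about.html">About</a>
  <table><tr><td>a</td><td>b</td><td>c</td></tr></table>
</div>
"""

soup = BeautifulSoup(PAGE, "html.parser")

# ">" matches only <p> elements that are direct children of the div.
direct = [p.get_text() for p in soup.select("div.story-wrapper > p")]

# A space matches <p> elements at any depth inside the div.
nested = [p.get_text() for p in soup.select("div.story-wrapper p")]

# href$="pdf" keeps only links whose href ends with "pdf".
pdf_links = [a["href"] for a in soup.select('a[href$="pdf"]')]

# nth-child is 1-based, so 3 means the third cell in each row.
third_cells = [td.get_text() for td in soup.select("td:nth-child(3)")]

print(direct)       # ['Direct child']
print(nested)       # ['Direct child', 'Nested paragraph']
print(pdf_links)    # ['/files/minutes-2002.pdf']
print(third_cells)  # ['c']
```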
Using CSS selectors
Tool | Selecting one or many? | Example code |
BeautifulSoup | One | soup.select_one('h3.title') |
BeautifulSoup | Many | soup.select('.story-wrapper') |
Selenium | One | driver.find_element(By.CSS_SELECTOR, 'h3.title') |
Selenium | Many | driver.find_elements(By.CSS_SELECTOR, '.story-wrapper') |
Playwright | One | page.locator('h3.title').first |
Playwright | Many | page.locator('.story-wrapper') |
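The difference between selecting one and selecting many matters most when nothing matches. The sketch below uses BeautifulSoup on a made-up snippet (the class name does-not-exist is hypothetical): select_one returns a single element or None, while select always returns a list, which may be empty.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, invented for this example.
PAGE = """
<h3 class="title">Council minutes</h3>
<div class="story-wrapper">Story one</div>
<div class="story-wrapper">Story two</div>
"""

soup = BeautifulSoup(PAGE, "html.parser")

# One: the first matching element, or None if there is no match.
heading = soup.select_one("h3.title")
missing_one = soup.select_one(".does-not-exist")

# Many: a list of every matching element (empty if there is no match).
stories = soup.select(".story-wrapper")
missing_many = soup.select(".does-not-exist")

# Guard against None before reading from a single result.
heading_text = heading.get_text() if heading is not None else ""
story_texts = [story.get_text() for story in stories]

print(heading_text)       # Council minutes
print(story_texts)        # ['Story one', 'Story two']
print(missing_one)        # None
print(len(missing_many))  # 0
```

Selenium behaves a little differently: find_element raises NoSuchElementException when nothing matches, while find_elements returns an empty list. A Playwright locator is lazy, so it can be created even when nothing matches; calling .count() on it reports how many elements it currently matches.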
XPath selectors
XPath selector example | What it means |
//div | Select all <div> elements |
//div[@class="story-wrapper"] | Select all <div> elements whose class attribute is exactly story-wrapper (a <div> with additional classes does not match) |
//div[@class="story-wrapper"]/p | Select all <p> elements that are direct children of a <div> whose class attribute is exactly story-wrapper |
//table[@class="minutes-table"]/tbody/tr | Select all <tr> elements that are direct children of a <tbody> that is a direct child of a <table> whose class attribute is exactly minutes-table |
//h3 | Select all <h3> elements |
//h3[@class="story-title"] | Select all <h3> elements whose class attribute is exactly story-title |
//a[@href] | Select all <a> elements that have an href attribute |
//a[substring(@href, string-length(@href) - 2) = "pdf"] | Select all <a> elements that have an href attribute that ends with the text pdf (XPath 1.0, which browsers use, has no ends-with() function) |
//a[contains(@href, "2002")] | Select all <a> elements that have an href attribute that includes the text 2002 |
//td[3] | Select the third <td> element in a row |
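XPath attribute tests behave differently from CSS class selectors in a couple of ways that are easy to trip over. The sketch below uses lxml, which is an assumption on my part (the tools listed here for XPath are Selenium and Playwright), on a made-up snippet. It shows that @class="..." compares the whole attribute value, and that XPath 1.0 has no ends-with() function, so a substring() comparison stands in for it.

```python
from lxml import html

# Hypothetical page content, invented for this example.
PAGE = """
<html><body>
<div class="story-wrapper"><p>First story</p></div>
<div class="story-wrapper featured"><p>Second story</p></div>
<a href="/files/minutes-2002.pdf">Minutes</a>
<a href="/files/pdf-guide.html">Guide</a>
<table><tr><td>a</td><td>b</td><td>c</td></tr></table>
</body></html>
"""

tree = html.fromstring(PAGE)

# @class="story-wrapper" requires the attribute to equal that exact string,
# so the div with class "story-wrapper featured" is skipped.
exact = tree.xpath('//div[@class="story-wrapper"]/p/text()')

# A common XPath 1.0 idiom for "has this class among others".
by_token = tree.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), '
    '" story-wrapper ")]/p/text()'
)

# ends-with() is XPath 2.0; compare the last three characters instead.
pdf_links = tree.xpath(
    '//a[substring(@href, string-length(@href) - 2) = "pdf"]/@href'
)

# Positions in XPath are 1-based, so [3] is the third <td> in each row.
third_cells = tree.xpath("//td[3]/text()")

print(exact)        # ['First story']
print(by_token)     # ['First story', 'Second story']
print(pdf_links)    # ['/files/minutes-2002.pdf']
print(third_cells)  # ['c']
```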
Using XPath selectors
Tool | Selecting one or many? | Example code |
Selenium | One | driver.find_element(By.XPATH, '//h3[@class="title"]') |
Selenium | Many | driver.find_elements(By.XPATH, '//div[@class="story-wrapper"]') |
Playwright | One | page.locator('//h3[@class="title"]').first |
Playwright | Many | page.locator('//div[@class="story-wrapper"]') |