Useful selectors for scraping

CSS selectors

CSS selector example	What it means
`div`	Select all `<div>` elements
`div.story-wrapper`	Select all `<div>` elements with the class `story-wrapper`
`.story-wrapper`	Select all elements with the class `story-wrapper`
`#content .story-wrapper`	Select all elements with the class `story-wrapper` that are inside an element with the id `content`
`div.story-wrapper > p`	Select all `<p>` elements that are direct children of a `<div>` with the class `story-wrapper` (`<p>` elements nested more deeply are skipped)
`table.minutes-table tbody tr`	Select all `<tr>` elements that are inside of a `<tbody>` that is inside of a `<table>` with the class `minutes-table`
`h3`	Select all `<h3>` elements
`h3.story-title`	Select all `<h3>` elements with the class `story-title`
`a[href]`	Select all `<a>` elements that have an `href` attribute
`a[href$="pdf"]`	Select all `<a>` elements that have an `href` attribute that ends with `pdf`
`a[href*="2002"]`	Select all `<a>` elements that have an `href` attribute that includes the text `2002`
`td:nth-child(3)`	Select every `<td>` that is the third child of its row, i.e. the third cell in each row
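To see how a few of these selectors behave, here is a minimal sketch using BeautifulSoup's `select()` on a small HTML snippet. The markup, class names, and URLs are invented for illustration, and `beautifulsoup4` is assumed to be installed. It highlights the difference between the descendant combinator (a space) and the child combinator (`>`).

```python
from bs4 import BeautifulSoup

# Made-up HTML used only to exercise the selectors in the table above.
html = """
<div id="content">
  <div class="story-wrapper">
    <h3 class="story-title">First story</h3>
    <p>Direct child paragraph</p>
    <section><p>Nested paragraph</p></section>
  </div>
  <a href="/minutes/2002-01.pdf">January 2002 minutes</a>
  <a href="/minutes/2003-01.html">January 2003 minutes</a>
  <a name="no-link">Anchor without href</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Class selector scoped to an element with a given id.
in_content = soup.select("#content .story-wrapper")

# Descendant combinator (space): <p> at any depth inside the wrapper.
all_ps = [p.get_text() for p in soup.select("div.story-wrapper p")]

# Child combinator (>): only <p> elements that are direct children.
direct_ps = [p.get_text() for p in soup.select("div.story-wrapper > p")]

# Attribute selectors: presence, "ends with", and "contains".
with_href = soup.select("a[href]")
pdf_links = [a["href"] for a in soup.select('a[href$="pdf"]')]
links_2002 = [a["href"] for a in soup.select('a[href*="2002"]')]

print(len(in_content))  # 1
print(all_ps)           # ['Direct child paragraph', 'Nested paragraph']
print(direct_ps)        # ['Direct child paragraph']
print(len(with_href))   # 2
print(pdf_links)        # ['/minutes/2002-01.pdf']
print(links_2002)       # ['/minutes/2002-01.pdf']
```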

Tip: `nth-child` counts from 1, so the first element is `nth-child(1)`, not `nth-child(0)`.
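A quick way to check the 1-based counting is to run `nth-child` against a throwaway table. This sketch assumes `beautifulsoup4` is installed; the table contents are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical minutes table: the third cell in each row holds a meeting date.
html = """
<table class="minutes-table">
  <tbody>
    <tr><td>Budget</td><td>Approved</td><td>2002-01-15</td></tr>
    <tr><td>Zoning</td><td>Tabled</td><td>2002-02-12</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Counting starts at 1: nth-child(1) is the first cell of each row...
first_cells = [
    td.get_text() for td in soup.select("table.minutes-table tbody tr td:nth-child(1)")
]

# ...so the third cell is nth-child(3).
dates = [
    td.get_text() for td in soup.select("table.minutes-table tbody tr td:nth-child(3)")
]

print(first_cells)  # ['Budget', 'Zoning']
print(dates)        # ['2002-01-15', '2002-02-12']
```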

Tool	Selecting one or many?	Example code
BeautifulSoup	One	`soup.select_one('h3.title')`
BeautifulSoup	Many	`soup.select('.story-wrapper')`
Selenium	One	`driver.find_element(By.CSS_SELECTOR, 'h3.title')`
Selenium	Many	`driver.find_elements(By.CSS_SELECTOR, '.story-wrapper')`
Playwright	One	`page.locator('h3.title').first`
Playwright	Many	`page.locator('.story-wrapper')`
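The one-versus-many distinction also changes what you get back when nothing matches. A minimal BeautifulSoup sketch, with invented markup and `beautifulsoup4` assumed to be installed:

```python
from bs4 import BeautifulSoup

# Invented markup with two story wrappers.
html = """
<div class="story-wrapper"><h3 class="title">Story A</h3></div>
<div class="story-wrapper"><h3 class="title">Story B</h3></div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching element, or None if nothing matches.
first_title = soup.select_one("h3.title")
missing = soup.select_one("h3.headline")

# select() always returns a list, which is empty when nothing matches.
wrappers = soup.select(".story-wrapper")
no_wrappers = soup.select(".no-such-class")

print(first_title.get_text())  # Story A
print(missing)                 # None
print(len(wrappers))           # 2
print(no_wrappers)             # []
```

Selenium differs on the "one" case: `find_element()` raises `NoSuchElementException` when nothing matches, while `find_elements()` returns an empty list. Playwright locators are lazy, so a missing element typically shows up as a timeout only when you act on the locator.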

XPath selectors

XPath selector example	What it means
`//div`	Select all `<div>` elements
`//div[@class="story-wrapper"]`	Select all `<div>` elements whose `class` attribute is exactly `story-wrapper` (unlike CSS, a `<div>` with any additional class will not match)
`//div[@class="story-wrapper"]/p`	Select all `<p>` elements that are direct children of a `<div>` whose `class` is exactly `story-wrapper`
`//table[@class="minutes-table"]/tbody/tr`	Select all `<tr>` elements that are direct children of a `<tbody>` that is itself a direct child of a `<table>` whose `class` is exactly `minutes-table`
`//h3`	Select all `<h3>` elements
`//h3[@class="story-title"]`	Select all `<h3>` elements whose `class` attribute is exactly `story-title`
`//a[@href]`	Select all `<a>` elements that have an `href` attribute
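`//a[substring(@href, string-length(@href) - 2) = "pdf"]`	Select all `<a>` elements that have an `href` attribute that ends with the text `pdf`. XPath 1.0, which browsers and lxml implement, has no `ends-with()` function, so this compares the last three characters of `href` instead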
`//a[contains(@href, "2002")]`	Select all `<a>` elements that have an `href` attribute that includes the text `2002`
`//td[3]`	Select the third `<td>` element in each row
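Two of the XPath patterns behave differently from their CSS counterparts: `@class="..."` compares the entire attribute value, and `ends-with()` does not exist in XPath 1.0. This sketch makes both concrete with lxml, which is assumed to be installed (BeautifulSoup itself has no XPath support). The HTML is made up.

```python
from lxml import etree
from lxml import html as lxml_html

# Made-up markup: the second wrapper carries an extra "featured" class.
doc = lxml_html.fromstring("""
<div>
  <div class="story-wrapper">Plain wrapper</div>
  <div class="story-wrapper featured">Featured wrapper</div>
  <a href="/minutes/2002-01.pdf">PDF</a>
  <a href="/minutes/pdf-index.html">Index</a>
</div>
""")

# @class="..." compares the whole attribute string, so the "featured" div is skipped.
exact = doc.xpath('//div[@class="story-wrapper"]/text()')

# XPath 1.0 idiom for "has this class", closer to the CSS selector .story-wrapper.
has_class = doc.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " story-wrapper ")]/text()'
)

# ends-with() is an XPath 2.0 function; lxml implements XPath 1.0 and rejects it.
try:
    doc.xpath('//a[ends-with(@href, "pdf")]')
    ends_with_supported = True
except etree.XPathError:
    ends_with_supported = False

# XPath 1.0 workaround: compare the last len("pdf") characters of the href.
# contains(@href, "pdf") would also match "/minutes/pdf-index.html".
pdfs = doc.xpath(
    '//a[substring(@href, string-length(@href) - string-length("pdf") + 1) = "pdf"]/@href'
)
```

The general form `string-length(@href) - string-length("pdf") + 1` works for any suffix; for a fixed three-character suffix it reduces to `string-length(@href) - 2`.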

Tool	Selecting one or many?	Example code
Selenium	One	`driver.find_element(By.XPATH, '//h3[@class="title"]')`
Selenium	Many	`driver.find_elements(By.XPATH, '//div[@class="story-wrapper"]')`
Playwright	One	`page.locator('//h3[@class="title"]').first`
Playwright	Many	`page.locator('//div[@class="story-wrapper"]')`
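For static HTML outside a browser, lxml is a common choice for XPath, since BeautifulSoup does not support it. Its `xpath()` method always returns a list, so "one" is something you pick yourself. A minimal sketch, assuming lxml is installed and using invented markup:

```python
from lxml import html as lxml_html

# Invented markup with two matching headings.
doc = lxml_html.fromstring("""
<div>
  <h3 class="title">Story A</h3>
  <h3 class="title">Story B</h3>
</div>
""")

# "Many": xpath() returns a list of every match (empty if nothing matches).
titles = doc.xpath('//h3[@class="title"]')

# "One": take the first item yourself, guarding against an empty result.
first = titles[0] if titles else None

# Or let XPath pick it: parentheses apply [1] to the whole result set.
# Note that //h3[1] would instead mean "the first <h3> child of each parent".
first_via_xpath = doc.xpath('(//h3[@class="title"])[1]/text()')
```

The parenthesized form matters when matches live under different parents, which is the same reason `//td[3]` returns one cell per row rather than a single cell.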