Sometimes when you’re dealing with an API, it doesn’t give you all of the results it knows about.
For example, let’s use the Star Wars API to search for everyone with the letter "a" in their name.
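The request that produces the response below isn’t shown in the text, so here is a minimal sketch of what it might look like, using only Python’s standard library. The helper names fetch_json and search_people are assumptions made for illustration, and the fetch parameter is only there so the download step can be swapped out.

```python
import json
import urllib.request


def fetch_json(url):
    """Download a URL and decode its JSON body into Python objects."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def search_people(text, fetch=fetch_json):
    """Ask the people endpoint for everyone matching the search text."""
    url = "https://swapi.co/api/people/?search=" + text
    return fetch(url)
```

Calling data = search_people("a") would make the request. The result is an ordinary dictionary with count, next, previous and results keys, like the one printed below.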
{'count': 60,
'next': 'https://swapi.co/api/people/?search=a&page=2',
'previous': None,
'results': [{'birth_year': '19BBY',
'created': '2014-12-09T13:50:51.644000Z',
'edited': '2014-12-20T21:17:56.891000Z',
'eye_color': 'blue',
'films': ['https://swapi.co/api/films/2/',
'https://swapi.co/api/films/6/',
'https://swapi.co/api/films/3/',
'https://swapi.co/api/films/1/',
'https://swapi.co/api/films/7/'],
'gender': 'male',
'hair_color': 'blond',
'height': '172',
'homeworld': 'https://swapi.co/api/planets/1/',
'mass': '77',
'name': 'Luke Skywalker',
'skin_color': 'fair',
'species': ['https://swapi.co/api/species/1/'],
'starships': ['https://swapi.co/api/starships/12/',
'https://swapi.co/api/starships/22/'],
'url': 'https://swapi.co/api/people/1/',
'vehicles': ['https://swapi.co/api/vehicles/14/',
'https://swapi.co/api/vehicles/30/']},
{'birth_year': '41.9BBY',
'created': '2014-12-10T15:18:20.704000Z',
'edited': '2014-12-20T21:17:50.313000Z',
'eye_color': 'yellow',
'films': ['https://swapi.co/api/films/2/',
'https://swapi.co/api/films/6/',
'https://swapi.co/api/films/3/',
'https://swapi.co/api/films/1/'],
'gender': 'male',
'hair_color': 'none',
'height': '202',
'homeworld': 'https://swapi.co/api/planets/1/',
'mass': '136',
'name': 'Darth Vader',
'skin_color': 'white',
'species': ['https://swapi.co/api/species/1/'],
'starships': ['https://swapi.co/api/starships/13/'],
'url': 'https://swapi.co/api/people/4/',
'vehicles': []},
{'birth_year': '19BBY',
'created': '2014-12-10T15:20:09.791000Z',
'edited': '2014-12-20T21:17:50.315000Z',
'eye_color': 'brown',
'films': ['https://swapi.co/api/films/2/',
'https://swapi.co/api/films/6/',
'https://swapi.co/api/films/3/',
'https://swapi.co/api/films/1/',
'https://swapi.co/api/films/7/'],
'gender': 'female',
'hair_color': 'brown',
'height': '150',
'homeworld': 'https://swapi.co/api/planets/2/',
'mass': '49',
'name': 'Leia Organa',
'skin_color': 'light',
'species': ['https://swapi.co/api/species/1/'],
'starships': [],
'url': 'https://swapi.co/api/people/5/',
'vehicles': ['https://swapi.co/api/vehicles/30/']},
{'birth_year': '52BBY',
'created': '2014-12-10T15:52:14.024000Z',
'edited': '2014-12-20T21:17:50.317000Z',
'eye_color': 'blue',
'films': ['https://swapi.co/api/films/5/',
'https://swapi.co/api/films/6/',
'https://swapi.co/api/films/1/'],
'gender': 'male',
'hair_color': 'brown, grey',
'height': '178',
'homeworld': 'https://swapi.co/api/planets/1/',
'mass': '120',
'name': 'Owen Lars',
'skin_color': 'light',
'species': ['https://swapi.co/api/species/1/'],
'starships': [],
'url': 'https://swapi.co/api/people/6/',
'vehicles': []},
{'birth_year': '47BBY',
'created': '2014-12-10T15:53:41.121000Z',
'edited': '2014-12-20T21:17:50.319000Z',
'eye_color': 'blue',
'films': ['https://swapi.co/api/films/5/',
'https://swapi.co/api/films/6/',
'https://swapi.co/api/films/1/'],
'gender': 'female',
'hair_color': 'brown',
'height': '165',
'homeworld': 'https://swapi.co/api/planets/1/',
'mass': '75',
'name': 'Beru Whitesun lars',
'skin_color': 'light',
'species': ['https://swapi.co/api/species/1/'],
'starships': [],
'url': 'https://swapi.co/api/people/7/',
'vehicles': []},
{'birth_year': '24BBY',
'created': '2014-12-10T15:59:50.509000Z',
'edited': '2014-12-20T21:17:50.323000Z',
'eye_color': 'brown',
'films': ['https://swapi.co/api/films/1/'],
'gender': 'male',
'hair_color': 'black',
'height': '183',
'homeworld': 'https://swapi.co/api/planets/1/',
'mass': '84',
'name': 'Biggs Darklighter',
'skin_color': 'light',
'species': ['https://swapi.co/api/species/1/'],
'starships': ['https://swapi.co/api/starships/12/'],
'url': 'https://swapi.co/api/people/9/',
'vehicles': []},
{'birth_year': '57BBY',
'created': '2014-12-10T16:16:29.192000Z',
'edited': '2014-12-20T21:17:50.325000Z',
'eye_color': 'blue-gray',
'films': ['https://swapi.co/api/films/2/',
'https://swapi.co/api/films/5/',
'https://swapi.co/api/films/4/',
'https://swapi.co/api/films/6/',
'https://swapi.co/api/films/3/',
'https://swapi.co/api/films/1/'],
'gender': 'male',
'hair_color': 'auburn, white',
'height': '182',
'homeworld': 'https://swapi.co/api/planets/20/',
'mass': '77',
'name': 'Obi-Wan Kenobi',
'skin_color': 'fair',
'species': ['https://swapi.co/api/species/1/'],
'starships': ['https://swapi.co/api/starships/48/',
'https://swapi.co/api/starships/59/',
'https://swapi.co/api/starships/64/',
'https://swapi.co/api/starships/65/',
'https://swapi.co/api/starships/74/'],
'url': 'https://swapi.co/api/people/10/',
'vehicles': ['https://swapi.co/api/vehicles/38/']},
{'birth_year': '41.9BBY',
'created': '2014-12-10T16:20:44.310000Z',
'edited': '2014-12-20T21:17:50.327000Z',
'eye_color': 'blue',
'films': ['https://swapi.co/api/films/5/',
'https://swapi.co/api/films/4/',
'https://swapi.co/api/films/6/'],
'gender': 'male',
'hair_color': 'blond',
'height': '188',
'homeworld': 'https://swapi.co/api/planets/1/',
'mass': '84',
'name': 'Anakin Skywalker',
'skin_color': 'fair',
'species': ['https://swapi.co/api/species/1/'],
'starships': ['https://swapi.co/api/starships/59/',
'https://swapi.co/api/starships/65/',
'https://swapi.co/api/starships/39/'],
'url': 'https://swapi.co/api/people/11/',
'vehicles': ['https://swapi.co/api/vehicles/44/',
'https://swapi.co/api/vehicles/46/']},
{'birth_year': '64BBY',
'created': '2014-12-10T16:26:56.138000Z',
'edited': '2014-12-20T21:17:50.330000Z',
'eye_color': 'blue',
'films': ['https://swapi.co/api/films/6/', 'https://swapi.co/api/films/1/'],
'gender': 'male',
'hair_color': 'auburn, grey',
'height': '180',
'homeworld': 'https://swapi.co/api/planets/21/',
'mass': 'unknown',
'name': 'Wilhuff Tarkin',
'skin_color': 'fair',
'species': ['https://swapi.co/api/species/1/'],
'starships': [],
'url': 'https://swapi.co/api/people/12/',
'vehicles': []},
{'birth_year': '200BBY',
'created': '2014-12-10T16:42:45.066000Z',
'edited': '2014-12-20T21:17:50.332000Z',
'eye_color': 'blue',
'films': ['https://swapi.co/api/films/2/',
'https://swapi.co/api/films/6/',
'https://swapi.co/api/films/3/',
'https://swapi.co/api/films/1/',
'https://swapi.co/api/films/7/'],
'gender': 'male',
'hair_color': 'brown',
'height': '228',
'homeworld': 'https://swapi.co/api/planets/14/',
'mass': '112',
'name': 'Chewbacca',
'skin_color': 'unknown',
'species': ['https://swapi.co/api/species/3/'],
'starships': ['https://swapi.co/api/starships/10/',
'https://swapi.co/api/starships/22/'],
'url': 'https://swapi.co/api/people/13/',
'vehicles': ['https://swapi.co/api/vehicles/19/']}]}
It looks like a lot of stuff, but let’s examine it a little more closely. How many results is it, really?
60
Okay, cool, 60 results! Let’s loop through them.
Luke Skywalker
Darth Vader
Leia Organa
Owen Lars
Beru Whitesun lars
Biggs Darklighter
Obi-Wan Kenobi
Anakin Skywalker
Wilhuff Tarkin
Chewbacca
Wait a second, that’s not 60 people! It’s… a lot less.
10
It’s… it’s 10! How do we only have 10 results if data['count'] says we should have 60?
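To see the mismatch side by side, here is a small sketch that uses a trimmed-down copy of the response above (only the keys that matter here, and only each person’s name) instead of a live request.

```python
# Names copied from the first page of results shown above.
names = [
    "Luke Skywalker", "Darth Vader", "Leia Organa", "Owen Lars",
    "Beru Whitesun lars", "Biggs Darklighter", "Obi-Wan Kenobi",
    "Anakin Skywalker", "Wilhuff Tarkin", "Chewbacca",
]

# Trimmed-down stand-in for the real response dictionary.
data = {
    "count": 60,
    "next": "https://swapi.co/api/people/?search=a&page=2",
    "previous": None,
    "results": [{"name": name} for name in names],
}

# How many matches the API says exist in total.
print(data["count"])  # 60

# The people actually included in this response.
for person in data["results"]:
    print(person["name"])

# How many people actually came back on this page.
print(len(data["results"]))  # 10
```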
Pagination in an API
Most APIs that allow you to search only return some of the results at a time. In this case, you get 10 results at a time, even though there are 60 total. But, to be helpful, each response includes a next key that tells you where to find more.
https://swapi.co/api/people/?search=a&page=2
All we need to do to get page 2 is to make a request to that page…
Han Solo
Jabba Desilijic Tiure
Wedge Antilles
Yoda
Palpatine
Boba Fett
Lando Calrissian
Ackbar
Mon Mothma
Arvel Crynyd
…and we get everyone who is on that second page.
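A sketch of that step, assuming data already holds the first page as a dictionary. fetch_json and names_on_next_page are hypothetical helper names, not something the API provides.

```python
import json
import urllib.request


def fetch_json(url):
    """Download a URL and decode its JSON body into Python objects."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def names_on_next_page(data, fetch=fetch_json):
    """Request the page that data['next'] points to and collect its names."""
    next_data = fetch(data["next"])
    names = []
    for person in next_data["results"]:
        names.append(person["name"])
    return names
```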
Remember how our data['next'] on page 1 gave us the URL to page 2? On page 2, data['next'] will also point to the next page, page 3.
https://swapi.co/api/people/?search=a&page=3
If we keep going and going and going, eventually the next page doesn’t exist any more. In this case, it happens on page 6.
None
When data['next'] is None, we’re finally at the end.
How does this work when getting data from an API, though? Are we supposed to keep changing the page number time after time by hand?
No!
There’s an easier way.
Scraping all of the pages at once
Technically, there are two easier ways to do this, not just one. The first way involves a cool new kind of loop called a while loop, while the second uses a normal for loop.
METHOD ONE: while loop
A while loop is kind of like an if statement. For example, maybe we’re wondering if we need to get a second page of results:
Downloading the original search results
Next page found, downloading https://swapi.co/api/people/?search=a&page=2
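The code behind that output isn’t shown, so this is a guess at its shape rather than the original. fetch_json and download_with_if are hypothetical names, and the fetch parameter only exists so the download step can be replaced.

```python
import json
import urllib.request


def fetch_json(url):
    """Download a URL and decode its JSON body into Python objects."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def download_with_if(url, fetch=fetch_json):
    """Get the first page, then at most one more page if 'next' is set."""
    print("Downloading the original search results")
    data = fetch(url)
    # An if runs its body at most once, so this never gets past page 2.
    if data["next"] is not None:
        print("Next page found, downloading", data["next"])
        data = fetch(data["next"])
    return data
```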
The way a while loop works is that it keeps running its code over and over until its condition is False. if does something once, and while does something forever (maybe).
So in this case, it’s going to keep downloading pages as long as data['next'] is not None. In other words, it will only stop when data['next'] is None.
Let’s change our if to while:
Downloading the original search results
Next page found, downloading https://swapi.co/api/people/?search=a&page=2
Next page found, downloading https://swapi.co/api/people/?search=a&page=3
Next page found, downloading https://swapi.co/api/people/?search=a&page=4
Next page found, downloading https://swapi.co/api/people/?search=a&page=5
Next page found, downloading https://swapi.co/api/people/?search=a&page=6
We just need one small change - let’s make an empty list of total_results and keep adding data['results'] to it each time.
Downloading the original search results
Next page found, downloading https://swapi.co/api/people/?search=a&page=2
Next page found, downloading https://swapi.co/api/people/?search=a&page=3
Next page found, downloading https://swapi.co/api/people/?search=a&page=4
Next page found, downloading https://swapi.co/api/people/?search=a&page=5
Next page found, downloading https://swapi.co/api/people/?search=a&page=6
We have 60 total results
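Putting the while loop and the total_results list together might look like the sketch below. It is an assumed reconstruction, not the original code: fetch_json and download_all_results are hypothetical names, and fetch can be swapped for any function that turns a URL into a dictionary.

```python
import json
import urllib.request


def fetch_json(url):
    """Download a URL and decode its JSON body into Python objects."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def download_all_results(url, fetch=fetch_json):
    """Follow every 'next' link and return the results from all pages."""
    total_results = []

    print("Downloading the original search results")
    data = fetch(url)
    total_results.extend(data["results"])

    # Keep going until a page has no 'next' link.
    while data["next"] is not None:
        print("Next page found, downloading", data["next"])
        data = fetch(data["next"])
        total_results.extend(data["results"])

    return total_results
```

Calling total_results = download_all_results("https://swapi.co/api/people/?search=a") and then printing len(total_results) should report the same number as data['count']. The loop can only end if data is replaced with the new page each time through, which is the part that is easy to forget.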
METHOD TWO: for loop and range
I think while loops can be trouble because if you write them wrong, your program might run forever! This is pretty bad!
If you know how many pages you need to go through, though, you can use a for loop instead.
In this case, we know we need to get everything between page 1 and page 6.
https://swapi.co/api/people/?search=a&page=1
https://swapi.co/api/people/?search=a&page=2
https://swapi.co/api/people/?search=a&page=3
https://swapi.co/api/people/?search=a&page=4
https://swapi.co/api/people/?search=a&page=5
https://swapi.co/api/people/?search=a&page=6
A boring way to do this is to make a list of numbers, and loop through it.
https://swapi.co/api/people/?search=a&page=1
https://swapi.co/api/people/?search=a&page=2
https://swapi.co/api/people/?search=a&page=3
https://swapi.co/api/people/?search=a&page=4
https://swapi.co/api/people/?search=a&page=5
https://swapi.co/api/people/?search=a&page=6
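A sketch of the hand-typed list version. BASE_URL and urls are names chosen here for illustration.

```python
BASE_URL = "https://swapi.co/api/people/?search=a&page="

# Type out every page number by hand.
page_numbers = [1, 2, 3, 4, 5, 6]

urls = []
for page in page_numbers:
    # str() turns the number into text so it can be glued onto the URL.
    url = BASE_URL + str(page)
    urls.append(url)
    print(url)
```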
If that’s too much typing, Python can also help out. The range function will automatically build the sequence of numbers for you. range(6) will give you 0, 1, 2, 3, 4, 5 (in Python 3, wrap it in list() if you want to see it as a list), so you can either do + 1 on each number or use range(1, 7) to get 1, 2, 3, 4, 5, 6.
https://swapi.co/api/people/?search=a&page=1
https://swapi.co/api/people/?search=a&page=2
https://swapi.co/api/people/?search=a&page=3
https://swapi.co/api/people/?search=a&page=4
https://swapi.co/api/people/?search=a&page=5
https://swapi.co/api/people/?search=a&page=6
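The same URLs built with range instead of a hand-typed list, as a short sketch. BASE_URL and urls are again just illustrative names.

```python
BASE_URL = "https://swapi.co/api/people/?search=a&page="

# range stops just before its end value.
print(list(range(6)))     # [0, 1, 2, 3, 4, 5]
print(list(range(1, 7)))  # [1, 2, 3, 4, 5, 6]

urls = []
for page in range(1, 7):
    url = BASE_URL + str(page)
    urls.append(url)
    print(url)
```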
Once you have all of the page numbers, you can do what we did before - each time through the loop, request the page and add its results to total_results.
Downloading https://swapi.co/api/people/?search=a&page=1
Downloading https://swapi.co/api/people/?search=a&page=2
Downloading https://swapi.co/api/people/?search=a&page=3
Downloading https://swapi.co/api/people/?search=a&page=4
Downloading https://swapi.co/api/people/?search=a&page=5
Downloading https://swapi.co/api/people/?search=a&page=6
We have 60 total results
This might be easier to read, but there’s one problem: how do you know you have 6 pages? Honestly, there’s nothing automatic - you probably request the first page by hand, then calculate how many pages there are by dividing data['count'] by the number of results on a page and rounding up (here, 60 / 10 = 6). It’s a little more work, but if it makes more sense, go for it.
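The page count can be worked out from the first response. A minimal sketch, assuming every page except possibly the last holds the same number of results as the first one; count_pages is a hypothetical helper name.

```python
import math


def count_pages(total_count, results_per_page):
    """Number of pages needed to hold total_count results."""
    # Round up so a partly filled last page still counts as a page.
    return math.ceil(total_count / results_per_page)


# With the numbers from this search: 60 results, 10 per page.
print(count_pages(60, 10))  # 6
```

With a real response, the two arguments would be data['count'] and len(data['results']) from the first page.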