The Spotify API
In this notebook you'll be using Spotipy, a Python package, to talk to the Spotify API. This means you won't have to manually create API URLs; you'll just need to figure out how to make Spotipy do it for you! The full Spotipy documentation is available at https://spotipy.readthedocs.io/
To access public Spotify data
You'll want to go to the Spotify for Developers Dashboard and create a new app. This will give you a client_id and client_secret! It's like a super-advanced version of an API key. When you're setting up your app it will probably also ask you for other things like a redirect URL. For public data it doesn't matter, so just put whatever you want in there. If it asks what you want access to, you can pick the Web API (but I don't think it matters).
The code below won't work because it needs my secret keys, and I've deleted them so that this notebook is nice and safe for me! Swap in your own.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Replace the placeholders with the client_id and client_secret from your own app
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
))
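If you'd rather never paste keys into the notebook at all, one option is to keep them in environment variables. A minimal sketch, assuming you've set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET (the variable names Spotipy itself checks); load_spotify_credentials is a hypothetical helper, not part of Spotipy.

```python
import os


def load_spotify_credentials():
    """Read the client ID and secret from environment variables (hypothetical helper)."""
    client_id = os.environ.get("SPOTIPY_CLIENT_ID")
    client_secret = os.environ.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET before running this notebook"
        )
    return client_id, client_secret


# Usage (needs spotipy installed and real credentials):
# client_id, client_secret = load_spotify_credentials()
# sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
#     client_id=client_id, client_secret=client_secret))
```

This way the notebook can be shared as-is, and the keys live only on your own computer.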
When you want data from Spotify, you can't just go to /artists/Pixies in order to get work by Pixies! You have to find a special code for the artist (or song, or album, or whatever). It's called the uri.
You can find more details on searching on the Spotipy documentation or the Spotify Web API documentation. Remember that Spotipy is a Python wrapper for the Spotify API, so you don't need to work with any URLs!
To find the uri, you first need to do a search. Below we use sp.search to search for a particular artist.
# Search for the artist Pixies
results = sp.search(q='artist:Pixies', type='artist')
The results it gives back are awful and long and terrible. Instead of making you dig through them, I already poked around and pulled out the top artist result from our search.
results['artists']['items'][0]
{'external_urls': {'spotify': 'https://open.spotify.com/artist/6zvul52xwTWzilBZl6BUbT'}, 'followers': {'href': None, 'total': 2743287}, 'genres': ['alternative rock', 'boston rock', 'permanent wave', 'rock'], 'href': 'https://api.spotify.com/v1/artists/6zvul52xwTWzilBZl6BUbT', 'id': '6zvul52xwTWzilBZl6BUbT', 'images': [{'height': 640, 'url': 'https://i.scdn.co/image/ab6761610000e5ebd0456128dd330d18e18b4715', 'width': 640}, {'height': 320, 'url': 'https://i.scdn.co/image/ab67616100005174d0456128dd330d18e18b4715', 'width': 320}, {'height': 160, 'url': 'https://i.scdn.co/image/ab6761610000f178d0456128dd330d18e18b4715', 'width': 160}], 'name': 'Pixies', 'popularity': 68, 'type': 'artist', 'uri': 'spotify:artist:6zvul52xwTWzilBZl6BUbT'}
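All the matches live in results['artists']['items'], so if the top hit isn't the artist you wanted, it helps to list every match with its name, uri and follower count. A minimal sketch; sample_results is a trimmed copy of the Pixies result above (only the fields used), standing in for a live sp.search call, and summarize_artists is a hypothetical helper.

```python
def summarize_artists(results):
    """Return (name, uri, followers) for every artist in a search response."""
    summary = []
    for artist in results["artists"]["items"]:
        summary.append((artist["name"], artist["uri"], artist["followers"]["total"]))
    return summary


# Trimmed copy of the search result shown above
sample_results = {
    "artists": {
        "items": [
            {
                "name": "Pixies",
                "uri": "spotify:artist:6zvul52xwTWzilBZl6BUbT",
                "followers": {"href": None, "total": 2743287},
            }
        ]
    }
}

for name, uri, followers in summarize_artists(sample_results):
    print(name, uri, followers)
```

With the live client you'd pass the real results from sp.search(q='artist:Pixies', type='artist') instead.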
There we go! The uri looks to be spotify:artist:6zvul52xwTWzilBZl6BUbT.
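The uri is three pieces glued together with colons: spotify, the type of thing, and its id (the same id that appears at the end of the open.spotify.com link in the result above). A small sketch for pulling it apart, assuming the simple three-part form; parse_spotify_uri is a hypothetical helper, not part of Spotipy.

```python
def parse_spotify_uri(uri):
    """Split 'spotify:<type>:<id>' into its type and id."""
    prefix, item_type, item_id = uri.split(":")
    if prefix != "spotify":
        raise ValueError(f"Not a Spotify URI: {uri}")
    return item_type, item_id


item_type, item_id = parse_spotify_uri("spotify:artist:6zvul52xwTWzilBZl6BUbT")
print(item_type, item_id)  # artist 6zvul52xwTWzilBZl6BUbT

# Rebuild the web link from the pieces
print("https://open.spotify.com/" + item_type + "/" + item_id)
```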
Now the sad part: the Spotipy documentation is...... not great. The Spotify Web API docs look good, but we're using the Python wrapper, not the raw Spotify API! Luckily Spotipy has a great list of examples, including one for an artist's top tracks.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

lz_uri = 'spotify:artist:36QJpDe2go2KgaRleHCDTp'

# With no arguments, SpotifyClientCredentials reads SPOTIPY_CLIENT_ID and
# SPOTIPY_CLIENT_SECRET from your environment variables
client_credentials_manager = SpotifyClientCredentials()
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

results = sp.artist_top_tracks(lz_uri)
for track in results['tracks'][:10]:
    print('track    : ' + track['name'])
    # preview_url can be None, so wrap it in str() to avoid a TypeError
    print('audio    : ' + str(track['preview_url']))
    print('cover art: ' + track['album']['images'][0]['url'])
Since we already have the credentials and blah blah blah set up, all we need to do is adapt the sp.artist_top_tracks(lz_uri) line and everything below it.
results = sp.artist_top_tracks('spotify:artist:6zvul52xwTWzilBZl6BUbT')
for track in results['tracks'][:10]:
    print(track['name'])
Where Is My Mind? - Remastered
Here Comes Your Man
Hey
All I Think About Now
Monkey Gone to Heaven
The Thing
Debaser
Gouge Away
Wave Of Mutilation
Ana
And after all of this... it turns out you can't get the play count! The closest thing the API gives you is popularity.
That might be fine by you, but if you want actual play counts, scroll to the bottom of the page.
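Each track in the top-tracks response carries a popularity score from 0 to 100, so you can grab it right alongside the name. A minimal sketch; sample is shaped like the artist_top_tracks response, but the songs and numbers in it are made-up placeholders, not real Spotify data, and tracks_with_popularity is a hypothetical helper.

```python
def tracks_with_popularity(results):
    """Return a list of {'name', 'popularity'} dicts from an artist_top_tracks response."""
    return [
        {"name": track["name"], "popularity": track["popularity"]}
        for track in results["tracks"]
    ]


# Placeholder data shaped like sp.artist_top_tracks(...); not real numbers
sample = {
    "tracks": [
        {"name": "Song A", "popularity": 80},
        {"name": "Song B", "popularity": 65},
    ]
}

for row in tracks_with_popularity(sample):
    print(row["name"], row["popularity"])
```

With the live client, pass sp.artist_top_tracks('spotify:artist:6zvul52xwTWzilBZl6BUbT') instead of sample. A list of dicts like this also drops straight into a pandas dataframe.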
To access user data
Sometimes when using Spotify you want to know things about the user on the computer: for example, their saved tracks, their playlists, and more. To do this you need to ask for special permissions (called the scope). In the example below we use user-library-read, which allows you to look at someone's saved tracks.
You can find what information can be retrieved about a user by scrolling around on the Spotify Web API documentation and looking for Get Current User's XXXXXX.
Some of them will probably need permissions other than user-library-read; you can look at the Spotify Web API's list of scopes to see what other scopes might be required.
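If you need more than one permission, OAuth expects the scopes as a single space-separated string. A minimal sketch; playlist-read-private is just an example of a second scope from Spotify's list, so swap in whichever ones the endpoints you're using require.

```python
# Scopes you want to request; playlist-read-private is only an example
scopes = ["user-library-read", "playlist-read-private"]

# OAuth scopes are sent as one space-separated string
scope = " ".join(scopes)
print(scope)  # user-library-read playlist-read-private

# Then pass it to SpotifyOAuth the same way as a single scope:
# oauth = SpotifyOAuth(client_id=..., client_secret=..., redirect_uri=..., scope=scope)
```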
Ask for permission to look at the user's information
You'll need to get a client_id and client_secret from the Spotify for Developers Dashboard. You can use any redirect_uri you like, as long as it matches one you've added in your app's settings on the Dashboard. Fancy websites use this to send users back to their site, but you're just running this code on your own computer.
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Only ask for permission to read the user's saved tracks
scope = 'user-library-read'

# Replace the placeholders with the values from your own app
oauth = SpotifyOAuth(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    redirect_uri='https://localhost:8080',
    scope=scope
)
sp = spotipy.Spotify(auth_manager=oauth)
Let's see what user we're logged in as!
Running this line starts the conversation with the Spotify API: you'll be asked to log in and accept the permissions, and then it tells you who you are.
sp.me()
{'display_name': 'Jonathan Soma', 'external_urls': {'spotify': 'https://open.spotify.com/user/1244081439'}, 'href': 'https://api.spotify.com/v1/users/1244081439', 'id': '1244081439', 'images': [{'url': 'https://scontent-atl3-1.xx.fbcdn.net/v/t1.18169-1/47228_10100660458008666_1070103148_n.jpg?stp=c21.21.261.261a_cp0_dst-jpg_s50x50&_nc_cat=110&ccb=1-7&_nc_sid=db1b99&_nc_ohc=aqFtkG4FEqgAX9AJOQD&_nc_ht=scontent-atl3-1.xx&edm=AP4hL3IEAAAA&oh=00_AfA_wbA2herQNmD_A7lVR6I_7njCWVMnyfV47jVPb53H5Q&oe=65954A13', 'height': 64, 'width': 64}, {'url': 'https://scontent-atl3-1.xx.fbcdn.net/v/t1.18169-1/47228_10100660458008666_1070103148_n.jpg?stp=c21.21.261.261a_dst-jpg&_nc_cat=110&ccb=1-7&_nc_sid=0be577&_nc_ohc=aqFtkG4FEqgAX9AJOQD&_nc_ht=scontent-atl3-1.xx&edm=AP4hL3IEAAAA&oh=00_AfBcxBA9v2Am5H08ccBoQiwpAdDgiI1oEOlGRgBrCTEkRQ&oe=65954A13', 'height': 300, 'width': 300}], 'type': 'user', 'uri': 'spotify:user:1244081439', 'followers': {'href': None, 'total': 31}}
Getting data from the user
In the example below, we get 20 saved tracks from the logged-in user. We are allowed to do this because we provided the user-library-read scope up above.
tracks = sp.current_user_saved_tracks(limit=20)
tracks
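Unlike the top-tracks response, each entry in the saved-tracks response is a wrapper: item['track'] holds the actual track and item['added_at'] says when it was saved. A minimal sketch that flattens a page into simple dicts; sample_page is a made-up placeholder shaped like the sp.current_user_saved_tracks output, and flatten_saved_tracks is a hypothetical helper.

```python
def flatten_saved_tracks(response):
    """Turn one page of saved tracks into a list of simple dicts."""
    rows = []
    for item in response["items"]:
        track = item["track"]
        rows.append({
            "name": track["name"],
            # A track can have several artists, so join their names
            "artists": ", ".join(artist["name"] for artist in track["artists"]),
            "added_at": item["added_at"],
        })
    return rows


# Placeholder data shaped like sp.current_user_saved_tracks(limit=20)
sample_page = {
    "items": [
        {
            "added_at": "2023-01-01T00:00:00Z",
            "track": {"name": "Song A", "artists": [{"name": "Artist 1"}, {"name": "Artist 2"}]},
        }
    ],
    "total": 1,
}

for row in flatten_saved_tracks(sample_page):
    print(row)
```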
Okay, so how do we know what to do now?
Right now you're using Spotipy, a Python wrapper for the Spotify Web API. You should use the Spotipy documentation for all of your code examples, but the Spotify Web API documentation might inspire you on different ways to use the Spotipy library. Below you can compare the documentation for getting the current user's saved tracks:
- Spotipy documentation: https://spotipy.readthedocs.io/en/2.22.1/?highlight=current_user_saved_tracks#spotipy.client.Spotify.current_user_saved_tracks
- Spotify Web API documentation: https://developer.spotify.com/documentation/web-api/reference/get-users-saved-tracks
In order to get all of the saved tracks, I recommend checking out my YouTube video on how to use paginated APIs at https://www.youtube.com/watch?v=4Fdyft-ky0w. Your code is going to be a little different since you aren't using requests.get and a URL to get the information from the API; instead, you are using sp.current_user_saved_tracks.
If you'd rather, you can also just start with the 20 there (or 50, if you change the limit). To get all of the tracks, you'll use limit, offset, and range.
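Here's the limit/offset/range idea as a sketch. The response has a total field saying how many saved tracks there are, so you can fetch the first page, read total, and then step offset forward 50 at a time (50 is the largest limit this endpoint accepts). fetch_all_saved_tracks and fake_fetch are hypothetical; fake_fetch stands in for sp.current_user_saved_tracks so the loop runs without logging in.

```python
def fetch_all_saved_tracks(fetch_page, page_size=50):
    """Collect every item by stepping offset through range(0, total, page_size)."""
    first = fetch_page(limit=page_size, offset=0)
    items = list(first["items"])
    # The first page is already in hand, so start at the second offset
    for offset in range(page_size, first["total"], page_size):
        page = fetch_page(limit=page_size, offset=offset)
        items.extend(page["items"])
    return items


# Hypothetical stand-in for sp.current_user_saved_tracks: 120 fake tracks
FAKE_LIBRARY = [{"track": {"name": f"Song {i}"}} for i in range(120)]


def fake_fetch(limit, offset):
    return {"items": FAKE_LIBRARY[offset:offset + limit], "total": len(FAKE_LIBRARY)}


all_items = fetch_all_saved_tracks(fake_fetch)
print(len(all_items))  # 120

# With the real client: fetch_all_saved_tracks(sp.current_user_saved_tracks)
```

Passing the fetching function in as an argument is just a convenience for testing; in the notebook you could call sp.current_user_saved_tracks directly inside the loop.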
Accessing non-API data
The public API data doesn't include play counts for individual songs, only popularity. But when you visit Spotify on the web you can see it! So we'll need to scrape it.
It's definitely a fancy web application and not a simple website: the page is built by JavaScript in the browser, so we can't just download it with requests and hand it to BeautifulSoup. Instead we'll fire up Playwright...
from playwright.async_api import async_playwright

# Top-level await works inside Jupyter; headless=False lets you watch the browser
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless=False)
page = await browser.new_page()
...visit the page in the browser...
await page.goto('https://open.spotify.com/artist/6zvul52xwTWzilBZl6BUbT')
<Response url='https://open.spotify.com/artist/6zvul52xwTWzilBZl6BUbT' request=<Request url='https://open.spotify.com/artist/6zvul52xwTWzilBZl6BUbT' method='GET'>>
and feed the contents into BeautifulSoup.
from bs4 import BeautifulSoup

# Grab the HTML the browser has rendered and parse it
doc = BeautifulSoup(await page.content(), 'html.parser')
Now that it's in BeautifulSoup, we need to find something to identify each one of these rows...
Selecting this way instead of just grabbing the titles will allow us to keep titles and play count for the same songs together.
# You'd need to fill in the ??????????
top_songs = doc.find_all(??????????)
len(top_songs)
5
Once we find that, we'll need to find a class for the song titles and the play counts.
Note: Remember that when it says class="something ABC123 lkfm23f", the element has three classes: something, ABC123, and lkfm23f. So you can just use .find(class_='ABC123') and it will work fine!
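To see that class matching in action, and why grabbing whole rows keeps each title next to its play count, here's a tiny offline example. The HTML and every class name in it (row, count, ABC123 and the rest) are made up; they are not Spotify's real class names, which you'll still need to find yourself.

```python
from bs4 import BeautifulSoup

# Made-up HTML: each row holds a title and a play count
html = """
<div class="row xyz">
  <span class="something ABC123 lkfm23f">Song A</span>
  <span class="count q9">1,234</span>
</div>
<div class="row xyz">
  <span class="something ABC123 lkfm23f">Song B</span>
  <span class="count q9">5,678</span>
</div>
"""

doc = BeautifulSoup(html, "html.parser")

# class_="row" matches even though each div also has the class "xyz"
rows = doc.find_all(class_="row")
for row in rows:
    # Searching inside each row keeps the title and count paired up
    title = row.find(class_="ABC123").text
    count = row.find(class_="count").text
    print(title, count)
```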
# You need to fill in the ??????????
for song in top_songs:
    title = song.find(??????????).text
    play_count = song.find(??????????).text
    print(title, play_count)
Where Is My Mind? - Remastered 787,853,618
Here Comes Your Man 197,701,666
Hey 165,702,038
All I Think About Now 57,576,177
Monkey Gone to Heaven 80,559,397
Finally, to wrap it all up, we need to edit the code above to create a list of dictionaries that we can put into a pandas dataframe and save to a CSV.
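One way to do that final step is sketched below, with the five titles and play counts printed above hard-coded in place of the scraping loop. The play counts come back as text with commas, so the sketch strips the commas and converts them to integers before building the dataframe. The filename pixies_top_songs.csv is just an example.

```python
import pandas as pd

# The (title, play count text) pairs printed above; in the notebook these
# would come from song.find(...).text inside the loop
scraped = [
    ("Where Is My Mind? - Remastered", "787,853,618"),
    ("Here Comes Your Man", "197,701,666"),
    ("Hey", "165,702,038"),
    ("All I Think About Now", "57,576,177"),
    ("Monkey Gone to Heaven", "80,559,397"),
]

rows = []
for title, play_count in scraped:
    rows.append({
        "title": title,
        # "787,853,618" -> 787853618
        "play_count": int(play_count.replace(",", "")),
    })

df = pd.DataFrame(rows)
df.to_csv("pixies_top_songs.csv", index=False)
print(df)
```

Converting the counts to numbers up front means you can sort or sum the play_count column without any extra cleanup.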