%load_ext dotenv
%dotenv
Hi, I’m Soma! You can find me on email at jonathan.soma@gmail.com, on Twitter at @dangerscarf, or maybe even on this newsletter I’ve never sent.
The secret world of undocumented APIs
While scraping a website is all good and fun, sometimes there’s a better/faster/cheaper way to get your data: undocumented APIs!
An API is how two computers talk to each other without the ugliness of the web getting involved. To be overly simplistic, instead of browsing a site like a normal human being, your computer visits a special URL to search or download data in bulk. For example, Twitter’s API is how researchers got millions and millions of tweets before Musk locked ’em all out.
While many APIs are public-facing and advertised, some are semi-secret or unofficial. For example, when you visit Pitchfork and search for music reviews, your browser secretly visits this URL to find the data that it eventually displays on the page. That listing of data – the API “endpoint” – is all about computers reading it, not people, so it looks awful and ugly and very much like this:
{"count":10000,"previous":null,"next":null,"results":{"category":null,"list":[{"dek":"<p>The pop trio’s new single casts the iconic 24-hour breakfast chain as a place of conversation and healing, not fights and thrown chairs.</p>\n","seoDescription":"The pop trio’s new single casts the iconic 24-hour breakfast chain as a place of conversation and healing, not fights and thrown chairs.","promoDescription":"<p>The pop trio’s new single casts the iconic 24-hour breakfast chain as a place of conversation and healing, not fights and thrown chairs.</p>\n","socialDescription":"The pop trio’s new single casts the iconic 24-hour breakfast chain as a place of conversation and healing, not fights and thrown chairs.","authors":[{"id":"592604b57fd06e5349102f43","name":"Evan Minsker","title":"Associate News Director","url":"/staff/evan-minsker/","slug":"staff/evan-minsker"}]
Horrifying, right? In this tutorial, we’ll be looking at three things:
- Exploring how LangChain talks to APIs
- Using ChatGPT to document previously-undocumented APIs
- Comparing two techniques for enabling LangChain to use undocumented APIs (explicit vs implicit documentation)
By the end we’ll have developed a tool that can use the unofficial Pitchfork API to answer natural-language questions about albums they’ve reviewed.
Want to learn how to discover undocumented APIs? Check out Inspect Element by Leon Yin
Setup
Give me one moment to set things up! First we’ll pull in all of our API keys using python-dotenv, then we’ll use a thousand and one imports to bring in LangChain and friends.
# LangChain imports
from langchain.agents import initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.chains import APIChain
from langchain.chains.api import open_meteo_docs
from langchain.prompts.prompt import PromptTemplate
from langchain.callbacks import get_openai_callback
from langchain import LLMChain
# Normal human imports
import requests
from bs4 import BeautifulSoup
import json
Even though GPT-4 is out, we’ll be using GPT-3.5-turbo for this one. At the moment it’s infinitely cheaper, and every penny matters!
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)
LangChain and documented APIs
LangChain is a “framework for developing applications powered by language models,” which really undersells the fact that it’s a universe-altering suite of tools for putting GPT (and related tools) to work. In this tutorial we’re focusing on how it interacts with APIs.
In the LangChain documentation for working with APIs there’s a super-simple example of using APIChain to get an answer from a free weather API. You create your API-aware “chain” from two things: your large language model (in this case, GPT-3.5-turbo) and the documentation to the API.
chain = APIChain.from_llm_and_api_docs(llm, open_meteo_docs.OPEN_METEO_DOCS, verbose=True)
Once your chain is created, you ask it your question and the chain goes to work sending information back and forth with ChatGPT:
- The chain sends the API docs to ChatGPT, asking what API URL you should visit
- ChatGPT reads the docs, sends back a URL
- LangChain obtains the data from the URL
- The data is sent back to ChatGPT, which uses it to answer your question
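The four steps above can be sketched in plain Python. This is a minimal illustration, not LangChain's actual implementation: `answer_with_api`, `ask_llm`, and `fetch` are hypothetical names, and the prompt wording is made up for the example. Passing the model and the HTTP call in as functions keeps the sketch runnable without an API key.

```python
from typing import Callable


def answer_with_api(
    question: str,
    api_docs: str,
    ask_llm: Callable[[str], str],
    fetch: Callable[[str], str],
) -> str:
    """Hypothetical two-round-trip flow: docs -> URL -> data -> answer."""
    # Steps 1 and 2: send the docs and the question, get a URL back
    url_prompt = (
        f"API docs:\n{api_docs}\n\n"
        f"Question: {question}\n"
        "Reply with only the API URL to call."
    )
    url = ask_llm(url_prompt).strip()

    # Step 3: download the data from that URL
    data = fetch(url)

    # Step 4: send the data back and ask for the final answer
    answer_prompt = (
        f"Question: {question}\n"
        f"API response:\n{data}\n"
        "Answer the question using the response."
    )
    return ask_llm(answer_prompt)


# Stand-ins so the example runs offline; a real version would call
# a language model and something like requests.get(url).text
def fake_llm(prompt: str) -> str:
    if "Reply with only the API URL" in prompt:
        return "https://example.com/api?q=munich"
    return "It is 40.2 degrees."


def fake_fetch(url: str) -> str:
    return '{"temperature": 40.2}'


print(answer_with_api("How warm is Munich?", "BASE URL: https://example.com/", fake_llm, fake_fetch))
```

Note that the model is called twice per question, which matters later when we start counting tokens.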
Let’s see it in action:
chain.run('What is the weather like right now in Munich, Germany in degrees Farenheit?')
> Entering new APIChain chain...
https://api.open-meteo.com/v1/forecast?latitude=48.137154&longitude=11.576124&current_weather=true&temperature_unit=fahrenheit
{"latitude":48.14,"longitude":11.58,"generationtime_ms":0.15795230865478516,"utc_offset_seconds":0,"timezone":"GMT","timezone_abbreviation":"GMT","elevation":526.0,"current_weather":{"temperature":40.2,"windspeed":8.8,"winddirection":261.0,"weathercode":2,"is_day":0,"time":"2023-04-08T00:00"}}
> Finished chain.
'The current weather in Munich, Germany is 40.2 degrees Fahrenheit.'
You can see each step of the process: the URL, the data, the answer! This transparency is thanks to us using verbose=True when we first created the chain.
The “secret sauce” of the APIChain formula is the OpenMeteo API documentation. The documentation is detailed enough to provide ChatGPT with everything it needs to know in creating the API URL: from what I can see, it needs a latitude and longitude, a current_weather=true, and a demand to be in degrees Fahrenheit.
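To make those pieces concrete, here is a small sketch that assembles the same kind of URL by hand with Python's standard library. The parameter names and the Munich coordinates are taken from the chain's output above; `BASE_URL` and `params` are just names chosen for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"

params = {
    "latitude": 48.137154,             # Munich, as picked by ChatGPT above
    "longitude": 11.576124,
    "current_weather": "true",         # include current conditions
    "temperature_unit": "fahrenheit",  # default would be celsius
}

# urlencode joins the pairs with & and escapes anything that needs it
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```

Everything ChatGPT had to work out from the documentation is in that `params` dictionary.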
How detailed is the OpenMeteo documentation that ChatGPT is using? It’s easy enough to examine the documentation for this particular API, as it’s actually provided as part of LangChain:
print(open_meteo_docs.OPEN_METEO_DOCS)
BASE URL: https://api.open-meteo.com/
API Documentation
The API endpoint /v1/forecast accepts a geographical coordinate, a list of weather variables and responds with a JSON hourly weather forecast for 7 days. Time always starts at 0:00 today and contains 168 hours. All URL parameters are listed below:
Parameter Format Required Default Description
latitude, longitude Floating point Yes Geographical WGS84 coordinate of the location
hourly String array No A list of weather variables which should be returned. Values can be comma separated, or multiple &hourly= parameter in the URL can be used.
daily String array No A list of daily weather variable aggregations which should be returned. Values can be comma separated, or multiple &daily= parameter in the URL can be used. If daily weather variables are specified, parameter timezone is required.
current_weather Bool No false Include current weather conditions in the JSON output.
temperature_unit String No celsius If fahrenheit is set, all temperature values are converted to Fahrenheit.
windspeed_unit String No kmh Other wind speed speed units: ms, mph and kn
precipitation_unit String No mm Other precipitation amount units: inch
timeformat String No iso8601 If format unixtime is selected, all time values are returned in UNIX epoch time in seconds. Please note that all timestamp are in GMT+0! For daily values with unix timestamps, please apply utc_offset_seconds again to get the correct date.
timezone String No GMT If timezone is set, all timestamps are returned as local-time and data is returned starting at 00:00 local-time. Any time zone name from the time zone database is supported. If auto is set as a time zone, the coordinates will be automatically resolved to the local time zone.
past_days Integer (0-2) No 0 If past_days is set, yesterday or the day before yesterday data are also returned.
start_date
end_date String (yyyy-mm-dd) No The time interval to get weather data. A day must be specified as an ISO8601 date (e.g. 2022-06-30).
models String array No auto Manually select one or more weather models. Per default, the best suitable weather models will be combined.
Hourly Parameter Definition
The parameter &hourly= accepts the following values. Most weather variables are given as an instantaneous value for the indicated hour. Some variables like precipitation are calculated from the preceding hour as an average or sum.
Variable Valid time Unit Description
temperature_2m Instant °C (°F) Air temperature at 2 meters above ground
snowfall Preceding hour sum cm (inch) Snowfall amount of the preceding hour in centimeters. For the water equivalent in millimeter, divide by 7. E.g. 7 cm snow = 10 mm precipitation water equivalent
rain Preceding hour sum mm (inch) Rain from large scale weather systems of the preceding hour in millimeter
showers Preceding hour sum mm (inch) Showers from convective precipitation in millimeters from the preceding hour
weathercode Instant WMO code Weather condition as a numeric code. Follow WMO weather interpretation codes. See table below for details.
snow_depth Instant meters Snow depth on the ground
freezinglevel_height Instant meters Altitude above sea level of the 0°C level
visibility Instant meters Viewing distance in meters. Influenced by low clouds, humidity and aerosols. Maximum visibility is approximately 24 km.
Look at all those details and options! It might be overwhelming to us, but ChatGPT has no problem figuring it out.
If we want to use the Pitchfork API to ask questions in a similar fashion, it seems like we might need some documentation for it.
There’s only one problem: it’s an unofficial, undocumented API.
Automatic documentation generation
Luckily for us, APIs aren’t (often) all that complicated. Let’s look at the URL from Pitchfork’s API:
https://pitchfork.com/api/v2/search/?genre=experimental&genre=global&genre=jazz&genre=metal&genre=pop&genre=rap&genre=rock&types=reviews&sort=publishdate%20desc%2Cposition%20asc&size=5&start=0&rating_from=0.0
It’s long and full of wild characters, but with some patience it’s possible to break it down! After the /search/ part, we see a lot of pieces we can make assumptions about.
- size=5 probably has to do with how many results are returned. In this case, our search gives us 5 results.
- genre=jazz&genre=metal is probably all of the genres of albums to return
- rating_from=0.0 potentially sets a lower bar for the rating of the album reviews. Pitchfork ranks them 0-10, so this includes all albums.
- There’s also types, sort, start… We can guess about those, too!
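If squinting at the raw URL gets tiring, the standard library can do the splitting for us. The sketch below only lists what is in the URL; it says nothing about what the parameters actually mean, so the guessing game above still applies.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "https://pitchfork.com/api/v2/search/"
    "?genre=experimental&genre=global&genre=jazz&genre=metal"
    "&genre=pop&genre=rap&genre=rock&types=reviews"
    "&sort=publishdate%20desc%2Cposition%20asc&size=5&start=0&rating_from=0.0"
)

parts = urlsplit(url)
params = parse_qs(parts.query)  # repeated keys are collected into lists

print(parts.path)
for name, values in params.items():
    print(f"{name}: {values}")
```

The `%20` and `%2C` pieces are just an encoded space and comma, so `sort` decodes to `publishdate desc,position asc`.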
Do we know that our guesses are correct? Absolutely not. But with a little time and some manual labor, we might be able to test our hypotheses about what each parameter actually means!
…but we have neither time nor manual labor: we’re playing fast and loose with the truth, we’re moving fast and breaking things, we’re fucking around and hopefully only finding out beautiful, blissful things. Instead of time and labor, we have AI.
Instead of us making assumptions about what everything in the API means, let’s have ChatGPT do it instead. Neither of us actually knows the truth, but it’s seen enough APIs that its guesses might even be better than ours. And instead of giving up halfway through like we did above, we can have it go the extra mile to write beautiful docs just like the OpenMeteo ones.
Below we build a prompt that takes an API url and asks ChatGPT to write us “detailed documentation” for the provided URL.
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["api_url"],
    template="""Act as a technical writer. Write detailed documentation for the API that exists at {api_url}. Only detail the request, do not describe the response. Do not include any parameters not in the sample endpoint."""
)

chain = LLMChain(
    llm=llm,
    verbose=True,
    prompt=prompt
)
Notice my extra-stern warning to not provide an example response. While we’re fine with ChatGPT guessing how to use the API based on the URL, any assumptions about the data that gets returned would be 100% invented and therefore a waste of tokens and money.
Now we’ll take our Pitchfork API URL and feed it into the chain.
url = "https://pitchfork.com/api/v2/search/?genre=experimental&genre=global&genre=jazz&genre=metal&genre=pop&genre=rap&genre=rock&types=reviews&sort=publishdate%20desc%2Cposition%20asc&size=5&start=0&rating_from=0.0"

response = chain.run(url)
print(response)
> Entering new LLMChain chain...
Prompt after formatting:
Act as a technical writer. Write detailed documentation for the API that exists at https://pitchfork.com/api/v2/search/?genre=experimental&genre=global&genre=jazz&genre=metal&genre=pop&genre=rap&genre=rock&types=reviews&sort=publishdate%20desc%2Cposition%20asc&size=5&start=0&rating_from=0.0. Only detail the request, do not describe the response. Do not include any parameters not in the sample endpoint.
> Finished chain.
API Documentation
Pitchfork Search API
This API is used to search for reviews on Pitchfork based on specified genres, types, sorting, and size.
Sample Endpoint:
https://pitchfork.com/api/v2/search/?genre=experimental&genre=global&genre=jazz&genre=metal&genre=pop&genre=rap&genre=rock&types=reviews&sort=publishdate%20desc%2Cposition%20asc&size=5&start=0&rating_from=0.0
Request Method: GET
Request Parameters:
• genre: This parameter is used to specify the genre of the review. You can specify multiple genres by adding the parameter multiple times. Possible values are experimental, global, jazz, metal, pop, rap, and rock.
• types: This parameter is used to specify the type of the content you want to retrieve. The possible values are reviews, features, tracks, labels, and artists.
• sort: This parameter is used to specify the sorting order of the content you want to retrieve. The possible values are publishdate desc, publishdate asc, position desc, and position asc. You can specify multiple sorting orders by separating them with commas.
• size: This parameter is used to specify the number of results that you want to retrieve. The possible values are integers between 1 and 100.
• start: This parameter is used to specify the starting position for the search, to retrieve the next "page" of results. The possible values are integers greater than or equal to 0.
• rating_from: This parameter is used to specify the minimum rating for the content you want to retrieve. The possible values are decimal numbers between 0.0 and 10.0.
Response Format:
The response format for this API is in JSON.
Authentication:
No authentication is required to use this API.
Error Codes:
The following response codes may be returned by the API:
• 200 OK: Successful request.
• 400 Bad Request: Invalid parameters.
• 401 Unauthorized: Authentication required.
• 403 Forbidden: Access denied.
• 404 Not Found: Resource not found.
• 500 Internal Server Error: Server error.
Limitations:
• This API is subject to rate limiting.
• The maximum size of a search result is 100.
• Some parameters may not be compatible with each other - for example, using both sorting and start parameters may lead to unexpected behavior.
Examples:
To retrieve the 5 most recent reviews for the experimental, jazz, and pop genres, the request URL would be:
https://pitchfork.com/api/v2/search/?genre=experimental&genre=jazz&genre=pop&types=reviews&sort=publishdate%20desc%2Cposition%20asc&size=5&start=0&rating_from=0.0
To retrieve the top 5 highest-rated metal reviews, the request URL would be:
https://pitchfork.com/api/v2/search/?genre=metal&types=reviews&sort=position%20asc&size=5&start=0&rating_from=8.0
To retrieve the next 5 results from a previous query, the request URL would be:
https://pitchfork.com/api/v2/search/?genre=experimental&genre=jazz&genre=pop&types=reviews&sort=publishdate%20desc%2Cposition%20asc&size=5&start=5&rating_from=0.0
That documentation is far nicer than anything I’d personally write!
Is it all correct, though? Absolutely not!
While the “limitations” section was invented out of whole cloth, I’d like to point out the section near the end about metal albums:
To retrieve the top 5 highest-rated metal reviews, the request URL would be:
https://pitchfork.com/api/v2/search/?genre=metal&types=reviews&sort=position%20asc&size=5&start=0&rating_from=8.0
This line includes at least one error: what if Pitchfork hates metal, and none of the albums scored above a 4.5? The URL sets a rating floor of 8.0, so there’d be no results! It also makes assumptions about what the position parameter means that I don’t necessarily trust, but that’s what happens when you force GPT into a guessing-game corner.
Let’s think positively, though: these docs are a great starting point, and if we find an issue we can always make manual edits. While we’re too lazy to make those edits right now, at least we know it’s possible.
Now let’s talk about how we want to use this auto-generated documentation.
Every time I re-run the documentation-generating code the results are very, very different! If you’re following along at home definitely give it a few runs and see what changes each time.
Explicit documentation
I’m going to call the type of text above explicit documentation. It’s “real” documentation, words phrased and formatted in order to communicate specific details about and examples of the API.
This explicit documentation is just like the OpenMeteo documentation from the LangChain documentation, and we’re going to use it in the exact same way:
- We’ll make a new APIChain, giving it the language model and the API docs.
- After the chain is made, we’ll ask it our question
The APIChain should then use the documentation to format a URL, then use the data from the URL to answer the question. Let’s see how it works!
# Saving the response from above as `explicit_docs` since we'll use it again later
explicit_docs = response

explicit_chain = APIChain.from_llm_and_api_docs(llm, explicit_docs, verbose=True)
response = explicit_chain.run("What was the first rap album reviewed by pitchfork?")
print(response)
> Entering new APIChain chain...
https://pitchfork.com/api/v2/search/?genre=rap&types=reviews&sort=publishdate%20asc&size=1&rating_from=0.0
{"count":4253,"previous":null,"next":null,"results":{"category":null,"list":[{"tombstone":{"bnm":false,"bnr":false,"albums":[{"id":"5929c3a7eb335119a49ed773","album":{"artists":[{"id":"592994259d034d5c69bf1739","display_name":"Roots Manuva","url":"/artists/2672-roots-manuva/","genres":[{"display_name":"Electronic","slug":"electronic"},{"display_name":"Jazz","slug":"jazz"},{"display_name":"Rap","slug":"rap"}],"slug":"592994259d034d5c69bf1739","photos":{"tout":{"width":300,"height":300,"credit":"","caption":"","altText":"Image may contain: Face, Human, Person, Roots Manuva, Head, Photo, Portrait, and Photography","modelName":"photo","title":"Roots Manuva artist image","sizes":{"sm":"https://media.pitchfork.com/photos/59299426c0084474cd0bec29/1:1/w_150/3d81e0d6.jpg","m":"https://media.pitchfork.com/photos/59299426c0084474cd0bec29/1:1/w_300/3d81e0d6.jpg"}},"lede":false,"social":false}}],"display_name":"Brand New Secondhand","labels":[{"id":"592608737fd06e5349102fdb","name":"Ninja Tune","display_name":"Ninja Tune"},{"id":"59260899c31f3f3472b1d6cc","name":"Big Dada","display_name":"Big Dada"}],"release_year":1999,"photos":{"tout":{"width":150,"height":150,"credit":"","caption":"","altText":"Image may contain: Display, Screen, Electronics, Monitor, Television, and TV","title":"Brand New Secondhand cover art","sizes":{"list":"https://media.pitchfork.com/photos/5929c3a7c0084474cd0c3506/1:1/w_160/c23e052b.gif","standard":"https://media.pitchfork.com/photos/5929c3a7c0084474cd0c3506/1:1/w_600/c23e052b.gif","homepageSmall":"https://media.pitchfork.com/photos/5929c3a7c0084474cd0c3506/1:1/w_55/c23e052b.gif","homepageLarge":"https://media.pitchfork.com/photos/5929c3a7c0084474cd0c3506/1:1/w_320/c23e052b.gif"}},"lede":false,"social":false}},"rating":{"display_rating":"9.5","rating":"9.5","bnm":false,"bnr":false},"labels_and_years":[{"labels":[{"id":"592608737fd06e5349102fdb","name":"Ninja Tune","display_name":"Ninja Tune"},{"id":"59260899c31f3f3472b1d6cc","name":"Big 
Dada","display_name":"Big Dada"}],"year":1999}]}]},"artists":[{"id":"592994259d034d5c69bf1739","display_name":"Roots Manuva","url":"/artists/2672-roots-manuva/","genres":[{"display_name":"Electronic","slug":"electronic"},{"display_name":"Jazz","slug":"jazz"},{"display_name":"Rap","slug":"rap"}],"slug":"592994259d034d5c69bf1739","photos":{"tout":{"width":300,"height":300,"credit":"","caption":"","altText":"Image may contain: Face, Human, Person, Roots Manuva, Head, Photo, Portrait, and Photography","modelName":"photo","title":"Roots Manuva artist image","sizes":{"sm":"https://media.pitchfork.com/photos/59299426c0084474cd0bec29/1:1/w_150/3d81e0d6.jpg","m":"https://media.pitchfork.com/photos/59299426c0084474cd0bec29/1:1/w_300/3d81e0d6.jpg"}},"lede":false,"social":false}}],"genres":[{"display_name":"Electronic","slug":"electronic"},{"display_name":"Jazz","slug":"jazz"},{"display_name":"Rap","slug":"rap"}],"channel":"","subChannel":"","position":6,"id":"5929e2b6c0084474cd0c4dc2","url":"/reviews/albums/5099-brand-new-secondhand/","contentType":"albumreview","title":"<em>Brand New Secondhand</em>","seoTitle":"Brand New Secondhand","socialTitle":"Roots Manuva: Brand New Secondhand","promoTitle":"Brand New Secondhand","authors":[{"id":"592604af17cea934e4daf5f4","name":"Paul Cooper","title":"Contributor","url":"/staff/paul-cooper/","slug":"staff/paul-cooper"}],"pubDate":"1999-03-23T06:00:06.000Z","timestamp":922168806000,"modifiedAt":"2022-03-31T08:47:24.426Z","dek":"<p>For politcially unaware, socially unconscious, ethically moribund pop culture vultures, there's no bigger disappointment than UK hip-hop. In the ...</p>\n","seoDescription":"For politcially unaware, socially unconscious, ethically moribund pop culture vultures, there's no bigger disappointment than UK hip-hop. In the ...","promoDescription":"<p>For politcially unaware, socially unconscious, ethically moribund pop culture vultures, there's no bigger disappointment than UK hip-hop. 
In the ...</p>\n","socialDescription":"For politcially unaware, socially unconscious, ethically moribund pop culture vultures, there's no bigger disappointment than UK hip-hop. In the ...","privateTags":["_dj_id:5099","_original_author_id:95"],"tags":[]}]}}
> Finished chain.
The first rap album reviewed by Pitchfork was "Brand New Secondhand" by Roots Manuva, with a rating of 9.5. The review was published on March 23, 1999.
The URL it chose to visit was
https://pitchfork.com/api/v2/search/?genre=rap&types=reviews&sort=publishdate%20asc&size=1&rating_from=0.0
It narrowed the genre list to rap, sorted by ascending publish date, and decided it only needed a single result! That seems pretty remarkable to me, and the result of an album from 1999 also feels right.
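For a sense of what ChatGPT has to dig through to produce that one-sentence answer, here is a sketch that pulls the same facts out by hand. The JSON is a hand-trimmed copy of the response above, keeping only the fields we read; the nesting follows the real output, but `summary` and the other variable names are just for this example.

```python
import json

# Hand-trimmed copy of the API response printed above
raw = """
{"count": 4253, "results": {"list": [{
  "tombstone": {"albums": [{
    "album": {
      "artists": [{"display_name": "Roots Manuva"}],
      "display_name": "Brand New Secondhand",
      "release_year": 1999
    },
    "rating": {"display_rating": "9.5", "rating": "9.5"}
  }]},
  "url": "/reviews/albums/5099-brand-new-secondhand/",
  "pubDate": "1999-03-23T06:00:06.000Z"
}]}}
"""

data = json.loads(raw)
review = data["results"]["list"][0]       # size=1, so there is only one
entry = review["tombstone"]["albums"][0]  # album details live under "tombstone"

summary = {
    "album": entry["album"]["display_name"],
    "artist": entry["album"]["artists"][0]["display_name"],
    "rating": float(entry["rating"]["rating"]),  # ratings arrive as strings
    "published": review["pubDate"][:10],         # keep just the date part
}
print(summary)
```

Four fields out of a response that runs to thousands of characters: most of what we pay for in tokens is packaging.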
If you’re curious, you can read the actual review here.
For politcially unaware, socially unconscious, ethically moribund pop culture vultures, there’s no bigger disappointment than UK hip-hop.
Ouch. But it goes on to claim Roots Manuva as the redeemer of that tiny island!
Implicit documentation
While the explicit documentation above is pretty fantastic, it also might be a waste of time. Think about it: if ChatGPT generates a new URL by reading the documentation it created by reading the single URL…
flowchart LR
  A[URL] --> B[ChatGPT]
  B --> C[Documentation]
  C --> D[ChatGPT]
  D --> E[New URL]
…why can’t we just cut out the middleman? Can’t we just say, “here’s a sample URL, figure out the new one?”
flowchart LR
  A[URL] -.- B[ChatGPT]
  B -.- C[Documentation]
  C -.- D[ChatGPT]
  D -.- E[New URL]
  A[URL] ==> F[ChatGPT]
  F ==> E
The documentation isn’t providing anything ChatGPT doesn’t know already – it’s just wordier expansion of the original URL – so this seems like a reasonable cheat, right? Let’s try it!
We’re going to call this implicit documentation: briefly describing the existence of the API and giving the sample endpoint. Instead of detailed documentation, it’s just that one URL.
implicit_docs = """
Pitchfork has an API with a sample endpoint at https://pitchfork.com/api/v2/search/?genre=experimental&genre=global&genre=jazz&genre=metal&genre=pop&genre=rap&genre=rock&types=reviews&sort=publishdate%20desc%2Cposition%20asc&size=5&start=0&rating_from=0.0
"""

implicit_chain = APIChain.from_llm_and_api_docs(llm, implicit_docs, verbose=True)
What are you expecting? Will it work?? Let’s give it a shot.
implicit_chain.run("What was the first rap album reviewed by pitchfork?")
> Entering new APIChain chain...
https://pitchfork.com/api/v2/search/?genre=rap&types=reviews&sort=publishdate%20asc&size=1&start=0&rating_from=0.0
{"count":4253,"previous":null,"next":null,"results":{"category":null,"list":[{"tombstone":{"bnm":false,"bnr":false,"albums":[{"id":"5929c3a7eb335119a49ed773","album":{"artists":[{"id":"592994259d034d5c69bf1739","display_name":"Roots Manuva","url":"/artists/2672-roots-manuva/","genres":[{"display_name":"Electronic","slug":"electronic"},{"display_name":"Jazz","slug":"jazz"},{"display_name":"Rap","slug":"rap"}],"slug":"592994259d034d5c69bf1739","photos":{"tout":{"width":300,"height":300,"credit":"","caption":"","altText":"Image may contain: Face, Human, Person, Roots Manuva, Head, Photo, Portrait, and Photography","modelName":"photo","title":"Roots Manuva artist image","sizes":{"sm":"https://media.pitchfork.com/photos/59299426c0084474cd0bec29/1:1/w_150/3d81e0d6.jpg","m":"https://media.pitchfork.com/photos/59299426c0084474cd0bec29/1:1/w_300/3d81e0d6.jpg"}},"lede":false,"social":false}}],"display_name":"Brand New Secondhand","labels":[{"id":"592608737fd06e5349102fdb","name":"Ninja Tune","display_name":"Ninja Tune"},{"id":"59260899c31f3f3472b1d6cc","name":"Big Dada","display_name":"Big Dada"}],"release_year":1999,"photos":{"tout":{"width":150,"height":150,"credit":"","caption":"","altText":"Image may contain: Display, Screen, Electronics, Monitor, Television, and TV","title":"Brand New Secondhand cover art","sizes":{"list":"https://media.pitchfork.com/photos/5929c3a7c0084474cd0c3506/1:1/w_160/c23e052b.gif","standard":"https://media.pitchfork.com/photos/5929c3a7c0084474cd0c3506/1:1/w_600/c23e052b.gif","homepageSmall":"https://media.pitchfork.com/photos/5929c3a7c0084474cd0c3506/1:1/w_55/c23e052b.gif","homepageLarge":"https://media.pitchfork.com/photos/5929c3a7c0084474cd0c3506/1:1/w_320/c23e052b.gif"}},"lede":false,"social":false}},"rating":{"display_rating":"9.5","rating":"9.5","bnm":false,"bnr":false},"labels_and_years":[{"labels":[{"id":"592608737fd06e5349102fdb","name":"Ninja Tune","display_name":"Ninja Tune"},{"id":"59260899c31f3f3472b1d6cc","name":"Big 
Dada","display_name":"Big Dada"}],"year":1999}]}]},"artists":[{"id":"592994259d034d5c69bf1739","display_name":"Roots Manuva","url":"/artists/2672-roots-manuva/","genres":[{"display_name":"Electronic","slug":"electronic"},{"display_name":"Jazz","slug":"jazz"},{"display_name":"Rap","slug":"rap"}],"slug":"592994259d034d5c69bf1739","photos":{"tout":{"width":300,"height":300,"credit":"","caption":"","altText":"Image may contain: Face, Human, Person, Roots Manuva, Head, Photo, Portrait, and Photography","modelName":"photo","title":"Roots Manuva artist image","sizes":{"sm":"https://media.pitchfork.com/photos/59299426c0084474cd0bec29/1:1/w_150/3d81e0d6.jpg","m":"https://media.pitchfork.com/photos/59299426c0084474cd0bec29/1:1/w_300/3d81e0d6.jpg"}},"lede":false,"social":false}}],"genres":[{"display_name":"Electronic","slug":"electronic"},{"display_name":"Jazz","slug":"jazz"},{"display_name":"Rap","slug":"rap"}],"channel":"","subChannel":"","position":6,"id":"5929e2b6c0084474cd0c4dc2","url":"/reviews/albums/5099-brand-new-secondhand/","contentType":"albumreview","title":"<em>Brand New Secondhand</em>","seoTitle":"Brand New Secondhand","socialTitle":"Roots Manuva: Brand New Secondhand","promoTitle":"Brand New Secondhand","authors":[{"id":"592604af17cea934e4daf5f4","name":"Paul Cooper","title":"Contributor","url":"/staff/paul-cooper/","slug":"staff/paul-cooper"}],"pubDate":"1999-03-23T06:00:06.000Z","timestamp":922168806000,"modifiedAt":"2022-03-31T08:47:24.426Z","dek":"<p>For politcially unaware, socially unconscious, ethically moribund pop culture vultures, there's no bigger disappointment than UK hip-hop. In the ...</p>\n","seoDescription":"For politcially unaware, socially unconscious, ethically moribund pop culture vultures, there's no bigger disappointment than UK hip-hop. In the ...","promoDescription":"<p>For politcially unaware, socially unconscious, ethically moribund pop culture vultures, there's no bigger disappointment than UK hip-hop. 
In the ...</p>\n","socialDescription":"For politcially unaware, socially unconscious, ethically moribund pop culture vultures, there's no bigger disappointment than UK hip-hop. In the ...","privateTags":["_dj_id:5099","_original_author_id:95"],"tags":[]}]}}
> Finished chain.
'The first rap album reviewed by Pitchfork was "Brand New Secondhand" by Roots Manuva, published on March 23, 1999, with a rating of 9.5. The API call used to retrieve this information was: https://pitchfork.com/api/v2/search/?genre=rap&types=reviews&sort=publishdate%20asc&size=1&start=0&rating_from=0.0.'
That’s an A+ perfect response, with a heck of a lot less work. I don’t even have anything else to say about it: it works, it’s shorter, it’s a lot less work.
Summary and differences
Let’s look at two major ways the explicit and implicit approaches might differ: success rate and cost.
Success rate
Believe it or not, the implicit approach has a much higher success rate!
While writing this post, I’ve had to tweak the prompt that generates the explicit documentation again and again and again. ChatGPT seems to go out of its way to lie about the API, and not even in subtle ways. It…
- Loves to introduce new, non-existent features
- Turns the genre=rap&genre=folk parameter into various flavors of genre=rap,folk, and just generally sacrifices the reality of the API
- Invents lower and upper bounds for page sizes and ratings
While these might be good architectural changes or useful new features, they absolutely aren’t implied by the original URL! Without adding a push to be conservative to the prompt, we end up with docs that are completely misleading.
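The genre rewrite is a good example of why this matters. The two spellings look interchangeable to a human, but they are different requests on the wire. This sketch only shows how Python encodes and decodes them; how Pitchfork's server treats the comma-joined form is something I haven't verified.

```python
from urllib.parse import urlencode, parse_qs

# What the sample URL actually does: repeat the key once per genre
repeated = urlencode({"genre": ["rap", "folk"]}, doseq=True)

# What the generated docs sometimes suggest instead: one comma-joined value
joined = urlencode({"genre": "rap,folk"})

print(repeated)  # genre=rap&genre=folk
print(joined)    # genre=rap%2Cfolk

# Decoded, the first is two genres and the second is one oddly named genre
print(parse_qs(repeated)["genre"])  # ['rap', 'folk']
print(parse_qs(joined)["genre"])    # ['rap,folk']
```

Any server that expects repeated keys would see the second form as a single value, `rap,folk`, rather than two genres.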
When you take the misleading documentation and feed it to the APIChain prompt, it frequently screws up the request. Overall I’ve found the explicitly-documented example breaks about a third of the time!
On the other hand, explicit documentation does allow you to clean up and customize the docs. If you find out the maximum and minimum page size, you’re free to add it! If other genres get released or removed, you’re more than able to edit the list.
But if you’re lazy, and just looking for a shortcut? Implicit docs always perform better.
Costs
While we’re all very excited about using the various OpenAI APIs, they do have a financial cost. GPT-3.5-turbo is remarkably inexpensive compared to its peers, but we aren’t here to waste money! If we can keep the prompt smaller our queries cost less, and saving money is the second-quickest route to happiness.
Let’s use LangChain’s get_openai_callback to compare the token count and cost of our explicit vs implicit requests. Note that we’re using verbose=False here to reduce clutter.
explicit_chain = APIChain.from_llm_and_api_docs(llm, explicit_docs, verbose=False)

with get_openai_callback() as cb:
    response = explicit_chain.run("What was the first rap album reviewed by pitchfork?")
    print(f"Response: {response}")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")
Response: The first rap album reviewed by Pitchfork was Roots Manuva's "Brand New Secondhand" released in 1999, with a review published on March 23, 1999, with a rating of 9.5.
Total Tokens: 3058
Prompt Tokens: 2970
Completion Tokens: 88
Total Cost (USD): $0.006116
implicit_chain = APIChain.from_llm_and_api_docs(llm, implicit_docs, verbose=False)

with get_openai_callback() as cb:
    response = implicit_chain.run("What was the first rap album reviewed by pitchfork?")
    print(f"Response: {response}")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")
Response: The first rap album reviewed by Pitchfork was "Brand New Secondhand" by Roots Manuva, with a rating of 9.5. The review was written by Paul Cooper and was published on March 23, 1999.
Total Tokens: 1811
Prompt Tokens: 1722
Completion Tokens: 89
Total Cost (USD): $0.0036220000000000002
Along with being consistently less correct, the explicit chain costs about 1.7 times as much! Just above 0.6 cents for explicit compared to 0.36 cents for implicit. Depending on how wordy GPT decides to be, some of my tests have seen it up to three times as much.
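A little arithmetic on the numbers printed above shows where the money goes. The per-1K price here is inferred from the callback's own totals rather than looked up, so treat it as a back-of-the-envelope check.

```python
# Figures copied from the two callback printouts above
explicit = {"total": 3058, "prompt": 2970, "cost": 0.006116}
implicit = {"total": 1811, "prompt": 1722, "cost": 0.003622}

# Implied price per 1,000 tokens for each run
for name, run in (("explicit", explicit), ("implicit", implicit)):
    per_1k = run["cost"] / run["total"] * 1000
    print(f"{name}: ${per_1k:.4f} per 1K tokens")

# How much more the explicit run cost
ratio = explicit["cost"] / implicit["cost"]
print(f"explicit / implicit: {ratio:.2f}x")

# How many extra prompt tokens the explicit run carried
extra_prompt_tokens = explicit["prompt"] - implicit["prompt"]
print(f"extra prompt tokens: {extra_prompt_tokens}")
```

Both runs work out to the same price per token, so the gap is purely a matter of size: the explicit run sent 1,248 more prompt tokens, while the completions differ by a single token.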
The wildest part about this large difference is that both queries almost always make the same API request, which I assume would take up the bulk of the tokens (secret fact: Pitchfork’s API is so wordy I changed the example to size=5 so that it would fit in the GPT-3.5-turbo context window).
The takeaway
Unless you enjoy editing or spending money, it looks like less is more.