Peewee is a “simple, expressive” ORM for Python. Instead of running SQL queries in your code, it allows you to connect to a database and run queries on it in a way that seems a little more Python-y.
I took two datasets from the NYC data portal, one is a directory of high schools and one is their performance data.
I downloaded them both as CSV files, then cleaned them up a little. Some columns are almost numbers-only, but not quite,
expgrade_span_min: I changed
Kto 0 and
PKto -1. That way we’ll be able to do math with those numbers later.
sin it I just blanked out the cell (filter proved very useful).
Now we’ll use the wonderful CSVKit to convert it into a SQLite database with a
schools table and a
NOTE: If__ you_received an error about the data not being in UTF-8, you might want to give Google Spreadsheets a try instead of Excel.
No error messages? Great! Let’s take a peek at our file. Run
sqlite3 schools.db to open up the SQLite console with
First we’ll use
.tables to see what our tables are
.schema schools and
.schema sat_scores to see what’s hiding in the tables
Looking good! We’ll want to remember the
.schema command when we have to tell peewee what’s inside of the database, but for now we can rest easy that we have a working SQLite database.
Peewee is your standard Python library, you can install it with
The first thing we need to do is define our models. Models are representation of a databases table. We have two database tables -
sat_scores, so we’ll need two models, one to represent each of them.
Take the following (halfway-complete) code and save it as
Let’s break it down piece by piece, shall we?
Grab the peewee library and import all of the parts of it
Connect to the SQLite database called
schools.db. This part is flexible - if you had a MySQL database you’d use
MySQLDatabase instead of a
SqliteDatabase. You could also connect to a lot of different SQLite databases, if you so desired! For example:
Now we need to define the model.
You need to explain to peewee what’s in the table and what each column is -
school_name = CharField() means that
school_name is a string, while
grade_span_min is an
IntegerField() because it is, of course, a number.
There are a whole lot of types, including
DateField() and more.
You then define a class called
Meta inside of
School to explain where exactly this model should be pulling its information from. We defined
db earlier as attaching to
schools.db, and we set the
db_table since the information is sitting in the
schools table. If your SQLite database only has one table, you can ignore
We then repeat the process for SAT scores.
TODO explain primary keys
That was a heck of a lot of set up, wasn’t it? Now we can actually reach out to query our database.
Note: Remember we saved the code above as
models.rb! That means we can
importit using Python so it can keep our code separated and clean.
Let’s write a script to connect to import our models, then do some fun stuff with the database.
What is the DBN for Union Square Academy? You can use
.get to query for one specific record in the database.
Most schools probably have unique names, but you can also use
.get on non-unique fields. Let’s say I wanted exactly one school in Brooklyn.
If you want to select multiple rows, though, you need to use
.select(). If you’re particular about which ones, you’ll also want to use
You can count them, or loop through them
Oh, was that too many to loop through? You can use
.limit() to only take the first few results. Maybe we want the first five schools in Manhattan, ordered by the number of students?
And the biggest schools in the Bronx?
You can use all your favorite SQL query pieces - WHERE, ORDER, LIMIT, even JOIN and all of that.
NOTE: I really recommend the peewee documentation, it’s fantastic.
Now that we’re good on this, let’s put peewee on the back burner for a hot second while we learn how to build tiny web apps with Flask