Introduction to peewee

Peewee is a “simple, expressive” ORM for Python. Instead of running SQL queries in your code, it allows you to connect to a database and run queries on it in a way that seems a little more Python-y.

Prepping our data

I took two datasets from the NYC data portal, one is a directory of high schools and one is their performance data.

  1. High School Directory
  2. High School SAT Results

I downloaded them both as CSV files, then cleaned them up a little. Some columns are almost numbers-only, but not quite,

  1. For the directory, I edited grade_span_min, grade_span_max and expgrade_span_min: I changed K to 0 and PK to -1. That way we’ll be able to do math with those numbers later.
  2. For the SAT scores, for everything that has an s in it I just blanked out the cell (filter proved very useful).
  3. I also renamed the columns to be all lowercase, with underscores. So now we have: dbn, school_name, number_of_test_takers, critical_reading_mean, mathematics_mean, writing_mean

Now we’ll use the wonderful CSVKit to convert it into a SQLite database with a schools table and a sat_scores table.

csvsql --db sqlite:///schools.db --insert --table schools DOE_High_School_Directory_2016.csv 
csvsql --db sqlite:///schools.db --insert --table sat_scores SAT__College_Board__2010_School_Level_Results.csv 

NOTE: If__ you_received an error about the data not being in UTF-8, you might want to give Google Spreadsheets a try instead of Excel.

No error messages? Great! Let’s take a peek at our file. Run sqlite3 schools.db to open up the SQLite console with schools.db.

First we’ll use .tables to see what our tables are

sqlite> .tables
sat_scores  schools 

And then .schema schools and .schema sat_scores to see what’s hiding in the tables

sqlite> .schema schools
CREATE TABLE schools (
	dbn VARCHAR(6) NOT NULL, 
	school_name VARCHAR(79) NOT NULL, 
	boro VARCHAR(13) NOT NULL, 
	building_code VARCHAR(4) NOT NULL, 
	shared_space BOOLEAN NOT NULL, 
	phone_number VARCHAR(12) NOT NULL, 
	grade_span_min INTEGER NOT NULL, 
	grade_span_max INTEGER NOT NULL, 
  ....
	priority09 VARCHAR(32), 
	priority10 VARCHAR(32), 
	CHECK (shared_space IN (0, 1))
);  

Looking good! We’ll want to remember the .schema command when we have to tell peewee what’s inside of the database, but for now we can rest easy that we have a working SQLite database.

Installing peewee

Peewee is your standard Python library, you can install it with pip

pip install peewee

Connecting with peewee

The first thing we need to do is define our models. Models are representation of a databases table. We have two database tables - schools and sat_scores, so we’ll need two models, one to represent each of them.

Take the following (halfway-complete) code and save it as models.py.

# Import from peewee
from peewee import *

# Connect to the SQLite database
db = SqliteDatabase('schools.db')

# Define what a 'School' is
class School(Model):
  # These are all the fields it has
  # match up CharField/IntegerField/etc with correct type
  dbn = CharField(primary_key=True) # primary key = unique id
  school_name = CharField()
  boro = CharField()
  grade_span_min = IntegerField()
  grade_span_max = IntegerField()
  total_students = IntegerField()

  class Meta:
    # data is coming from schools.db
    database = db
    # and it's in the table called 'schools'
    db_table = 'schools'

# Repeat with the SAT scores
class Score(Model):
  dbn = CharField(primary_key=True)
  school_name = CharField()
  number_of_test_takers = CharField()
  critical_reading_mean = IntegerField()
  mathematics_mean = IntegerField()
  writing_mean = IntegerField()
  
  class Meta:
    database = db
    db_table = 'sat_scores'

Let’s break it down piece by piece, shall we?

from peewee import *

Grab the peewee library and import all of the parts of it

db = SqliteDatabase('schools.db')

Connect to the SQLite database called schools.db. This part is flexible - if you had a MySQL database you’d use MySQLDatabase instead of a SqliteDatabase. You could also connect to a lot of different SQLite databases, if you so desired! For example:

# If you wanted to connect to many databases, no problem
schools_db = SqliteDatabase('schools.db')
crime_db = SqliteDatabase('crime.db')
socks_db = SqliteDatabase('socks_i_own.db')

Now we need to define the model.

class School(Model):
  # These are all the fields it has
  # match up CharField/IntegerField/etc with correct type
  dbn = CharField(primary_key=True) # primary key = unique id
  school_name = CharField()
  boro = CharField()
  grade_span_min = IntegerField()
  grade_span_max = IntegerField()
  total_students = IntegerField()

  class Meta:
    # data is coming from schools.db
    database = db
    # and it's in the table called 'schools'
    db_table = 'schools'

You need to explain to peewee what’s in the table and what each column is - school_name = CharField() means that school_name is a string, while grade_span_min is an IntegerField() because it is, of course, a number.

There are a whole lot of types, including BooleanField() and DateField() and more.

You then define a class called Meta inside of School to explain where exactly this model should be pulling its information from. We defined db earlier as attaching to schools.db, and we set the db_table since the information is sitting in the schools table. If your SQLite database only has one table, you can ignore db_table.

We then repeat the process for SAT scores.

TODO explain primary keys

Querying with peewee

That was a heck of a lot of set up, wasn’t it? Now we can actually reach out to query our database.

Note: Remember we saved the code above as models.rb! That means we can import it using Python so it can keep our code separated and clean.

Let’s write a script to connect to import our models, then do some fun stuff with the database.

from models import *

What is the DBN for Union Square Academy? You can use .get to query for one specific record in the database.

school = School.get(School.school_name == 'Union Square Academy for Health Sciences')
print school.dbn

Most schools probably have unique names, but you can also use .get on non-unique fields. Let’s say I wanted exactly one school in Brooklyn.

school = School.get(School.boro == 'Brooklyn')
print school.school_name

If you want to select multiple rows, though, you need to use .select(). If you’re particular about which ones, you’ll also want to use .where.

schools = School.select().where(School.boro == 'Brooklyn')

You can count them, or loop through them

print schools.count()
for school in schools:
  print school.school_name

Oh, was that too many to loop through? You can use .limit() to only take the first few results. Maybe we want the first five schools in Manhattan, ordered by the number of students?

schools = School.select().where(School.boro == 'Manhattan').limit(5).order_by(School.total_students.asc())
for school in schools:
  print school.school_name, school.total_students

And the biggest schools in the Bronx?

schools = School.select().where(School.boro == 'Bronx').limit(5).order_by(School.total_students.desc())
for school in schools:
  print school.school_name, school.total_students

You can use all your favorite SQL query pieces - WHERE, ORDER, LIMIT, even JOIN and all of that.

NOTE: I really recommend the peewee documentation, it’s fantastic.

Now that we’re good on this, let’s put peewee on the back burner for a hot second while we learn how to build tiny web apps with Flask

Want to hear when I release new things?
My infrequent and sporadic newsletter can help with that.