This tutorial is out of date! Try my longer, updated Flask-SQLite webapp tutorial here
Peewee is a “simple, expressive” ORM for Python. Instead of running SQL queries in your code, it allows you to connect to a database and run queries on it in a way that seems a little more Python-y.
I took two datasets from the NYC data portal, one is a directory of high schools and one is their performance data.
I downloaded them both as CSV files, then cleaned them up a little. Some columns are almost numbers-only, but not quite,
grade_span_min
, grade_span_max
and expgrade_span_min
: I changed K
to 0 and PK
to -1. That way we’ll be able to do math with those numbers later.s
in it I just blanked out the cell (filter proved very useful).dbn
, school_name
, number_of_test_takers
, critical_reading_mean
, mathematics_mean
, writing_mean
Now we’ll use the wonderful CSVKit to convert it into a SQLite database with a schools
table and a sat_scores
table.
csvsql --db sqlite:///schools.db --insert --table schools DOE_High_School_Directory_2016.csv
csvsql --db sqlite:///schools.db --insert --table sat_scores SAT__College_Board__2010_School_Level_Results.csv
NOTE: If__ you_received an error about the data not being in UTF-8, you might want to give Google Spreadsheets a try instead of Excel.
No error messages? Great! Let’s take a peek at our file. Run sqlite3 schools.db
to open up the SQLite console with schools.db
.
First we’ll use .tables
to see what our tables are
sqlite> .tables
sat_scores schools
And then .schema schools
and .schema sat_scores
to see what’s hiding in the tables
sqlite> .schema schools
CREATE TABLE schools (
dbn VARCHAR(6) NOT NULL,
school_name VARCHAR(79) NOT NULL,
boro VARCHAR(13) NOT NULL,
building_code VARCHAR(4) NOT NULL,
shared_space BOOLEAN NOT NULL,
phone_number VARCHAR(12) NOT NULL,
grade_span_min INTEGER NOT NULL,
grade_span_max INTEGER NOT NULL,
....
priority09 VARCHAR(32),
priority10 VARCHAR(32),
CHECK (shared_space IN (0, 1))
);
Looking good! We’ll want to remember the .schema
command when we have to tell peewee what’s inside of the database, but for now we can rest easy that we have a working SQLite database.
Peewee is your standard Python library, you can install it with pip
pip install peewee
The first thing we need to do is define our models. Models are representation of a databases table. We have two database tables - schools
and sat_scores
, so we’ll need two models, one to represent each of them.
Take the following (halfway-complete) code and save it as models.py
.
# Import from peewee
from peewee import *
# Connect to the SQLite database
db = SqliteDatabase('schools.db')
# Define what a 'School' is
class School(Model):
# These are all the fields it has
# match up CharField/IntegerField/etc with correct type
dbn = CharField(primary_key=True) # primary key = unique id
school_name = CharField()
boro = CharField()
grade_span_min = IntegerField()
grade_span_max = IntegerField()
total_students = IntegerField()
class Meta:
# data is coming from schools.db
database = db
# and it's in the table called 'schools'
db_table = 'schools'
# Repeat with the SAT scores
class Score(Model):
dbn = CharField(primary_key=True)
school_name = CharField()
number_of_test_takers = CharField()
critical_reading_mean = IntegerField()
mathematics_mean = IntegerField()
writing_mean = IntegerField()
class Meta:
database = db
db_table = 'sat_scores'
Let’s break it down piece by piece, shall we?
from peewee import *
Grab the peewee library and import all of the parts of it
db = SqliteDatabase('schools.db')
Connect to the SQLite database called schools.db
. This part is flexible - if you had a MySQL database you’d use MySQLDatabase
instead of a SqliteDatabase
. You could also connect to a lot of different SQLite databases, if you so desired! For example:
# If you wanted to connect to many databases, no problem
schools_db = SqliteDatabase('schools.db')
crime_db = SqliteDatabase('crime.db')
socks_db = SqliteDatabase('socks_i_own.db')
Now we need to define the model.
class School(Model):
# These are all the fields it has
# match up CharField/IntegerField/etc with correct type
dbn = CharField(primary_key=True) # primary key = unique id
school_name = CharField()
boro = CharField()
grade_span_min = IntegerField()
grade_span_max = IntegerField()
total_students = IntegerField()
class Meta:
# data is coming from schools.db
database = db
# and it's in the table called 'schools'
db_table = 'schools'
You need to explain to peewee what’s in the table and what each column is - school_name = CharField()
means that school_name
is a string, while grade_span_min
is an IntegerField()
because it is, of course, a number.
There are a whole lot of types, including BooleanField()
and DateField()
and more.
You then define a class called Meta
inside of School
to explain where exactly this model should be pulling its information from. We defined db
earlier as attaching to schools.db
, and we set the db_table
since the information is sitting in the schools
table. If your SQLite database only has one table, you can ignore db_table
.
We then repeat the process for SAT scores.
TODO explain primary keys
That was a heck of a lot of set up, wasn’t it? Now we can actually reach out to query our database.
Note: Remember we saved the code above as
models.rb
! That means we canimport
it using Python so it can keep our code separated and clean.
Let’s write a script to connect to import our models, then do some fun stuff with the database.
from models import *
What is the DBN for Union Square Academy? You can use .get
to query for one specific record in the database.
school = School.get(School.school_name == 'Union Square Academy for Health Sciences')
print school.dbn
Most schools probably have unique names, but you can also use .get
on non-unique fields. Let’s say I wanted exactly one school in Brooklyn.
school = School.get(School.boro == 'Brooklyn')
print school.school_name
If you want to select multiple rows, though, you need to use .select()
. If you’re particular about which ones, you’ll also want to use .where
.
schools = School.select().where(School.boro == 'Brooklyn')
You can count them, or loop through them
print schools.count()
for school in schools:
print school.school_name
Oh, was that too many to loop through? You can use .limit()
to only take the first few results. Maybe we want the first five schools in Manhattan, ordered by the number of students?
schools = School.select().where(School.boro == 'Manhattan').limit(5).order_by(School.total_students.asc())
for school in schools:
print school.school_name, school.total_students
And the biggest schools in the Bronx?
schools = School.select().where(School.boro == 'Bronx').limit(5).order_by(School.total_students.desc())
for school in schools:
print school.school_name, school.total_students
You can use all your favorite SQL query pieces - WHERE, ORDER, LIMIT, even JOIN and all of that.
NOTE: I really recommend the peewee documentation, it’s fantastic.
Now that we’re good on this, let’s put peewee on the back burner for a hot second while we learn how to build tiny web apps with Flask