Processing time with pandas
Let’s open up some data from the Census bureau - we’re going to use New Home Sales. The data is formatted… oddly, so I’ve cleaned it up for you as home-sales.csv inside of the data folder.
Open it without moving it. Tab autocomplete will help you.
val | per_name | |
---|---|---|
0 | 42.0 | 1963-01-01 |
1 | 35.0 | 1963-02-01 |
2 | 44.0 | 1963-03-01 |
3 | 52.0 | 1963-04-01 |
4 | 58.0 | 1963-05-01 |
100
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
636 | 0 | 39.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-01-01 |
637 | 0 | 45.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-02-01 |
638 | 0 | 49.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-03-01 |
639 | 0 | 57.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-04-01 |
640 | 0 | 51.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-05-01 |
<matplotlib.axes._subplots.AxesSubplot at 0x105de97b8>
<matplotlib.axes._subplots.AxesSubplot at 0x105d4a2b0>
Creating a datetime column
0 1963-01-01
1 1963-02-01
2 1963-03-01
3 1963-04-01
4 1963-05-01
Name: per_name, dtype: object
CPU times: user 963 µs, sys: 0 ns, total: 963 µs
Wall time: 978 µs
0 1963-01-01
1 1963-02-01
2 1963-03-01
3 1963-04-01
4 1963-05-01
Name: per_name, dtype: datetime64[ns]
CPU times: user 1.46 ms, sys: 2 µs, total: 1.46 ms
Wall time: 1.48 ms
0 1963-01-01
1 1963-02-01
2 1963-03-01
3 1963-04-01
4 1963-05-01
Name: per_name, dtype: datetime64[ns]
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | date | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 42.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-01-01 | 1963-01-01 |
1 | 0 | 35.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-02-01 | 1963-02-01 |
2 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-03-01 | 1963-03-01 |
3 | 0 | 52.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-04-01 | 1963-04-01 |
4 | 0 | 58.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-05-01 | 1963-05-01 |
is_adj int64
val float64
cat_code object
cat_desc object
cat_indent int64
dt_code object
dt_desc object
dt_unit object
geo_code object
geo_desc object
per_name object
date datetime64[ns]
dtype: object
Changing the index to the datetime
Normally the index of the column is just a number.
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||
1963-01-01 | 0 | 42.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-01-01 |
1963-02-01 | 0 | 35.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-02-01 |
1963-03-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-03-01 |
1963-04-01 | 0 | 52.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-04-01 |
1963-05-01 | 0 | 58.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-05-01 |
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | date | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 42.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-01-01 | 1963-01-01 |
1 | 0 | 35.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-02-01 | 1963-02-01 |
2 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-03-01 | 1963-03-01 |
3 | 0 | 52.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-04-01 | 1963-04-01 |
4 | 0 | 58.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-05-01 | 1963-05-01 |
It’s the column on the far left - 0
, 1
, 2
, 3
, 4
… boring and useless!
If we use .set_index to replace the index with the datetime, though, we can
start to have some fun
Be sure you use inplace=True
or else it won’t save the new index!
Selecting specific(-ish) dates via the index
Now that our index is a datetime, we can select date ranges super super easily.
Selecting by year
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||
1975-01-01 | 0 | 29.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-01-01 |
1975-02-01 | 0 | 34.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-02-01 |
1975-03-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-03-01 |
1975-04-01 | 0 | 54.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-04-01 |
1975-05-01 | 0 | 57.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-05-01 |
1975-06-01 | 0 | 51.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-06-01 |
1975-07-01 | 0 | 51.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-07-01 |
1975-08-01 | 0 | 53.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-08-01 |
1975-09-01 | 0 | 46.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-09-01 |
1975-10-01 | 0 | 46.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-10-01 |
1975-11-01 | 0 | 46.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-11-01 |
1975-12-01 | 0 | 39.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1975-12-01 |
List slices with datetimes
We can also use list slicing with datetimes! Usually we would say things like `df10
Just for review, you can use :
to only select certain parts of a list:
('banana', 'orange', 'apple', 'blueberries', 'strawberries')
('banana', 'orange')
('orange', 'apple', 'blueberries')
('blueberries', 'strawberries')
Instead of using boring ol’ numbers, we can use dates instead.
Getting rows after a certain date
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||
1970-01-01 | 0 | 34.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-01-01 |
1970-02-01 | 0 | 29.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-02-01 |
1970-03-01 | 0 | 36.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-03-01 |
1970-04-01 | 0 | 42.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-04-01 |
1970-05-01 | 0 | 43.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-05-01 |
1970-06-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-06-01 |
1970-07-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-07-01 |
1970-08-01 | 0 | 48.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-08-01 |
1970-09-01 | 0 | 45.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-09-01 |
1970-10-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-10-01 |
1970-11-01 | 0 | 40.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-11-01 |
1970-12-01 | 0 | 37.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-12-01 |
1971-01-01 | 0 | 45.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-01-01 |
1971-02-01 | 0 | 49.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-02-01 |
1971-03-01 | 0 | 62.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-03-01 |
1971-04-01 | 0 | 62.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-04-01 |
1971-05-01 | 0 | 58.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-05-01 |
1971-06-01 | 0 | 59.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-06-01 |
1971-07-01 | 0 | 64.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-07-01 |
1971-08-01 | 0 | 62.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-08-01 |
1971-09-01 | 0 | 50.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-09-01 |
1971-10-01 | 0 | 52.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-10-01 |
1971-11-01 | 0 | 50.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-11-01 |
1971-12-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-12-01 |
1972-01-01 | 0 | 51.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1972-01-01 |
1972-02-01 | 0 | 56.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1972-02-01 |
1972-03-01 | 0 | 60.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1972-03-01 |
1972-04-01 | 0 | 65.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1972-04-01 |
1972-05-01 | 0 | 64.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1972-05-01 |
1972-06-01 | 0 | 63.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1972-06-01 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2013-12-01 | 0 | 31.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2013-12-01 |
2014-01-01 | 0 | 33.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-01-01 |
2014-02-01 | 0 | 35.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-02-01 |
2014-03-01 | 0 | 39.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-03-01 |
2014-04-01 | 0 | 39.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-04-01 |
2014-05-01 | 0 | 43.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-05-01 |
2014-06-01 | 0 | 38.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-06-01 |
2014-07-01 | 0 | 35.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-07-01 |
2014-08-01 | 0 | 36.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-08-01 |
2014-09-01 | 0 | 37.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-09-01 |
2014-10-01 | 0 | 38.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-10-01 |
2014-11-01 | 0 | 31.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-11-01 |
2014-12-01 | 0 | 35.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-12-01 |
2015-01-01 | 0 | 39.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-01-01 |
2015-02-01 | 0 | 45.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-02-01 |
2015-03-01 | 0 | 46.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-03-01 |
2015-04-01 | 0 | 48.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-04-01 |
2015-05-01 | 0 | 47.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-05-01 |
2015-06-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-06-01 |
2015-07-01 | 0 | 43.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-07-01 |
2015-08-01 | 0 | 41.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-08-01 |
2015-09-01 | 0 | 35.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-09-01 |
2015-10-01 | 0 | 39.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-10-01 |
2015-11-01 | 0 | 36.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-11-01 |
2015-12-01 | 0 | 38.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-12-01 |
2016-01-01 | 0 | 39.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-01-01 |
2016-02-01 | 0 | 45.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-02-01 |
2016-03-01 | 0 | 49.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-03-01 |
2016-04-01 | 0 | 57.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-04-01 |
2016-05-01 | 0 | 51.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2016-05-01 |
557 rows × 11 columns
Getting rows between a certain date
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||
1970-02-01 | 0 | 29.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-02-01 |
1970-03-01 | 0 | 36.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-03-01 |
1970-04-01 | 0 | 42.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-04-01 |
1970-05-01 | 0 | 43.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-05-01 |
1970-06-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-06-01 |
Count the number of houses sold in the 70’s and count the number sold in the
80’s
6557.0
6088.0
How do we select every single February?
date
1963-01-01 False
1963-02-01 True
1963-03-01 False
1963-04-01 False
1963-05-01 False
Name: per_name, dtype: bool
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||
1963-06-01 | 0 | 48.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-06-01 |
1963-07-01 | 0 | 62.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-07-01 |
1963-08-01 | 0 | 56.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-08-01 |
1964-06-01 | 0 | 53.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1964-06-01 |
1964-07-01 | 0 | 54.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1964-07-01 |
1964-08-01 | 0 | 56.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1964-08-01 |
1965-06-01 | 0 | 57.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1965-06-01 |
1965-07-01 | 0 | 51.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1965-07-01 |
1965-08-01 | 0 | 58.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1965-08-01 |
1966-06-01 | 0 | 40.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1966-06-01 |
1966-07-01 | 0 | 40.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1966-07-01 |
1966-08-01 | 0 | 36.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1966-08-01 |
1967-06-01 | 0 | 47.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1967-06-01 |
1967-07-01 | 0 | 46.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1967-07-01 |
1967-08-01 | 0 | 47.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1967-08-01 |
1968-06-01 | 0 | 41.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1968-06-01 |
1968-07-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1968-07-01 |
1968-08-01 | 0 | 47.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1968-08-01 |
1969-06-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1969-06-01 |
1969-07-01 | 0 | 39.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1969-07-01 |
1969-08-01 | 0 | 40.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1969-08-01 |
1970-06-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-06-01 |
1970-07-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-07-01 |
1970-08-01 | 0 | 48.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1970-08-01 |
1971-06-01 | 0 | 59.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-06-01 |
1971-07-01 | 0 | 64.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-07-01 |
1971-08-01 | 0 | 62.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1971-08-01 |
1972-06-01 | 0 | 63.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1972-06-01 |
1972-07-01 | 0 | 63.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1972-07-01 |
1972-08-01 | 0 | 72.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1972-08-01 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2006-06-01 | 0 | 98.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2006-06-01 |
2006-07-01 | 0 | 83.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2006-07-01 |
2006-08-01 | 0 | 88.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2006-08-01 |
2007-06-01 | 0 | 73.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2007-06-01 |
2007-07-01 | 0 | 68.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2007-07-01 |
2007-08-01 | 0 | 60.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2007-08-01 |
2008-06-01 | 0 | 45.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2008-06-01 |
2008-07-01 | 0 | 43.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2008-07-01 |
2008-08-01 | 0 | 38.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2008-08-01 |
2009-06-01 | 0 | 37.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2009-06-01 |
2009-07-01 | 0 | 38.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2009-07-01 |
2009-08-01 | 0 | 36.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2009-08-01 |
2010-06-01 | 0 | 28.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2010-06-01 |
2010-07-01 | 0 | 26.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2010-07-01 |
2010-08-01 | 0 | 23.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2010-08-01 |
2011-06-01 | 0 | 28.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2011-06-01 |
2011-07-01 | 0 | 27.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2011-07-01 |
2011-08-01 | 0 | 25.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2011-08-01 |
2012-06-01 | 0 | 34.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2012-06-01 |
2012-07-01 | 0 | 33.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2012-07-01 |
2012-08-01 | 0 | 31.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2012-08-01 |
2013-06-01 | 0 | 43.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2013-06-01 |
2013-07-01 | 0 | 33.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2013-07-01 |
2013-08-01 | 0 | 31.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2013-08-01 |
2014-06-01 | 0 | 38.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-06-01 |
2014-07-01 | 0 | 35.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-07-01 |
2014-08-01 | 0 | 36.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2014-08-01 |
2015-06-01 | 0 | 44.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-06-01 |
2015-07-01 | 0 | 43.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-07-01 |
2015-08-01 | 0 | 41.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2015-08-01 |
159 rows × 11 columns
Info on our time series
If you try to .plot
, pandas will automatically use the index (the date) as the
x axis for you. This makes like perfect. because you don’t have to think
about anything, and calculations automatically have a good axis.
<matplotlib.axes._subplots.AxesSubplot at 0x105d4ae80>
<matplotlib.axes._subplots.AxesSubplot at 0x10615b748>
Grouping with resample, not with groupby
Hmmm, looks like something bad might have happened to the housing industry t
some point. Maybe we want to see some numbers instead of a graph? To do
aggregate statistics on time series in pandas we use a method called
.resample()
, and we’re going to tell it to group the data by year.
When we tell it to group by year, we need to give it a special code. I always get mine from this StackOverflow post http://stackoverflow.com/a/17001474 because it’s much more convenient than the pandas documentation.
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||
1963-01-01 | 0 | 42.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-01-01 |
1963-02-01 | 0 | 35.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 1963-02-01 |
<matplotlib.axes._subplots.AxesSubplot at 0x106296ef0>
<matplotlib.axes._subplots.AxesSubplot at 0x1063e8320>
<matplotlib.axes._subplots.AxesSubplot at 0x106506390>
<matplotlib.axes._subplots.AxesSubplot at 0x106216860>
<matplotlib.axes._subplots.AxesSubplot at 0x10689aef0>
Notice that it’s December of every year. That still looks like too much data, though. What if we back out to every decade?
Cool, right?
Graphing
We can graph these instead of just look at them! Get ready!
Graphing all on one plot
We’ve done this before, but it’s more exciting now - save the first plot as ax
and pass it to the others as the confusingly- or conveniently-named ax=ax
.
<matplotlib.axes._subplots.AxesSubplot at 0x107422f98>
Cyclical data (actually using groupby)
What were the top 5 worst months?
is_adj | val | cat_code | cat_desc | cat_indent | dt_code | dt_desc | dt_unit | geo_code | geo_desc | per_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||
2010-11-01 | 0 | 20.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2010-11-01 |
2011-01-01 | 0 | 21.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2011-01-01 |
2011-02-01 | 0 | 22.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2011-02-01 |
2010-08-01 | 0 | 23.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2010-08-01 |
2010-10-01 | 0 | 23.0 | SOLD | New Single-family Houses Sold | 0 | TOTAL | All Houses | K | US | United States | 2010-10-01 |
It seems like there might be a cycle ever year. Maybe houses are sold in the summer and not the winter? To do this we can’t use resample - it’s for putting time into buckets - we need to group by the month.
Getting the month
We can’t ask for the index column as “year” any more, but we can just use
df.index
instead.
Int64Index([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
...
8, 9, 10, 11, 12, 1, 2, 3, 4, 5],
dtype='int64', name='date', length=641)
To get the month of each date, it’s simply df.index.month
.
Doing the groupby to view data by month
So when we do our groupby, we’ll say hey, we made the groups for you already. Then we ask for the median number of houses sold.
Int64Index([1, 4, 4, 0, 2, 5, 0, 3, 6, 1,
...
5, 1, 3, 6, 1, 4, 0, 1, 4, 6],
dtype='int64', name='date', length=641)
is_adj | val | cat_indent | |
---|---|---|---|
date | |||
1 | 0 | 47.685185 | 0 |
2 | 0 | 53.592593 | 0 |
3 | 0 | 62.685185 | 0 |
4 | 0 | 61.444444 | 0 |
5 | 0 | 61.129630 | 0 |
6 | 0 | 59.188679 | 0 |
7 | 0 | 56.735849 | 0 |
8 | 0 | 57.566038 | 0 |
9 | 0 | 52.037736 | 0 |
10 | 0 | 51.584906 | 0 |
11 | 0 | 45.471698 | 0 |
12 | 0 | 42.792453 | 0 |
Plot the results
<matplotlib.axes._subplots.AxesSubplot at 0x10710cef0>
<matplotlib.axes._subplots.AxesSubplot at 0x10713e630>
More details
You can also use max and min and all of your other aggregate friends
with .resample
. For example, what’s the largest number of houses sold in a
given year?
date
2005-12-31 1283.0
2004-12-31 1203.0
2003-12-31 1088.0
Name: val, dtype: float64
How about the fewest?
date
2016-12-31 241.0
2011-12-31 305.0
2010-12-31 322.0
Name: val, dtype: float64
Since we’re feeling crazy, why don’t we plot the average, max and min for each year?
<matplotlib.axes._subplots.AxesSubplot at 0x1082cb7f0>
<matplotlib.axes._subplots.AxesSubplot at 0x1082c34e0>
Using .fill_between
to draw between lines
Getting maximums and minimums for years is a very common way of dealing with
resampled time series. A fun graph to make is the fill_between
graph, which
colors between two areas.
Step 1: Calculate the maxes and mins and get a list of x values
ax.fill_between
requires three things
- a list of x values (the dates),
- a list of minimum values for those x values,
- and a list of maximum values for those x values
/usr/local/lib/python3.6/site-packages/ipykernel_launcher.py:2: FutureWarning:
.resample() is now a deferred operation
You called index(...) on this deferred object which materialized it into a series
by implicitly taking the mean. Use .resample(...).mean() instead
DatetimeIndex(['1963-12-31', '1964-12-31', '1965-12-31', '1966-12-31',
'1967-12-31', '1968-12-31', '1969-12-31', '1970-12-31',
'1971-12-31', '1972-12-31', '1973-12-31', '1974-12-31',
'1975-12-31', '1976-12-31', '1977-12-31', '1978-12-31',
'1979-12-31', '1980-12-31', '1981-12-31', '1982-12-31',
'1983-12-31', '1984-12-31', '1985-12-31', '1986-12-31',
'1987-12-31', '1988-12-31', '1989-12-31', '1990-12-31',
'1991-12-31', '1992-12-31', '1993-12-31', '1994-12-31',
'1995-12-31', '1996-12-31', '1997-12-31', '1998-12-31',
'1999-12-31', '2000-12-31', '2001-12-31', '2002-12-31',
'2003-12-31', '2004-12-31', '2005-12-31', '2006-12-31',
'2007-12-31', '2008-12-31', '2009-12-31', '2010-12-31',
'2011-12-31', '2012-12-31', '2013-12-31', '2014-12-31',
'2015-12-31', '2016-12-31'],
dtype='datetime64[ns]', name='date', freq='A-DEC')
date
1963-12-31 62.0
1964-12-31 56.0
1965-12-31 58.0
1966-12-31 53.0
1967-12-31 49.0
Freq: A-DEC, Name: val, dtype: float64
Step 2: Build the graph
<matplotlib.axes._subplots.AxesSubplot at 0x108527668>
Step 3: Add the fill
ax.fill_between
requires three things
- a list of x values
- a list of minimum values
- and a list of maximum values
We have all of those, so let’s draw it!
/usr/local/lib/python3.6/site-packages/ipykernel_launcher.py:3: FutureWarning:
.resample() is now a deferred operation
You called index(...) on this deferred object which materialized it into a series
by implicitly taking the mean. Use .resample(...).mean() instead
This is separate from the ipykernel package so we can avoid doing imports until
<matplotlib.collections.PolyCollection at 0x109244eb8>