Installation

I don’t know what you’ve installed or how you’ve installed it, so let’s talk

before you run any of this.

OS X folks can run the following:

  • brew install geos
  • brew install gdal
  • brew install spatialindex
  • pip3 install pillow
  • pip3 install pysal
  • pip3 install geopandas
  • pip3 install https://github.com/matplotlib/basemap/archive/v1.0.7rel.tar.gz
  • pip3 install rtree

For Windows without Anaconda, use this guide to install through pip directly from whl files.

Geopandas Usage

Importing

You’ll be importing

  • pandas because you love it
  • geopandas for geographic stuff
  • Point from shapely to help convert CSV files into something geopandas can understand

and %matplotlib inline for viewing maps, of course.

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

%matplotlib inline

Opening a shapefile

Let’s open up the Community Districts data. What kind of file is it?

districts = gpd.read_file("Community Districts/districts.shp")

Introduction to the GeoDataFrame

A GeoDataFrame is just like a dataframe, it just… has geographic stuff in it.

districts.head()
boro_cd geometry shape_area shape_leng
0 311.0 POLYGON ((-73.97299433938896 40.60881414180224... 1.031759e+08 51566.989006
1 412.0 POLYGON ((-73.80168266553363 40.6663223525709,... 2.673336e+08 65933.851319
2 481.0 POLYGON ((-73.83591564875908 40.7434308933925,... 4.750313e+07 53795.118705
3 314.0 POLYGON ((-73.95630035122711 40.65504828183116... 8.217567e+07 49291.791502
4 313.0 POLYGON ((-73.98372152615246 40.59582107821707... 8.814953e+07 65746.939737
districts[districts.boro_cd > 400]
boro_cd geometry shape_area shape_leng
1 412.0 POLYGON ((-73.80168266553363 40.6663223525709,... 2.673336e+08 65933.851319
2 481.0 POLYGON ((-73.83591564875908 40.7434308933925,... 4.750313e+07 53795.118705
11 484.0 (POLYGON ((-73.93177580884338 40.5578995623848... 1.237923e+08 247830.043391
12 402.0 POLYGON ((-73.897923512633 40.75424000267809, ... 1.398945e+08 72182.746967
14 503.0 (POLYGON ((-74.13319580101383 40.5363067420505... 5.990016e+08 189072.053206
16 482.0 POLYGON ((-73.835900721417 40.71164137076879, ... 2.429264e+07 32007.621257
20 407.0 (POLYGON ((-73.83970053362432 40.7652895884367... 3.283557e+08 139168.786464
26 414.0 (POLYGON ((-73.91192551909782 40.5658138623772... 1.957296e+08 211877.538189
28 410.0 (POLYGON ((-73.85722330984366 40.6502786705413... 1.716549e+08 105187.759482
31 409.0 POLYGON ((-73.81709738756452 40.70402980158765... 1.073240e+08 50763.105913
32 413.0 POLYGON ((-73.74694320572226 40.63752920217553... 3.504749e+08 139785.720341
33 483.0 (POLYGON ((-73.74693970012704 40.6375542347127... 1.916683e+08 107277.264943
34 403.0 POLYGON ((-73.86275490196977 40.76676452492579... 8.348664e+07 41785.434175
35 480.0 POLYGON ((-73.86275490196977 40.76676452492579... 3.272369e+07 45440.808811
44 408.0 POLYGON ((-73.75670329167477 40.72622943266041... 2.075647e+08 69460.761676
45 411.0 POLYGON ((-73.71351828172088 40.75983773083244... 2.608159e+08 97698.814612
50 595.0 (POLYGON ((-74.11834169292992 40.5504596952958... 5.470213e+07 110287.441761
58 501.0 (POLYGON ((-74.15945602438185 40.6414483333240... 3.769558e+08 157936.427193
59 401.0 (POLYGON ((-73.90647314610101 40.7901811752080... 1.715380e+08 90044.686849
62 405.0 POLYGON ((-73.88770199204359 40.73429132275795... 2.103958e+08 70005.289342
63 502.0 (POLYGON ((-74.07346627066256 40.5783856858973... 5.931590e+08 142651.899080
68 404.0 POLYGON ((-73.84750820344139 40.73900780704004... 6.573957e+07 37018.409235
69 406.0 POLYGON ((-73.82606003226817 40.71539934860222... 8.269954e+07 42743.745703
districts.head()
boro_cd geometry shape_area shape_leng
0 311.0 POLYGON ((-73.97299433938896 40.60881414180224... 1.031759e+08 51566.989006
1 412.0 POLYGON ((-73.80168266553363 40.6663223525709,... 2.673336e+08 65933.851319
2 481.0 POLYGON ((-73.83591564875908 40.7434308933925,... 4.750313e+07 53795.118705
3 314.0 POLYGON ((-73.95630035122711 40.65504828183116... 8.217567e+07 49291.791502
4 313.0 POLYGON ((-73.98372152615246 40.59582107821707... 8.814953e+07 65746.939737

Visualizing a shapefile

You can just use .plot() to visualize a GeoDataFrame, it’s nice and easy.

districts.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x1091f5748>

png

Changing the CRS (Coordinate Reference System)

WAY ONE: Just changing the Projection

# Go into the crs to convert it...
# ignore the datum and spheroid,
# just change the PROJECTION to MERCATOR
districts.to_crs({'proj': 'merc'}).plot()
<matplotlib.axes._subplots.AxesSubplot at 0x109bcbe48>

png

# Give it the SECRET CODE from the PETROLEUM GROUP
# (which you can try to find by googling)
# (or hopefully you have a list because they're
# all very confusingly/similarly named)
districts.to_crs(epsg=3857).plot()
<matplotlib.axes._subplots.AxesSubplot at 0x10a1ea550>

png

Opening a CSV of points

geopandas doesn’t understand a CSV file of lat/lon points, so you need to convert each line into shapely geometry, then feed that into a new geo dataframe.

Once you do that, you need to set the crs to {'init': 'epsg:4326'} so it knows what kind of datum/sphereoid/projection you’re measuring from.

Let’s try opening the earthquakes CSV

df = pd.read_csv("../earthquakes_1.0_day.csv")
df.head(2)
time latitude longitude depth mag magType nst gap dmin rms ... updated place type horizontalError depthError magError magNst status locationSource magSource
0 2017-05-03T01:58:56.924Z 59.804800 -136.833500 0.0 3.10 ml NaN NaN NaN 0.59 ... 2017-05-03T02:03:56.653Z 94km WNW of Skagway, Alaska earthquake NaN 0.3 NaN NaN automatic ak ak
1 2017-05-03T01:35:24.350Z 39.398167 -120.255501 5.6 2.42 md 21.0 73.0 0.1314 0.05 ... 2017-05-03T01:57:52.525Z 10km NW of Truckee, California earthquake 0.21 1.0 0.26 19.0 automatic nc nc

2 rows × 22 columns

def make_point(row):
    return Point(row.longitude, row.latitude)

# Go through every row, and make a point out of its lat and lon
points = df.apply(make_point, axis=1)

# Make a new GeoDataFrame
# using the data from our old df
# but also adding in the geometry we just made
earthquakes = gpd.GeoDataFrame(df, geometry=points)

# It doesn't come with a CRS because it's a CSV, so let's
# say "hey, let's use the standard shape of the earth etc"
earthquakes.crs = {'init': 'epsg:4326'}

# Let's look at the first few
earthquakes.head()
time latitude longitude depth mag magType nst gap dmin rms ... place type horizontalError depthError magError magNst status locationSource magSource geometry
0 2017-05-03T01:58:56.924Z 59.804800 -136.833500 0.00 3.10 ml NaN NaN NaN 0.59 ... 94km WNW of Skagway, Alaska earthquake NaN 0.30 NaN NaN automatic ak ak POINT (-136.8335 59.8048)
1 2017-05-03T01:35:24.350Z 39.398167 -120.255501 5.60 2.42 md 21.0 73.0 0.131400 0.05 ... 10km NW of Truckee, California earthquake 0.21 1.00 0.26 19.0 automatic nc nc POINT (-120.2555008 39.3981667)
2 2017-05-03T01:26:38.608Z 59.897000 -136.710900 0.00 2.00 ml NaN NaN NaN 0.82 ... 92km WNW of Skagway, Alaska earthquake NaN 0.40 NaN NaN automatic ak ak POINT (-136.7109 59.897)
3 2017-05-03T01:24:02.260Z 37.246834 -121.635498 2.98 2.57 md 58.0 43.0 0.032380 0.07 ... 13km N of Morgan Hill, California earthquake 0.14 0.34 0.09 63.0 automatic nc nc POINT (-121.635498 37.2468338)
4 2017-05-03T01:18:05.540Z 38.830166 -122.808166 1.75 1.11 md 17.0 74.0 0.006923 0.02 ... 7km W of Cobb, California earthquake 0.27 0.40 0.08 4.0 automatic nc nc POINT (-122.8081665 38.8301659)

5 rows × 23 columns

earthquakes.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x102ea2128>

png

# Read in the CSV
df = pd.read_csv("../earthquakes_1.0_day.csv")

points = df.apply(lambda row: Point(row.longitude, row.latitude), axis=1)
earthquakes = gpd.GeoDataFrame(df, geometry=points)
earthquakes.crs = {'init': 'epsg:4326'}

# If you want to know how this all works, look above
earthquakes.head(2)
time latitude longitude depth mag magType nst gap dmin rms ... place type horizontalError depthError magError magNst status locationSource magSource geometry
0 2017-05-03T01:58:56.924Z 59.804800 -136.833500 0.0 3.10 ml NaN NaN NaN 0.59 ... 94km WNW of Skagway, Alaska earthquake NaN 0.3 NaN NaN automatic ak ak POINT (-136.8335 59.8048)
1 2017-05-03T01:35:24.350Z 39.398167 -120.255501 5.6 2.42 md 21.0 73.0 0.1314 0.05 ... 10km NW of Truckee, California earthquake 0.21 1.0 0.26 19.0 automatic nc nc POINT (-120.2555008 39.3981667)

2 rows × 23 columns

earthquakes.plot(figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x10ca75358>

png

Using the built-in map

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
ax = world.plot(figsize=(20,5))
ax.axis('off')
(-200.0, 200.0, -100.0, 100.0)

png

world.crs
{'init': 'epsg:4326'}

Styling your visuals

Setting size, line and shape colors, widths, axes

  • linewidth
  • color
  • edgecolor
  • ax.axis

Let’s plot the community districts!

ax = world.plot(figsize=(20,5), linewidth=0.25, edgecolor='white', color='lightgrey')
ax.axis('off')
(-200.0, 200.0, -100.0, 100.0)

png

Let’s plot the world!

ax = districts.to_crs(epsg=3857).plot(figsize=(20,5), linewidth=0.25, edgecolor='white', color='pink')
ax.axis('off')
(-8270000.0, -8200000.0, 4930000.0, 5000000.0)

png

Setting the projection

You can use to_crs to convert to different projections. In typical pandas fashion, you can do it a lot of ways, but the easiest is to send a epsg= and feed it the correct EPSG code.

You’ll also probably want to do an ax.axis('off') to turn off the splines and axes!

What are the EPSG codes for some common projections?

earthquakes.plot(figsize=(20,7))
<matplotlib.axes._subplots.AxesSubplot at 0x10c7dd160>

png

Styling markers

  • markersize
  • color
  • alpha
# Plot the earthquakes:
# BIGGER CIRCLES
# all one color
# make them a little transparent
# NO AXES OR BORDERS AROUND THE MAP!!!!!!!

ax = earthquakes.plot(figsize=(20,5), markersize=10, color='pink', alpha=0.5)
ax.axis('off')
(-200.0, 150.0, -40.0, 80.0)

png

# Plot the earthquakes:
# BIGGER CIRCLES
# all one color
# make them a little transparent
# NO AXES OR BORDERS AROUND THE MAP!!!!!!!

# Save the first layer as ax
ax = world.plot(color='lightgrey', linewidth=0.5, edgecolor='white', figsize=(15,5))
# Pass ax=ax to the second layer
earthquakes.plot(markersize=10, color='pink', alpha=0.5, ax=ax)
ax.axis('off')
(-200.0, 200.0, -100.0, 100.0)

png

Colormaps/ramps

Auto colormap

Giving your plot a column and a cmap will colorize your values. You can try plasma as your color map, or check out more here.

earthquakes.head(2)
time latitude longitude depth mag magType nst gap dmin rms ... place type horizontalError depthError magError magNst status locationSource magSource geometry
0 2017-05-03T01:58:56.924Z 59.804800 -136.833500 0.0 3.10 ml NaN NaN NaN 0.59 ... 94km WNW of Skagway, Alaska earthquake NaN 0.3 NaN NaN automatic ak ak POINT (-136.8335 59.8048)
1 2017-05-03T01:35:24.350Z 39.398167 -120.255501 5.6 2.42 md 21.0 73.0 0.1314 0.05 ... 10km NW of Truckee, California earthquake 0.21 1.0 0.26 19.0 automatic nc nc POINT (-120.2555008 39.3981667)

2 rows × 23 columns

# Plot the earthquakes:
# BIGGER CIRCLES
# all one color
# make them a little transparent
# NO AXES OR BORDERS AROUND THE MAP!!!!!!!

# Save the first layer as ax
ax = world.plot(color='lightgrey', linewidth=0.5, edgecolor='white', figsize=(15,5))
# Pass ax=ax to the second layer
earthquakes.plot(markersize=10, alpha=0.5, ax=ax, column='mag')
ax.axis('off')
(-200.0, 200.0, -100.0, 100.0)

png

# Plot the earthquakes:
# BIGGER CIRCLES
# all one color
# make them a little transparent
# NO AXES OR BORDERS AROUND THE MAP!!!!!!!

# Save the first layer as ax
ax = world.plot(color='lightgrey', linewidth=0.5, edgecolor='white', figsize=(15,5))
# Pass ax=ax to the second layer
earthquakes.plot(markersize=10, alpha=0.5, ax=ax, column='mag', cmap='plasma')
ax.axis('off')
(-200.0, 200.0, -100.0, 100.0)

png

world.head(2)
continent gdp_md_est geometry iso_a3 name pop_est
0 Asia 22270.0 POLYGON ((61.21081709172574 35.65007233330923,... AFG Afghanistan 28400000.0
1 Africa 110300.0 (POLYGON ((16.32652835456705 -5.87747039146621... AGO Angola 12799293.0

Auto colormap again

We can also try with the world. What’s the gdp_md_est column looking like?

world.plot(column='gdp_md_est', cmap='inferno')
<matplotlib.axes._subplots.AxesSubplot at 0x1120c6048>

png

world.plot(column='pop_est', cmap='inferno')
<matplotlib.axes._subplots.AxesSubplot at 0x10e312d30>

png

Plotting multiple layers of data

Let’s try plotting the earthquakes on top of the world. Save your first plot as ax and send it to the next one as ax=ax.

Setting the projection by proj with named projections

Instead of using an EPSG code, you can also set the projection with to_crs by .to_crs({'proj': 'merc'}) or something similar.

I don’t recommend this method, but it is a little friendlier than EPSG codes.

Plot the world with the default projection

ax = world.plot()
ax.set_title("Default")
<matplotlib.text.Text at 0x112b0f390>

png

Plot the world with Mercator (merc)

ax = world.to_crs({'proj': 'merc'}).plot()
ax.set_title("Default")
<matplotlib.text.Text at 0x11351c710>

png

Plot the world with [Transverse

Mercator](https://en.wikipedia.org/wiki/Transverse_Mercator_projection) (tmerc)

ax = world.to_crs({'proj': 'tmerc'}).plot()
ax.set_title("Default")
<matplotlib.text.Text at 0x113bc9c18>

png

Plot the world with Albers Equal Area (aea)

ax = world.to_crs({'proj': 'aea'}).plot()
ax.set_title("Default")
<matplotlib.text.Text at 0x10f7a4c88>

png

Spatial join

Dataset 1: States

Let’s import the states and clean them up a little bit. we need to clean the data up a little

# Read in the shapefile from cb_2016_us_state_500k as "states"
states = gpd.read_file("cb_2016_us_state_500k/cb_2016_us_state_500k.shp")
# Get rid of Guam, Mariana Islands and Virgin Islands
states = states[states.STATEFP.astype(int) < 60]
# Get rid of Hawaii and Alaska
states = states[~states.NAME.isin(['Hawaii', 'Alaska'])]
states.tail(5)
AFFGEOID ALAND AWATER GEOID LSAD NAME STATEFP STATENS STUSPS geometry
48 0400000US47 106797662267 2355188876 47 00 Tennessee 47 01325873 TN POLYGON ((-90.31029799999999 35.004295, -90.30...
49 0400000US48 676633459408 19025991684 48 00 Texas 48 01779801 TX (POLYGON ((-94.718296 29.728855, -94.717207 29...
50 0400000US51 102257320053 8528605825 51 00 Virginia 51 01779803 VA (POLYGON ((-75.742406 37.80835, -75.741512 37....
51 0400000US53 172113416541 12558451099 53 00 Washington 53 01779804 WA (POLYGON ((-122.33164 48.020556, -122.328343 4...
52 0400000US55 140273604537 29361386480 55 00 Wisconsin 55 01779806 WI (POLYGON ((-86.95616699999999 45.355489, -86.9...

Dataset 2: Waffle House

Read in wafflehouses.csv, and convert it to a GeoDataFrame.

# Read in the CSV
df = pd.read_csv("../wafflehouses.csv")

points = df.apply(lambda row: Point(row.long, row.lat), axis=1)
wafflehouses = gpd.GeoDataFrame(df, geometry=points)
wafflehouses.crs = {'init': 'epsg:4326'}

# If you want to know how this all works, look above
wafflehouses.head(2)
location lat long score geometry
0 Waffle House-Duluth,GA 33.991269 -84.153232 8 POINT (-84.15323199999999 33.991269)
1 Waffle House-Biloxi,MS 30.393627 -88.942691 3 POINT (-88.942691 30.393627)

Plot the locations, coloring based on the ‘score’ column.

ax = states.plot()
wafflehouses.plot(column='score', ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x115cc6a20>

png

The actual spatial join

Is the CRS of the states the same as the CRS of the Waffle House locations?

states.crs
{'init': 'epsg:4269'}
wafflehouses.crs
{'init': 'epsg:4326'}

If not, we’ll force them to match using .to_crs

# Convert CRS to match
converted_states = states.to_crs(wafflehouses.crs)
converted_states.head(2)
AFFGEOID ALAND AWATER GEOID LSAD NAME STATEFP STATENS STUSPS geometry
0 0400000US01 131173688951 4593686489 01 00 Alabama 01 01779775 AL (POLYGON ((-88.053375 30.506987, -88.051087999...
2 0400000US04 294198560125 1027346486 04 00 Arizona 04 01779777 AZ POLYGON ((-114.816294 32.508038, -114.814321 3...
converted_states.crs
{'init': 'epsg:4326'}
wafflehouses.crs
{'init': 'epsg:4326'}

And now we can join

Open up Terminal and run

  • pip install rtree
  • brew install spatialindex
wafflehouses.head()
location lat long score geometry
0 Waffle House-Duluth,GA 33.991269 -84.153232 8 POINT (-84.15323199999999 33.991269)
1 Waffle House-Biloxi,MS 30.393627 -88.942691 3 POINT (-88.942691 30.393627)
2 Waffle House-Pearl,MS 32.269078 -90.135180 3 POINT (-90.13518000000001 32.269078)
3 Waffle House-Memphis,TN 35.155903 -89.885466 2 POINT (-89.88546600000001 35.155903)
4 Waffle House-Lumberton,TX 30.242025 -94.195967 2 POINT (-94.19596700000001 30.242025)
converted_states.head(2)
AFFGEOID ALAND AWATER GEOID LSAD NAME STATEFP STATENS STUSPS geometry
0 0400000US01 131173688951 4593686489 01 00 Alabama 01 01779775 AL (POLYGON ((-88.053375 30.506987, -88.051087999...
2 0400000US04 294198560125 1027346486 04 00 Arizona 04 01779777 AZ POLYGON ((-114.816294 32.508038, -114.814321 3...
converted_states.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x11fa70128>

png

wafflehouses.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x121714e80>

png

# http://geopandas.org/mergingdata.html
wafflehouses_with_state_data = gpd.sjoin(wafflehouses, converted_states, how='left', op='within')
wafflehouses_with_state_data.head()
location lat long score geometry index_right AFFGEOID ALAND AWATER GEOID LSAD NAME STATEFP STATENS STUSPS
0 Waffle House-Duluth,GA 33.991269 -84.153232 8 POINT (-84.15323199999999 33.991269) 9 0400000US13 1.491698e+11 4.741101e+09 13 00 Georgia 13 01705317 GA
1 Waffle House-Biloxi,MS 30.393627 -88.942691 3 POINT (-88.942691 30.393627) 18 0400000US28 1.215299e+11 3.930506e+09 28 00 Mississippi 28 01779790 MS
2 Waffle House-Pearl,MS 32.269078 -90.135180 3 POINT (-90.13518000000001 32.269078) 18 0400000US28 1.215299e+11 3.930506e+09 28 00 Mississippi 28 01779790 MS
3 Waffle House-Memphis,TN 35.155903 -89.885466 2 POINT (-89.88546600000001 35.155903) 48 0400000US47 1.067977e+11 2.355189e+09 47 00 Tennessee 47 01325873 TN
4 Waffle House-Lumberton,TX 30.242025 -94.195967 2 POINT (-94.19596700000001 30.242025) 49 0400000US48 6.766335e+11 1.902599e+10 48 00 Texas 48 01779801 TX
wafflehouses_with_state_data['NAME'].value_counts()
Georgia           416
North Carolina    165
South Carolina    163
Florida           152
Alabama           147
Tennessee         114
Texas             104
Mississippi        87
Louisiana          79
Ohio               72
Kentucky           63
Virginia           50
Arkansas           45
Missouri           40
Indiana            24
Arizona            17
Oklahoma           16
Pennsylvania       11
Maryland           11
Colorado           10
Kansas              5
West Virginia       4
Delaware            3
Illinois            2
New Mexico          2
Name: NAME, dtype: int64

Doing things with spatially joined data

  • What column do we use for color?
  • Add a legend with legend=True
  • Something is going to go wrong, though!
# Need dropna because some wafflehouses are missing states
wafflehouses_with_state_data.dropna().plot(column='NAME', markersize=10, figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x11d9adfd0>

png

What if we reverse the spatial join and make it ‘contains’?

How is this different than what we did before?

# http://geopandas.org/mergingdata.html
states_with_wafflehouse_data = gpd.sjoin(converted_states, wafflehouses, how='left', op='contains')
states_with_wafflehouse_data.head()
AFFGEOID ALAND AWATER GEOID LSAD NAME STATEFP STATENS STUSPS geometry index_right location lat long score
0 0400000US01 131173688951 4593686489 01 00 Alabama 01 01779775 AL (POLYGON ((-88.053375 30.506987, -88.051087999... 149 Waffle House-Tuscaloosa,AL 33.127149 -87.549124 5.0
0 0400000US01 131173688951 4593686489 01 00 Alabama 01 01779775 AL (POLYGON ((-88.053375 30.506987, -88.051087999... 33 Waffle House-Calera,AL 33.147916 -86.749224 6.0
0 0400000US01 131173688951 4593686489 01 00 Alabama 01 01779775 AL (POLYGON ((-88.053375 30.506987, -88.051087999... 148 Waffle House-Tuscaloosa,AL 33.165991 -87.551787 7.0
0 0400000US01 131173688951 4593686489 01 00 Alabama 01 01779775 AL (POLYGON ((-88.053375 30.506987, -88.051087999... 147 Waffle House-Tuscaloosa,AL 33.176447 -87.470757 10.0
0 0400000US01 131173688951 4593686489 01 00 Alabama 01 01779775 AL (POLYGON ((-88.053375 30.506987, -88.051087999... 146 Waffle House-Tuscaloosa,AL 33.177825 -87.524648 8.0
states_with_wafflehouse_data.shape
(1826, 15)
select_columns = states_with_wafflehouse_data[['NAME', 'geometry', 'score']]
select_columns.head()
NAME geometry score
0 Alabama (POLYGON ((-88.053375 30.506987, -88.051087999... 5.0
0 Alabama (POLYGON ((-88.053375 30.506987, -88.051087999... 6.0
0 Alabama (POLYGON ((-88.053375 30.506987, -88.051087999... 7.0
0 Alabama (POLYGON ((-88.053375 30.506987, -88.051087999... 10.0
0 Alabama (POLYGON ((-88.053375 30.506987, -88.051087999... 8.0

Spatial joins, option 1

We need to use .contains to say “find me all of the waffle houses inside of this one specific state”. We’ll use .sum() to count the number inside, but you could also do something like ['score'].mean() etc.

This is slower than actually using .sjoin, but we’re going to work with it for the moment.

First, let’s try it with one state

states.head(2)
AFFGEOID ALAND AWATER GEOID LSAD NAME STATEFP STATENS STUSPS geometry
0 0400000US01 131173688951 4593686489 01 00 Alabama 01 01779775 AL (POLYGON ((-88.053375 30.506987, -88.051087999...
2 0400000US04 294198560125 1027346486 04 00 Arizona 04 01779777 AZ POLYGON ((-114.816294 32.508038, -114.814321 3...
states.loc[0].geometry

svg

# False = 0
# True = 1
# Count the number of states where the name contains "New"
states.NAME.str.contains("New").sum()
4
# Give me the first state!
state = states.loc[0]
# Look at the wafflehouses...
# are they inside of the state's geometry?
wafflehouses.within(state.geometry).sum()
147
# So for our first state, there were 147 inside of there

Now, let’s try it with every state

# You can use .contains
# counts the true ones
states['wafflehouse_count'] = states.apply(lambda state: wafflehouses.within(state.geometry).sum(), axis=1)
states.head()
AFFGEOID ALAND AWATER GEOID LSAD NAME STATEFP STATENS STUSPS geometry wafflehouse_count
0 0400000US01 131173688951 4593686489 01 00 Alabama 01 01779775 AL (POLYGON ((-88.053375 30.506987, -88.051087999... 147
2 0400000US04 294198560125 1027346486 04 00 Arizona 04 01779777 AZ POLYGON ((-114.816294 32.508038, -114.814321 3... 17
3 0400000US05 134771517596 2960191698 05 00 Arkansas 05 00068085 AR POLYGON ((-94.6178329666013 36.4994141203285, ... 45
4 0400000US06 403501101370 20466718403 06 00 California 06 01779778 CA (POLYGON ((-118.604415 33.478552, -118.598783 ... 0
5 0400000US08 268429343790 1175112870 08 00 Colorado 08 01779779 CO POLYGON ((-109.060253 38.599328, -109.059541 3... 10
states.plot(column='wafflehouse_count', scheme='quantiles')
<matplotlib.axes._subplots.AxesSubplot at 0x125e68860>

png

states.plot(column='wafflehouse_count', scheme='equal_interval')
<matplotlib.axes._subplots.AxesSubplot at 0x127927668>

png

states.plot(column='wafflehouse_count', scheme='fisher_jenks', legend=True)
<matplotlib.axes._subplots.AxesSubplot at 0x12acedd68>

png

Saving the results

You want to look at this stuff in Leaflet, right? For that we’ll need to save. Geopandas supports practically every file format you could ever want.

wafflehouses_with_state_data.to_file("wafflehouses.json", driver='GeoJSON')