Combining datasets: Aggregate Columns of Points by Shapes

Not all points are created equal! Sometimes each dot represents the same thing over and over, but sometimes it doesn’t: one dot might count several things (a car accident has a number of victims), or different dots might carry different values (each house has a different price). And sometimes you’d like to combine all of that data into ZIP codes, census tracts, states, countries, or what-have-you.

So let’s do that.

We’re going to count the total enrollment in after-school programs by NYC school district.

NOTE: This approach works, but most people apparently use Vector > Data Management Tools > Join Attributes By Location instead! Points in Polygon is more useful when you have a lot of overlapping polygons.
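One way to picture the join approach: each point picks up the name of the shape it falls in, and then you total things up per shape. Below is a minimal plain-Python sketch of that join-then-group idea. It is not how QGIS implements the tool. The rectangular districts, the program coordinates, and the helper names are all hypothetical simplifications, and in this sketch each point keeps only its first matching district.

```python
from collections import defaultdict

# Hypothetical rectangular districts: name -> (min_x, min_y, max_x, max_y).
DISTRICTS = {"District A": (0, 0, 10, 10), "District B": (10, 0, 20, 10)}

# Hypothetical after-school programs with made-up coordinates.
programs = [
    {"x": 2, "y": 3, "ENROLLMENT": 120},
    {"x": 7, "y": 8, "ENROLLMENT": 0},
    {"x": 15, "y": 5, "ENROLLMENT": 45},
    {"x": 25, "y": 5, "ENROLLMENT": 30},  # outside every district
]


def district_for(x, y):
    """Return the first district whose rectangle contains (x, y), else None.
    Half-open bounds keep a point on a shared edge from matching twice."""
    for name, (x0, y0, x1, y1) in DISTRICTS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


def join_then_sum(points, field):
    """Attach a district name to every point, then sum `field` per district."""
    totals = defaultdict(int)
    for p in points:
        district = district_for(p["x"], p["y"])  # the "join" step
        if district is not None:                 # unmatched points are dropped
            totals[district] += p[field]         # the "group by" step
    return dict(totals)


print(join_then_sum(programs, "ENROLLMENT"))
# {'District A': 120, 'District B': 45}
```

Because each point keeps a single district here, overlapping shapes would not each get credit for the same point. That difference is one way to see why the note above favors Points in Polygon for overlapping polygons.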

Step One: Open Up Your Data

First, semi-obviously, you’ll need to open your data in QGIS. Make sure both layers use the same CRS (coordinate reference system); if they don’t, change one layer’s CRS to match the other.

I’ve used a CSV of after-school programs for NYC and a shapefile of school districts.

[Screenshot: the two layers opened in QGIS]

Make sure that your features overlap! If you don’t see any points in the same area as your shapes, something probably went wrong with your CRS.
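You can also do a rough overlap check without the map view by comparing bounding boxes. If the box around your points doesn’t intersect the box around your shapes, then either the layers cover different areas or, more likely, they aren’t in the same CRS. Here is a minimal sketch with made-up coordinates. One polygon is in longitude/latitude degrees, and the other describes the same area but was accidentally left in a projected CRS measured in feet. The numbers and function names are hypothetical.

```python
def bounding_box(coords):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical points in longitude/latitude degrees.
points = [(-73.95, 40.78), (-73.99, 40.73)]
# Hypothetical polygon corners left in a projected CRS (made-up values in feet).
polygon_feet = [(980000, 190000), (1010000, 190000), (1010000, 230000), (980000, 230000)]
# The same kind of polygon in degrees, matching the points.
polygon_degrees = [(-74.0, 40.7), (-73.9, 40.7), (-73.9, 40.8), (-74.0, 40.8)]

print(boxes_overlap(bounding_box(points), bounding_box(polygon_feet)))     # False
print(boxes_overlap(bounding_box(points), bounding_box(polygon_degrees)))  # True
```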

Step Two: Examine your data

Right-click your layer, then select Open Attribute Table. This will allow you to browse all of the columns in your data.

[Screenshot: opening the attribute table]

Make sure you have a column you’d like to aggregate; I have my eye on ENROLLMENT. It’s 0 for a lot of rows, so my results will probably be inaccurate, but you’ll get the idea.

[Screenshot: the attribute table]
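To see how many of those zeros there are before aggregating, you can run a quick pure-Python check on the CSV. This is a sketch: the sample rows and the PROGRAM column are made up, and only the ENROLLMENT column name comes from the real data. To use it on your own file, pass in a file object opened with `open(path, newline="")` instead of the sample.

```python
import csv
import io

# Hypothetical stand-in for the after-school programs CSV.
SAMPLE = """PROGRAM,ENROLLMENT
Program A,120
Program B,0
Program C,45
Program D,0
"""


def enrollment_summary(csv_file):
    """Return (row_count, zero_count, total) for the ENROLLMENT column."""
    rows = list(csv.DictReader(csv_file))
    values = [int(row["ENROLLMENT"] or 0) for row in rows]  # blanks count as 0
    return len(values), values.count(0), sum(values)


rows, zeros, total = enrollment_summary(io.StringIO(SAMPLE))
print(f"{zeros} of {rows} rows have ENROLLMENT 0; total = {total}")
# 2 of 4 rows have ENROLLMENT 0; total = 165
```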

Step Three: Use Points in Polygon

Now, from the top menu, select Vector > Analysis Tools > Points in Polygon.

[Screenshot: the Points in Polygon menu item]

Step Four: Fill in your options

Now let’s fill in the Points in Polygon fields:

  1. Your input polygon layer is your shapes; I’m using school districts.
  2. Your input point layer is your points; I’m using after-school programs.
  3. You’d like the sum of ENROLLMENT, so select ENROLLMENT and choose sum from the dropdown.
  4. Along with aggregating, the tool also counts the number of points in each area. PNTCNT is the name of that count column, not the name of the aggregate; QGIS automatically names the aggregate ENROLLMENT to match the field.
  5. Check Add result to canvas so the new layer shows up in your Layers panel once it’s complete.
  6. Click OK!
  7. Wait for it to finish, then click Close

[Screenshot: the filled-in Points in Polygon dialog]
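To make the tool’s output concrete, here is a minimal pure-Python sketch of the same idea: for every polygon, count the points inside it and sum one of their fields. It assumes simple polygons given as lists of (x, y) corners in the same CRS as the points, and it uses the classic ray-casting test. It is not QGIS’s implementation, which works on real geometries and may treat points lying exactly on a boundary differently. The districts, programs, and function names are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many edges a ray going right from (x, y) crosses.
    An odd number of crossings means the point is inside."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def points_in_polygons(polygons, points, field):
    """For each polygon, count the points inside it (PNTCNT) and sum `field`."""
    results = {}
    for name, ring in polygons.items():
        inside = [p for p in points if point_in_polygon(p["x"], p["y"], ring)]
        results[name] = {"PNTCNT": len(inside), field: sum(p[field] for p in inside)}
    return results


# Hypothetical data: two square "districts" and four "programs" in made-up units.
districts = {
    "District A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "District B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
programs = [
    {"x": 2, "y": 3, "ENROLLMENT": 120},
    {"x": 7, "y": 8, "ENROLLMENT": 0},
    {"x": 15, "y": 5, "ENROLLMENT": 45},
    {"x": 25, "y": 5, "ENROLLMENT": 30},  # outside both districts
]

for name, row in points_in_polygons(districts, programs, "ENROLLMENT").items():
    print(name, row)
# District A {'PNTCNT': 2, 'ENROLLMENT': 120}
# District B {'PNTCNT': 1, 'ENROLLMENT': 45}
```

Each polygon is checked independently, so a point that falls inside two overlapping polygons is counted in both. The program at (25, 5) lies outside every district, so it doesn’t appear in any total.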

When you’re combining a CSV and a shapefile, QGIS likes to complain that the CRSs don’t match even when they do. If you get this warning, you can probably ignore it.

[Screenshot: the non-matching CRS warning]

Step Five: Examine your new data

Right-click your brand-new layer, then select Open Attribute Table.

This will show you that your new shapefile contains not only the columns from before, but also two new columns: PNTCNT (the number of points inside each district) and ENROLLMENT (the sum of the enrollments of those points).

[Screenshot: the two new columns in the attribute table]
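Before mapping the new columns, it can be worth checking that nothing went missing. This minimal sketch assumes you’ve copied the new attribute table into a dictionary and know the original point layer’s row count and ENROLLMENT total; all the numbers below are hypothetical. Anything left over belongs to points that didn’t land in any district. That can be a sign of a CRS mismatch, or of points that really do sit outside every district.

```python
def unassigned(results, field, point_count, field_total):
    """Compare per-polygon results with the totals from the original point layer.

    Returns (points not counted, amount of `field` not counted). With
    non-overlapping polygons both numbers should be >= 0; a negative number
    means some points were counted in more than one polygon."""
    counted_points = sum(row["PNTCNT"] for row in results.values())
    counted_total = sum(row[field] for row in results.values())
    return point_count - counted_points, field_total - counted_total


# Hypothetical new attribute table and hypothetical totals from the point layer.
results = {
    "District A": {"PNTCNT": 2, "ENROLLMENT": 120},
    "District B": {"PNTCNT": 1, "ENROLLMENT": 45},
}
print(unassigned(results, "ENROLLMENT", point_count=4, field_total=195))
# (1, 30)
```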

Step Six: Next steps

Maybe you’d like to learn how to color your map based on a column, or attach a column from a shape to the points inside it?
