Welcome to Lede 2019: Foundations of Computing

Details

  • Instructor: Jonathan Soma, js4571@columbia.edu
  • Dates: Tuesdays and Thursdays, 5/28-7/2 + Saturdays 6/15, 6/22, 6/29
  • Class: 10am-1pm, World Room
  • Lab: 2pm-5pm, World Room
  • Slack channel: #foundations

Course Overview

By the end of this course you’ll have the flexibility to find and execute solutions to almost any coding- or data-related problem you run across. In theory we’re focusing on Python in general, the data package pandas, and comfort with the command line.

Homework

On many, many assignments, I will give you:

  • More homework than you can reasonably accomplish
  • Homework that involves googling answers
  • Homework that requires thinking through problems and answers in complicated and specific ways

What’s this mean? It’s going to be hard.

If you find yourself falling down a black hole: just take a break. Or stop altogether! Don’t worry about it - ask people near you or the TAs for guidance.

Oh, and most importantly: if you can’t finish? Not a problem. It’s far more important to not get burned out and discouraged.

Schedule

This is a rough outline, and will absolutely change very, very often.

Introduction to Python and the command line (5/28 + 5/30)

In our first week we’ll take a look at the insides of our computers using the command line, with tools like cd, grep, and cat. Learn to navigate your computer and run basic Python scripts.

Exploring data and APIs with Jupyter Notebooks (6/4 + 6/6)

Become more comfortable with data types in Python by consuming data from APIs - dynamic sources of information that are easily understandable by Python. Learn the joys of Jupyter Notebooks, and the basics of git for version control.
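
As a rough taste of what "easily understandable by Python" means: most APIs send back JSON, which Python turns into ordinary dictionaries and lists. The sketch below skips the network entirely and uses a made-up response (the city, field names, and numbers are invented for illustration, not from any real API).

```python
import json

# A made-up API response body. Real APIs return text shaped a lot like this.
response_text = """
{
  "city": "New York",
  "forecast": [
    {"day": "Tue", "high_f": 78},
    {"day": "Wed", "high_f": 83}
  ]
}
"""

# json.loads converts the JSON text into Python dicts, lists, strings, and numbers
data = json.loads(response_text)

# From here on it's plain Python data types
highs = [entry["high_f"] for entry in data["forecast"]]
hottest = max(data["forecast"], key=lambda entry: entry["high_f"])

print(data["city"], highs, hottest["day"])
```

In class the text would come from a live request rather than a string, but the dictionary-and-list digging is the same.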

Analyzing structured data with pandas (6/11 + 6/13 + 6/15)

Begin work with pandas, a data analysis library that runs circles around Excel in analyzing, cleaning, and presenting data.

Also, how to take semi-structured text data and clean/extract the parts we’re interested in, along with taming troublesome datasets with format conversion and filling in or ignoring “bad” values.
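
A minimal sketch of what that cleanup can look like, assuming pandas is installed. The tiny dataset and its column names are invented for illustration; the pandas calls (`str.extract`, `str.replace`, `pd.to_numeric`, `fillna`) are real, but this is one way to do it, not the only way.

```python
import pandas as pd

# Hypothetical messy data, invented for illustration
df = pd.DataFrame({
    "address": [
        "12 Main St, Brooklyn NY 11201",
        "400 W 116th St, New York NY 10027",
        "unknown",
    ],
    "price": ["$1,200", "N/A", "$950"],
})

# Extract: pull a trailing five-digit ZIP code out of free text.
# Rows with no match end up as missing values.
df["zip"] = df["address"].str.extract(r"(\d{5})$", expand=False)

# Format conversion: strip "$" and "," and convert to numbers.
# errors="coerce" turns anything unparseable (like "N/A") into NaN.
cleaned = df["price"].str.replace(r"[$,]", "", regex=True)
df["price_num"] = pd.to_numeric(cleaned, errors="coerce")

# Ignoring bad values: mean() skips NaN by default
mean_price = df["price_num"].mean()

# Filling in bad values: replace NaN with the mean (a judgment call, not a rule)
df["price_filled"] = df["price_num"].fillna(mean_price)

print(df)
```

Whether to fill in a missing value or leave it out depends on the story you're telling, so treat the `fillna` step as a choice to defend rather than a default.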

Obtaining data through scraping (6/18 + 6/20)

Use our Python skills to scrape websites, including advanced scraping involving form submission and page interaction.

Servers and “too much” problems (6/22)

What do you do when your data is too big or your scraping takes too long? Servers lend a hand.

Geographic analysis with QGIS (6/25 + 6/27)

Become familiar with different geographic data types, geocoding, and the limitless power of column and spatial joins.

Diving deeper into pandas visualization (6/29 + 7/2)

Getting deeper into matplotlib and seaborn with fancy plots and customization. Quick introduction to declarative visualization grammars with Altair and Vega.

BONUS CLASS: Self-directed project management and workflow (7/9)

How to manage your time, workflow, and expectations when working on a project.