Python Data Structures
Do Now
1) Given the following code, what is the output?
i = 5
print(5 * 5)
# 25
print(i)
# 5
print("cat" + "dog")
# catdog
print("cat", "dog")
# cat dog
2) The code below doesn’t work. Why not?
print(5 + "5")
We can’t add a string to an integer!
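One way to get the broken line working, sketched here as an illustration rather than the official do-now answer, is to convert one side so both values are the same type. int() and str() are Python built-ins.

```python
# Convert the string to a number to do arithmetic
total = 5 + int("5")
print(total)  # 10

# Or convert the number to a string to glue text together
joined = str(5) + "5"
print(joined)  # 55
```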
3) What is wrong with the following if statement?
if n = 2:
    print("two")
A single = is assignment, not comparison, so Python refuses to run this line and raises a SyntaxError. We want n == 2, which is how Python writes an equality comparison.
4) What is the command to list the files in a directory? What is the command to go “up” one folder?
ls lists the files in a directory, and cd .. goes “up” one folder.
. means the current directory and .. means the directory above.
5) Given the following code, what would the outputs be for n = 0, 10, -1, 3, and -5?
Note: The printed version of the do-now had a typo; the “elseif” on the printed version should read “elif”.
print(n)
if n > 0:
    print("A")
    if n >= 3:
        print("B")
        print("C")
    elif n > 2:
        print("D")
    else:
        print("E")
elif 1 > n > -2:
    print("F")
else:
    print("G")
if n < 5:
    print("H")
Output for 0: 0 F H
Output for 10: 10 A B C
Output for -1: -1 F H
Output for 3: 3 A B C H
Output for -5: -5 G H
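To check these answers without retyping the code for each value, the same branches can be wrapped in a function. This is only a sketch: the function name trace and the idea of collecting the printed letters in a list are illustrative additions, not part of the do-now.

```python
def trace(n):
    """Return what the do-now code would print for n, as a list of strings."""
    printed = [str(n)]
    if n > 0:
        printed.append("A")
        if n >= 3:
            printed.append("B")
            printed.append("C")
        elif n > 2:
            printed.append("D")
        else:
            printed.append("E")
    elif 1 > n > -2:
        printed.append("F")
    else:
        printed.append("G")
    if n < 5:
        printed.append("H")
    return printed

# Try every value from the question
for n in [0, 10, -1, 3, -5]:
    print("Output for", n, ":", " ".join(trace(n)))
```

Notice that for whole numbers “D” can never print: any whole number bigger than 2 is already caught by n >= 3. A value like 2.5 is the only way to reach it.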
On the Command Line
export CLICOLOR=1
atom ~/.bash_profile
- atom: opens Atom
- ~: home directory
- .: marks a hidden & important file
- bash: official name of the command line program (the shell) that runs in the terminal
Add the export CLICOLOR=1 line to that file.
Counting with Computers!
Let’s look at our food survey data: cat food-survey.txt
What if we wanted to know all the unique responses to the food survey? Does this work: uniq food-survey.txt? Nah, because uniq only consolidates terms that are adjacent to one another.
So, let’s order the terms in our file so that like terms are next to each other: sort food-survey.txt. NOW, we’ll run uniq: sort food-survey.txt | uniq.
- AN ALTERNATE WAY, but piping is more efficient because it skips the extra file: sort food-survey.txt > sorted-food.txt and then uniq sorted-food.txt.
- sort food-survey.txt | uniq | wc -l: gives you back the number of unique terms in food-survey.txt
- sort food-survey.txt | uniq | grep "^P" | wc -l: if you want to know how many unique foods start with the letter P. (Without the ^, grep "P" matches any line that contains a P anywhere.)
- sort food-survey.txt | uniq -c: gives you a count of each unique term
- sort food-survey.txt | uniq -c | sort -rn: orders the unique terms based on their count, biggest first (-n compares the counts as numbers)
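The same counting can be sketched in Python, the language the rest of these notes use. collections.Counter is part of the standard library; the responses list below is made-up sample data standing in for the lines of food-survey.txt, so treat this as an illustration and not the class solution.

```python
from collections import Counter

# Made-up sample responses, one per survey line (hypothetical data)
responses = ["Pizza", "Tacos", "Pizza", "Pasta", "Tacos", "Pizza"]

# Like: sort food-survey.txt | uniq | wc -l
unique_foods = sorted(set(responses))
print(len(unique_foods))

# Like: sort food-survey.txt | uniq -c | sort -rn
counts = Counter(responses)
for food, count in counts.most_common():
    print(count, food)

# Unique foods that start with the letter P
p_foods = [food for food in unique_foods if food.startswith("P")]
print(len(p_foods))
```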
osha.tar.gz:
+ tar: collects files into a bundle.
+ gz: gzip (GNU zip); indicates that the file is compressed.
tar xvf osha.tar.gz: opens our zipped file
Navigate to osha (cd osha). ls to see what’s in there.
cat FatalitiesFY* > everything.csv: copies from multiple files into everything.csv
+ Oh no, everything.csv has the column headers from each file mixed in with the data
+ tail -n 2: shows us the last two lines of the file
+ tail -n +2: shows us everything except for the first line of the file
+ tail -n +2 FatalitiesFY* > new-everything.csv
+ rm new-everything.csv
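For comparison, here is a rough Python sketch of the same goal: glue several CSV files together but keep the header row only once. The function name combine_csvs, the two tiny files, and their contents are all made up for the demo; the real OSHA files are not touched.

```python
import glob
import os
import tempfile

def combine_csvs(paths, out_path):
    """Write every file in paths to out_path, keeping only the first header row."""
    with open(out_path, "w") as out:
        for index, path in enumerate(sorted(paths)):
            with open(path) as source:
                lines = source.readlines()
            # Keep the header from the first file only (like tail -n +2 on the rest)
            out.writelines(lines if index == 0 else lines[1:])

# Demo with two tiny made-up files in a temporary folder (hypothetical data)
folder = tempfile.mkdtemp()
for year, row in [("2014", "a,1\n"), ("2015", "b,2\n")]:
    with open(os.path.join(folder, "FatalitiesFY" + year + ".csv"), "w") as f:
        f.write("name,count\n" + row)

out_path = os.path.join(folder, "everything.csv")
combine_csvs(glob.glob(os.path.join(folder, "FatalitiesFY*")), out_path)

with open(out_path) as f:
    combined = f.read()
print(combined)
```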
Writing python things
Don’t forget to set up your virtual environment! On the command line, navigate to the folder where you’d like to save your python script (do this using cd [filepath]). Once in the correct folder, use mkvirtualenv [some_name] to get your virtual environment started.
One of the primary reasons why we’re using a virtual environment in class is so we can run Python 3. The code you write in class won’t work/will eventually crash unless you’re in an environment that is using Python 3. Check by typing python --version on the command line.
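You can also check from inside a script. This is a small sketch using the standard library’s sys module; the variable name running_python_3 and the messages are just for illustration.

```python
import sys

# sys.version describes the Python that is running this script
print(sys.version)

# sys.version_info.major is 3 for any Python 3 release
running_python_3 = sys.version_info.major == 3
if running_python_3:
    print("Good to go: this is Python 3")
else:
    print("This is not Python 3; check your virtual environment")
```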
Start up a new file in Atom, and be sure to save that file in the same folder where you are running your virtual environment. For python scripts, your file name should end in .py. For example, in class, we saved our file as data-structures.py.
In Atom, edit data-structures.py. Let’s save some data in variables so that we can start playing with them!
Variable Assignment
name = "Soma"
city = "Brooklyn"
hometown = "Virginia"
age = 33
print("Hi, I'm", name, "from", hometown)
friend_name = "Jen"
friend_city = "Brooklyn"
friend_age = 31
friend_hometown = "Ludlow"
print("This is", friend_name, "from", friend_hometown)
other_friend_name = "Larissa"
other_friend_city = "Brooklyn"
other_friend_age = 29
other_friend_hometown = "New Jersey"
print("Here's", other_friend_name, "from", other_friend_hometown)
Variable assignment can be more complicated than it seems at first glance, particularly once you start manipulating your variables with math. For example, what should print if you run the following code?
fingers = 5
fingers + 5
print(fingers + 5)
print(fingers)
If you run this code in command line (type python3 and then type out what you see above), you’ll notice that the last line doesn’t print 10 as you might expect; it prints 5 again. fingers + 5 calculated a new value, but it didn’t actually change the variable fingers.
In order to add onto the old value, we have to do something like this:
fingers = fingers + 5
In math, we couldn’t do this! If your calculations get you to the point where x = x + 5, you know something has gone terribly wrong. BUT, in the world of coding, we HAVE to do this to add onto the existing value of a variable. The reason it works is that your computer, or python interpreter, always reads the right-hand side of a variable assignment first. This is worth reading up on if you’re unconvinced!
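Here is the whole idea in one runnable sketch. The += line at the end is shorthand that was not covered above; it is included as an extra and does the same thing as fingers = fingers + 5.

```python
fingers = 5

# This computes 10 but throws the result away
fingers + 5
print(fingers)  # still 5

# The right-hand side is evaluated first (5 + 5), then stored back in fingers
fingers = fingers + 5
print(fingers)  # 10

# += is shorthand for the same thing
fingers += 5
print(fingers)  # 15
```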
For now, what if we could keep all the information about a specific thing in one place, instead of using a billion variables?
Dictionaries & Lists
Dictionaries allow us to save lots of information about one particular thing.
me = { 'name': "Soma", 'city': "Brooklyn", 'age': 33 }
# In order to get things out you use me['name']
print(me['name']) # output will be: Soma
print(me['city']) # output will be: Brooklyn
The data in a dictionary is structured in comma-separated key-value pairs.
{ 'key': 'value', 'key2': 'value2'}
When you see curly braces, { }, you’ll know that you’re dealing with a dictionary.
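Don’t rely on the position of the key-value pairs in a dictionary; look values up by their key instead. In versions of Python before 3.7, the pairs were not ordered, meaning that each time you print(me), you might see name, city, and age appear in different orders. From Python 3.7 on, a dictionary keeps its pairs in the order you added them.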
friend = { 'name': 'Jen', 'city': 'Brooklyn', 'age': 31 }
print(friend['name'])
# 'name' is a key
# 'city' is a key, 'age' is a key
# keys are the words in your dictionary
# 'Blake' is a value
# every key can only have one value
other = { 'name': 'Blake', 'city': 'Brooklyn', 'name': 'Blakie' }
print(other['name']) # output will be: Blakie (the second 'name' replaces the first)
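A short sketch of the “one value per key” rule in action. The new values ('Queens' and 30) are made up for the example.

```python
other = { 'name': 'Blake', 'city': 'Brooklyn', 'name': 'Blakie' }

# The repeated 'name' key keeps only the last value
print(other['name'])  # Blakie
print(len(other))     # 2

# Assigning to an existing key replaces its value
other['city'] = 'Queens'

# Assigning to a new key adds a new pair
other['age'] = 30

print(other)
```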
Lists are a bit different. They ARE ordered (and therefore will print the same way each time).
# this is a list
names = [ 'Blake', 'Blakie', 'Blakester', 'Watch Guy', 'Balake' ]
# prints them all pretty good
print(names)
# Python starts counting from zero, so this is the THIRD name
print(names[2])
# This is how we get the first one in the list
print(names[0])
In the code above, we are printing by specifying the index, or position, of an item in our list. In Python, we count starting with 0 (so the first index position is 0, and the last is one less than the length of the list). Forgetting this can lead to the ever-popular off-by-one error.
# we don't do these things in Python:
# names.len
# names.length
# names.length()
# instead we do this:
print(len(names))
numbers = [ 56, 23, 87, 43, 1, 67, 9 ]
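To see the off-by-one error for yourself, here is a small sketch using the names list from above. The try/except part has not been covered in class; it is only there so the script keeps running after the mistake.

```python
names = [ 'Blake', 'Blakie', 'Blakester', 'Watch Guy', 'Balake' ]

print(len(names))  # 5

# The last item sits at index len(names) - 1, not len(names)
last_index = len(names) - 1
print(names[last_index])  # Balake

# Asking for index 5 is the classic off-by-one mistake
try:
    print(names[len(names)])
except IndexError:
    print("There is no item at index", len(names))
```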
Functions & Methods
Functions and methods are ways of interacting with your data.
print(numbers)
# Length of the list
print(len(numbers))
# Biggest of the list
print(max(numbers))
# Smallest from the list
print(min(numbers))
# These are called FUNCTIONS!!!!!!!
# Functions are like factories
# You send a list to the min factory and get back the smallest
print(sorted(numbers))
print(numbers)
sorted_numbers = sorted(numbers)
print("sorted:", sorted_numbers)
print("unsorted:", numbers)
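# This is called a METHOD
# Without parentheses we print the method itself, not its result
print(numbers.sort)
# .sort() sorts the list in place and gives back None, so this prints None
print(numbers.sort())
# numbers itself is now sorted
print(numbers)
# Same idea: this prints the len function itself
print(len)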
name = "Soma"
print(name)
# You like quiet
print(name.lower())
# I like yellin'
print(name.upper())
# Let's see a crazy thing
print("Ten Things I Hate About You".swapcase())
# What's the method for figuring out how many
# o's are in "Google" (a Python string)
word = "Google"
number_of_os = word.count('o')
print("There are", number_of_os, "o's in", word)
cats = [ "Smushface", "Callery", "Naples" ]
print("Hello", cats)
print("Hello", cats[0])
print("Hello", cats[1])
print("Hello", cats[2])
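The difference between the sorted() function and the .sort() method trips a lot of people up, so here it is side by side as a small sketch. The variable names new_list and result are just for illustration.

```python
numbers = [ 56, 23, 87, 43, 1, 67, 9 ]

# sorted() is a function: it hands back a new list and leaves numbers alone
new_list = sorted(numbers)
print(new_list)  # [1, 9, 23, 43, 56, 67, 87]
print(numbers)   # [56, 23, 87, 43, 1, 67, 9]

# .sort() is a method: it changes numbers itself and hands back None
result = numbers.sort()
print(result)    # None
print(numbers)   # [1, 9, 23, 43, 56, 67, 87]
```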
For Loops
For loops are used to loop through a list of things. A for loop basically says, “for each item in my list, do something to that item.”
# This is called a for loop
# it... loops through stuff
# "Flow control" "language construct"
print("THIS IS THE CAT IN CATS ONE:")
for cat in cats:
    print("Hello", cat)
The code above basically says, for each cat in my list of cats, print that cat! But, just because we called our list ‘cats’ doesn’t mean that we have to refer to each item/cat in that list as a cat in our for loop.
This still works:
print("THIS IS THE AIRPLANE ONE:")
for airplane in cats:
    print("Hello", airplane)
Our structure is for [arbitrary_name] in [list_name]: (don’t forget the colon at the end). Be careful about using variable names you’ve already used before though! The loop will overwrite whatever that variable held before.
# This should still be "Soma"
print(name)
print("THIS IS THE NAME IN CATS ONE:")
for name in cats:
    print("Hello", name)
    if name == "Smushface":
        print("Here's some food, my fattest cat")
    else:
        print("Sorry it looks like we're out of food, I guess?")
# Uncomment the next line to see that name is no longer "Soma"
#print(name)
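Here is a stripped-down sketch of that problem and one way around it. The loop variable cat_name is a made-up name; any name you haven’t used yet would work.

```python
name = "Soma"
cats = [ "Smushface", "Callery", "Naples" ]

# Reusing name as the loop variable overwrites the old value
for name in cats:
    print("Hello", name)
print(name)  # Naples, not Soma

# A fresh loop variable leaves name alone
name = "Soma"
for cat_name in cats:
    print("Hello", cat_name)
print(name)  # Soma
```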
You can also have a list of dictionaries.
cat_info = [
    { 'name': 'Smushface', 'age': 6 },
    { 'name': 'Callery', 'age': 2 },
    { 'name': 'Naples', 'age': 'unknown' }
]
You can pull out information from your list of dictionaries by using what you know about list indices and dictionary keys! Use the following structure: list_name[list_index][dictionary_key]. Examples below:
# prints the first dictionary in your list
print(cat_info[0])
# prints the value associated with the 'name' key in the second dictionary in your list
print(cat_info[1]['name'])
callery_the_cat = cat_info[1]
print(callery_the_cat['name'])
cat = { 'name': 'Stranger cat'}
cat['name']
If you have a list of dictionaries, you can also use a for loop to loop through the data contained inside. Get really accustomed to this! This is literally like the most important thing ever.
cat_info = [
    { 'name': 'Smushface', 'age': 6 },
    { 'name': 'Callery', 'age': 2 },
    { 'name': 'Naples', 'age': 'unknown' }
]
for cat in cat_info:
    # str() lets us call .lower() even when the age is a number
    if str(cat['age']).lower() == 'unknown'.lower():
        print("NO ONE KNOWS HOW OLD", cat['name'].upper(), "IS")
    else:
        print(cat['name'].upper(), "is", cat['age'], "years old")
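Once you can loop through a list of dictionaries, you can start calculating things from it. As a sketch (this was not done in class, and the variable names total_age and known_count are made up), here is one way to average the ages we actually know:

```python
cat_info = [
    { 'name': 'Smushface', 'age': 6 },
    { 'name': 'Callery', 'age': 2 },
    { 'name': 'Naples', 'age': 'unknown' }
]

total_age = 0
known_count = 0
for cat in cat_info:
    # Skip any cat whose age is the text 'unknown' rather than a number
    if cat['age'] != 'unknown':
        total_age = total_age + cat['age']
        known_count = known_count + 1

print("Cats with a known age:", known_count)
print("Average known age:", total_age / known_count)
```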
Common Errors
- Don’t forget to save before re-running your script!
- Are you in the right folder? Atom tells you where your file is located if you look at the top of your window.
- Indentation is super important; be sure that the indentation of your for loops and if-statements is what you’d like it to be.
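To see why indentation matters so much, here is a small sketch where the only difference between the two loops is whether the counting line is indented. The counter names are made up for the example.

```python
cats = [ "Smushface", "Callery", "Naples" ]

# Indented under the loop: runs once per cat
inside_count = 0
for cat in cats:
    print("Hello", cat)
    inside_count = inside_count + 1

# Not indented: runs once, after the loop has finished
outside_count = 0
for cat in cats:
    print("Hello", cat)
outside_count = outside_count + 1

print(inside_count)   # 3
print(outside_count)  # 1
```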