An Introduction to pandas

Pandas! They are adorable animals. You might think they are the worst animal ever, but that is not true. You might sometimes think pandas is the worst library ever, and that is only kind of true.

The important thing is to use the right tool for the job. pandas is good for some stuff, SQL is good for some stuff, writing raw Python is good for some stuff. You’ll figure it out as you go along.

Now let’s start coding. Hopefully you did pip install pandas before you started up this notebook.

# import pandas, but call it pd. Why? Because that's What People Do.
import pandas as pd

When you import pandas, you use import pandas as pd. That means instead of typing pandas in your code you’ll type pd.

You don’t have to, but every other person on the planet will be doing it, so you might as well.

Now we’re going to read in a file. Our file is called NBA-Census-10.14.2013.csv because we’re sports moguls. pandas can read many different types of files, so try to find the right function by typing pd.read_ and hitting tab for autocomplete.

# We're going to call this df, which means "data frame"
# It isn't in UTF-8 (I saved it from my mac!) so we need to set the encoding
df = pd.read_csv("NBA-Census-10.14.2013.csv", encoding='mac_roman')
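
If you're wondering what that encoding argument actually buys you, here is a minimal sketch. The two-row file and the names in it are made up for illustration, and it lives in memory (via io.BytesIO) so you don't need the real CSV to run it.

```python
import io

import pandas as pd

# Hypothetical two-row file, encoded the way an old Mac export might be.
raw_bytes = "Name,Team\nJosé Example,Suns\nZoë Sample,Jazz\n".encode("mac_roman")

# Reading with the matching encoding recovers the accented characters.
sample = pd.read_csv(io.BytesIO(raw_bytes), encoding="mac_roman")
print(sample)

# Reading the same bytes as UTF-8 fails, because they are not valid UTF-8.
utf8_failed = False
try:
    pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")
except UnicodeDecodeError:
    utf8_failed = True
print("UTF-8 failed:", utf8_failed)
```

So when read_csv complains about decoding, the encoding is the first thing to check.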

A dataframe is basically a spreadsheet, except it lives in the world of Python or the statistical programming language R. They can’t call it a spreadsheet because then people would think those programmers used Excel, which would make them boring and normal and they’d have to wear a tie every day.
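
To make the spreadsheet comparison concrete, here is a small sketch that builds a dataframe by hand instead of reading a file. The three rows are copied from the NBA file; the variable name mini is just something picked for this example.

```python
import pandas as pd

# Three rows copied from the NBA file; normally these would come from read_csv.
mini = pd.DataFrame({
    "Name": ["Gee, Alonzo", "Wallace, Gerald", "Williams, Mo"],
    "Age": [26, 31, 30],
    "Team": ["Cavaliers", "Celtics", "Trail Blazers"],
})

print(mini.shape)    # (number of rows, number of columns)
print(mini.dtypes)   # every column has its own data type
```

Each dictionary key becomes a column and each list becomes the values running down that column, just like filling in a spreadsheet one column at a time.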

Selecting rows

Now let’s look at our data, since that’s what data is for.

# Let's look at all of it
df
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No
5 Hill, Solomon 22 Pacers F 9 $1,246,680 79 220 0 2013 3/18/1991 Arizona Los Angeles, CA California US Black No
6 Budinger, Chase 25 Timberwolves F 10 $5,000,000 79 218 4 2009 5/22/1988 Arizona Encinitas, CA California US White No
7 Williams, Derrick 22 Timberwolves F 7 $5,016,960 80 241 2 2011 5/25/1991 Arizona La Mirada, CA California US Black No
8 Hill, Jordan 26 Lakers F/C 27 $3,563,600 82 235 1 2012 7/27/1987 Arizona Newberry, SC South Carolina US Black No
9 Frye, Channing 30 Suns F/C 8 $6,500,000 83 245 8 2005 5/17/1983 Arizona White Plains, NY New York US Black No
10 Bayless, Jerryd 25 Grizzlies G 7 $3,135,000 75 200 5 2008 8/20/1988 Arizona Phoenix, AZ Arizona US Black No
11 Terry, Jason 36 Nets G 31 $5,625,313 74 180 14 1999 9/15/1977 Arizona Seattle, WA Washington US Black No
12 Fogg, Kyle 23 Nuggets G 6 n/a 75 183 0 2013 1/27/1990 Arizona Brea, CA California US Black No
13 Iguodala, Andre 29 Warriors G/F 9 $12,868,632 78 207 9 2004 1/28/1984 Arizona Springfield, IL Illinois US Black No
14 Boateng, Eric 27 Lakers C 12 n/a 82 257 17 1996 11/20/1985 Arizona State London, ENG n/a England Black No
15 Diogu, Ike 29 Knicks F/C 50 $792,377 80 255 8 2005 11/9/1983 Arizona State Buffalo, NY New York US Black No
16 Ayres, Jeff 26 Spurs F/C 11 $1,750,000 81 250 4 2009 4/29/1987 Arizona State Ontario, CA California US Black No
17 Harden, James 24 Rockets G 13 $13,701,250 77 220 4 2009 8/26/1989 Arizona State Los Angeles, CA California US Black No
18 Felix, Carrick 23 Cavaliers G/F 30 $510,000 78 210 0 2013 8/17/1990 Arizona State Goodyear, AZ Arizona US Black No
19 Pargo, Jannero 33 Bobcats G 5 $884,293 73 185 11 2002 10/22/1979 Arkansas Chicago, IL Illinois US Black No
20 Beverley, Patrick 25 Rockets G 2 $788,872 73 185 5 2008 7/12/1988 Arkansas Chicago, IL Illinois US Black No
21 Johnson, Joe 32 Nets G/F 7 $21,466,718 79 240 12 2001 6/29/1981 Arkansas Little Rock, AR Arkansas US Black No
22 Brewer, Ronnie 28 Rockets G/F 10 $1,186,459 79 235 7 2006 3/20/1985 Arkansas Portland, OR Oregon US Black No
23 Fisher, Derek 39 Thunder G 6 $884,293 73 210 17 1996 8/9/1974 Arkansas-Little Rock Little Rock, AR Arkansas US Black No
24 Miller, Quincy 20 Nuggets F 30 $788,872 81 210 1 2012 11/18/1992 Baylor North Carolina, IL Illinois US Black No
25 Acy, Quincy 23 Raptors F 4 $788,872 79 225 1 2012 10/6/1990 Baylor Tyler, TX Texas US Black No
26 Jones, Perry 22 Thunder F 3 $1,082,520 83 235 1 2012 9/24/1991 Baylor Winnsboro, LA Louisiana US Black No
27 Udoh, Ekpe 26 Bucks F/C 5 $4,469,548 82 245 3 2010 5/20/1987 Baylor Edmond, OK Oklahoma US Black No
28 Clark, Ian 22 Jazz G 21 $490,180 75 175 0 2013 3/7/1991 Belmont Memphis, TN Tennessee US Black No
29 Andersen, Chris 35 Heat F/C 11 $1,399,507 82 228 12 2001 7/7/1978 Blinn College Long Beach, CA California US White No
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
498 Paul, Chris 28 Clippers G 3 $18,668,431 72 175 8 2005 5/6/1985 Wake Forest Forsyth County, NC North Carolina US Black No
499 Teague, Jeff 25 Hawks G 0 $8,000,000 74 181 4 2009 6/10/1988 Wake Forest Indianapolis, IN Indiana US Black No
500 Smith, Ish 25 Suns G 30 $951,463 72 175 3 2010 7/5/1988 Wake Forest Charlotte, NC North Carolina US Black No
501 Duncan, Tim 37 Spurs F/C 21 $10,361,446 83 255 16 1997 4/25/1976 Wake Forest Christiansted, VI Virgin Islands Virgin Islands Black No
502 Hawes, Spencer 25 76ers C 0 $6,500,000 85 245 6 2007 4/28/1988 Washington Seattle, WA Washington US White No
503 Wroten, Tony 20 76ers G 8 $1,160,040 78 205 1 2012 4/13/1993 Washington Renton, WA Washington US Black No
504 Gaddy, Abdul 21 Bobcats G 10 n/a 75 185 0 2013 1/26/1992 Washington Tacoma, WA Washington US Black No
505 Thomas, Isaiah 24 Kings G 22 $884,293 69 185 2 2011 2/7/1989 Washington Tacoma, WA Washington US Black No
506 Robinson, Nate 29 Nuggets G 10 $2,016,000 69 180 8 2005 5/31/1984 Washington Seattle, WA Washington US Black No
507 Ross, Terrence 22 Raptors G 31 $2,678,640 78 195 1 2012 2/5/1991 Washington Portland, OR Oregon US Black No
508 Pondexter, Quincy 25 Grizzlies G/F 20 $225,479 78 225 3 2010 3/10/1988 Washington Fresno, CA California US Black No
509 Holiday, Justin 24 Jazz G/F 22 $788,872 78 185 0 2013 4/5/1989 Washington Mission Hills, CA California US Black No
510 Baynes, Aron 26 Spurs F/C 16 $788,872 82 260 0 2013 12/9/1986 Washington State Gisborne, NZ n/a New Zealand White No
511 Thompson, Klay 23 Warriors G/F 11 $2,317,920 79 205 2 2011 2/8/1990 Washington State Los Angeles, CA California US Mixed No
512 Lillard, Damian 23 Trail Blazers G 0 $3,202,920 75 195 1 2012 7/15/1990 Weber State Oakland, CA California US Black No
513 Alexander, Joe 26 Warriors F 25 $854,389 80 230 5 2008 12/26/1986 West Virginia Kaohsiung, TA n/a Taiwan White No
514 Fischer, D'or 32 Wizards C 21 n/a 83 255 0 2013 10/12/1981 West Virginia Philadelphia, PA Pennsylvania US Black No
515 Ebanks, Devin 23 Mavericks F 37 $884,293 81 215 3 2010 10/28/1989 West Virginia New York City, NY New York US Black No
516 Johnson, Amir 26 Raptors F/C 15 $6,500,000 81 210 8 2005 5/1/1987 Westchester HS (CA) Los Angeles, CA California US Black Yes
517 Martin, Kevin 30 Timberwolves G 23 $6,500,000 79 185 9 2004 2/1/1983 Western Carolina Zanesville, OH Ohio US Mixed No
518 Evans, Jeremy 25 Jazz F 40 $1,660,257 81 194 3 2010 10/24/1987 Western Kentucky Crossett, AR Arkansas US Black No
519 Lee, Courtney 28 Celtics G/F 11 $5,225,000 77 200 5 2008 10/3/1985 Western Kentucky Indianapolis, IN Indiana US Black No
520 Mekel, Gal 25 Mavericks G 33 $490,180 75 191 5 2008 3/4/1988 Wichita State Petah Tikva n/a Israel White No
521 Murry, Toure' 23 Knicks G/F 23 $490,180 77 195 0 2013 11/8/1989 Wichita State Houston, TX Texas US Black No
522 Stiemsma, Greg 28 Pelicans C 34 $2,676,000 83 260 2 2011 9/26/1985 Wisconsin Randolph, WI Wisconsin US White No
523 Leuer, Jon 24 Grizzlies F 30 $900,000 82 228 2 2011 5/14/1989 Wisconsin Long Lake, MN Minnesota US White No
524 Landry, Marcus 27 Lakers F 14 $788,872 79 225 17 1996 11/1/1985 Wisconsin Milwaukee, WI Wisconsin US Black No
525 Harris, Devin 30 Mavericks G 20 $854,389 75 192 9 2004 2/27/1983 Wisconsin Milwaukee, WI Wisconsin US Black No
526 West, David 33 Pacers F 21 $12,000,000 81 250 10 2003 8/29/1980 Xavier Teaneck, NJ New Jersey US Black No
527 Crawford, Jordan 24 Celtics G 27 $2,162,419 76 195 3 2010 10/23/1988 Xavier Detroit, MI Michigan US Black No

528 rows × 17 columns

If we scroll we can see all of it. But maybe we don’t want to see all of it. Maybe we hate scrolling?

# Look at the first few rows
df.head()
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No

…but maybe we want to see more than a measly five results?

# Let's look at MORE of the first few rows
df.head(10)
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No
5 Hill, Solomon 22 Pacers F 9 $1,246,680 79 220 0 2013 3/18/1991 Arizona Los Angeles, CA California US Black No
6 Budinger, Chase 25 Timberwolves F 10 $5,000,000 79 218 4 2009 5/22/1988 Arizona Encinitas, CA California US White No
7 Williams, Derrick 22 Timberwolves F 7 $5,016,960 80 241 2 2011 5/25/1991 Arizona La Mirada, CA California US Black No
8 Hill, Jordan 26 Lakers F/C 27 $3,563,600 82 235 1 2012 7/27/1987 Arizona Newberry, SC South Carolina US Black No
9 Frye, Channing 30 Suns F/C 8 $6,500,000 83 245 8 2005 5/17/1983 Arizona White Plains, NY New York US Black No

But maybe we want to make a basketball joke and see the final four?

# Let's look at the final few rows
df.tail(4)
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
524 Landry, Marcus 27 Lakers F 14 $788,872 79 225 17 1996 11/1/1985 Wisconsin Milwaukee, WI Wisconsin US Black No
525 Harris, Devin 30 Mavericks G 20 $854,389 75 192 9 2004 2/27/1983 Wisconsin Milwaukee, WI Wisconsin US Black No
526 West, David 33 Pacers F 21 $12,000,000 81 250 10 2003 8/29/1980 Xavier Teaneck, NJ New Jersey US Black No
527 Crawford, Jordan 24 Celtics G 27 $2,162,419 76 195 3 2010 10/23/1988 Xavier Detroit, MI Michigan US Black No

So yes, head and tail work kind of like the terminal commands. That’s nice, I guess.

But maybe we’re incredibly demanding (which we are) and we want, say, the 6th through the 8th row (which we do). Don’t worry (which I know you were), we can do that, too.

# Show the 6th through the 8th rows
df[5:8]
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
5 Hill, Solomon 22 Pacers F 9 $1,246,680 79 220 0 2013 3/18/1991 Arizona Los Angeles, CA California US Black No
6 Budinger, Chase 25 Timberwolves F 10 $5,000,000 79 218 4 2009 5/22/1988 Arizona Encinitas, CA California US White No
7 Williams, Derrick 22 Timberwolves F 7 $5,016,960 80 241 2 2011 5/25/1991 Arizona La Mirada, CA California US Black No

It’s kind of like an array, right? Except where in an array we’d say df[0], this time we need to give it two numbers: the start and the end. The end is not included, which is why 5:8 stops at row 7.
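
Here is a rough sketch of how that slicing behaves, using a made-up ten-row dataframe (demo and the player_N names are placeholders, not part of the NBA data). It also shows .iloc, which is the pandas way to ask for rows by position, including a single row.

```python
import pandas as pd

# Hypothetical ten-row frame standing in for the NBA data.
demo = pd.DataFrame({"player": [f"player_{i}" for i in range(10)]})

# Slices are zero-based and leave out the end, so 5:8 gives positions 5, 6 and 7.
print(demo[5:8])

# .iloc does the same positional slicing...
print(demo.iloc[5:8])

# ...and also accepts a single position, returning that row as a Series.
print(demo.iloc[0])

# demo[0] would NOT give the first row: pandas would look for a column named 0.
```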

Selecting columns

But jeez, my eyes don’t want to go that far over the data. I only want to see, uh, name and age.

# Get the names of the columns, just because
df.columns
Index(['Name', 'Age', 'Team', 'POS', '#', '2013 $', 'Ht (In.)', 'WT', 'EXP',
       '1st Year', 'DOB', 'School', 'City',
       'State (Province, Territory, Etc..)', 'Country', 'Race', 'HS Only'],
      dtype='object')
# If we want to be "correct" we add .values on the end of it
df.columns.values
array(['Name', 'Age', 'Team', 'POS', '#', '2013 $', 'Ht (In.)', 'WT',
       'EXP', '1st Year', 'DOB', 'School', 'City',
       'State (Province, Territory, Etc..)', 'Country', 'Race', 'HS Only'], dtype=object)
# Select only name and age
columns_to_show = ['Name', 'Age']
df[columns_to_show]
Name Age
0 Gee, Alonzo 26
1 Wallace, Gerald 31
2 Williams, Mo 30
3 Gladness, Mickell 27
4 Jefferson, Richard 33
5 Hill, Solomon 22
6 Budinger, Chase 25
7 Williams, Derrick 22
8 Hill, Jordan 26
9 Frye, Channing 30
10 Bayless, Jerryd 25
11 Terry, Jason 36
12 Fogg, Kyle 23
13 Iguodala, Andre 29
14 Boateng, Eric 27
15 Diogu, Ike 29
16 Ayres, Jeff 26
17 Harden, James 24
18 Felix, Carrick 23
19 Pargo, Jannero 33
20 Beverley, Patrick 25
21 Johnson, Joe 32
22 Brewer, Ronnie 28
23 Fisher, Derek 39
24 Miller, Quincy 20
25 Acy, Quincy 23
26 Jones, Perry 22
27 Udoh, Ekpe 26
28 Clark, Ian 22
29 Andersen, Chris 35
... ... ...
498 Paul, Chris 28
499 Teague, Jeff 25
500 Smith, Ish 25
501 Duncan, Tim 37
502 Hawes, Spencer 25
503 Wroten, Tony 20
504 Gaddy, Abdul 21
505 Thomas, Isaiah 24
506 Robinson, Nate 29
507 Ross, Terrence 22
508 Pondexter, Quincy 25
509 Holiday, Justin 24
510 Baynes, Aron 26
511 Thompson, Klay 23
512 Lillard, Damian 23
513 Alexander, Joe 26
514 Fischer, D'or 32
515 Ebanks, Devin 23
516 Johnson, Amir 26
517 Martin, Kevin 30
518 Evans, Jeremy 25
519 Lee, Courtney 28
520 Mekel, Gal 25
521 Murry, Toure' 23
522 Stiemsma, Greg 28
523 Leuer, Jon 24
524 Landry, Marcus 27
525 Harris, Devin 30
526 West, David 33
527 Crawford, Jordan 24

528 rows × 2 columns

# Combining that with .head() to see not-so-many rows
columns_to_show = ['Name', 'Age']
df[columns_to_show].head()
Name Age
0 Gee, Alonzo 26
1 Wallace, Gerald 31
2 Williams, Mo 30
3 Gladness, Mickell 27
4 Jefferson, Richard 33
# We can also do this all in one line, even though it starts looking ugly
# (unlike the cute bears pandas looks ugly pretty often)
df[['Name', 'Age']].head()
Name Age
0 Gee, Alonzo 26
1 Wallace, Gerald 31
2 Williams, Mo 30
3 Gladness, Mickell 27
4 Jefferson, Richard 33

NOTE: That was not df['Name', 'Age'], it was df[['Name', 'Age']]. You’ll definitely type it wrong all of the time. When things break with pandas it’s probably because you forgot to put in a million brackets.
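
In case you want to see the bracket mistake blow up safely, here is a small sketch on a two-row dataframe built by hand (mini is a made-up name; the rows are from the NBA file).

```python
import pandas as pd

mini = pd.DataFrame({
    "Name": ["Gee, Alonzo", "Wallace, Gerald"],
    "Age": [26, 31],
    "Team": ["Cavaliers", "Celtics"],
})

# A list of column names inside the brackets returns a smaller DataFrame.
name_and_age = mini[["Name", "Age"]]
print(name_and_age)

# Without the inner list, pandas looks for ONE column labelled ('Name', 'Age').
got_key_error = False
try:
    mini["Name", "Age"]
except KeyError as err:
    got_key_error = True
    print("KeyError:", err)
```

The outer brackets mean "select from the dataframe" and the inner brackets are an ordinary Python list of the columns you want.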

Describing your data

A powerful tool of pandas is being able to select a portion of your data, because who ordered all that data anyway.

df.head()
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No

I want to know how many people are in each position. Luckily, pandas can tell me!

# Grab the POS column, and count the different values in it.
df['POS'].value_counts()
G      175
F      142
F/C     74
G/F     70
C       67
Name: POS, dtype: int64

Now that was a little weird, yes - we used df['POS'] instead of df[['POS']] when viewing the data’s details.
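
The difference between the two bracket styles is easier to see than to explain, so here is a quick sketch. The six positions are made up for illustration.

```python
import pandas as pd

positions = pd.DataFrame({"POS": ["G", "F", "G", "C", "G", "F"]})  # hypothetical data

one_column = positions["POS"]       # single brackets: a Series (one column of values)
small_frame = positions[["POS"]]    # double brackets: a DataFrame with one column

print(type(one_column).__name__)
print(type(small_frame).__name__)

# value_counts() on the Series tallies each distinct value, most common first.
counts = one_column.value_counts()
print(counts)
```

A Series is a single column on its own, which is why asking it for counts or statistics needs no further hint about which column you mean.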

But now I’m curious about numbers: how old is everyone? Maybe we could, I don’t know, get some statistics about age? Some statistics to describe age?

# Summary statistics for Age
df['Age'].describe()
count    528.000000
mean      26.242424
std        4.178868
min       18.000000
25%       23.000000
50%       25.000000
75%       29.000000
max       39.000000
Name: Age, dtype: float64
# That's pretty good. Does it work for everything? How about the money?
df['2013 $'].describe()
count     528
unique    308
top       n/a
freq       43
Name: 2013 $, dtype: object

Unfortunately because that has dollar signs and commas it’s thought of as a string. We’ll fix it in a second, but let’s try describing one more thing.
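
If you're ever unsure whether pandas sees a column as numbers or as text, you can just ask. This is a small sketch on two hand-made Series (three values each, taken from the data above); is_numeric_dtype comes from pandas.api.types.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

salaries = pd.Series(["$3,250,000", "$10,105,855", "n/a"], name="2013 $")
ages = pd.Series([26, 31, 30], name="Age")

# The dtype tells you how pandas is storing the column.
print(salaries.dtype)
print(ages.dtype)

# A yes/no answer that is handy before calling .describe() or doing arithmetic.
print(is_numeric_dtype(salaries))
print(is_numeric_dtype(ages))
```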

# Doing more describing
df['Ht (In.)'].describe()
count    528.000000
mean      79.119318
std        3.431488
min       69.000000
25%       77.000000
50%       80.000000
75%       82.000000
max       87.000000
Name: Ht (In.), dtype: float64

That’s stupid, though, what’s an inch even look like? What’s 80 inches? I don’t have a clue. If only there were some way to manipulate our data.

Manipulating data

Oh wait there is, HA HA HA.

# Take another look at our inches, but only the first few
df['Ht (In.)'].head()
0    78
1    79
2    73
3    83
4    79
Name: Ht (In.), dtype: int64
# Divide those inches by 12
df['Ht (In.)'].head() / 12
0    6.500000
1    6.583333
2    6.083333
3    6.916667
4    6.583333
Name: Ht (In.), dtype: float64
# Let's divide ALL of them by 12
feet = df['Ht (In.)'] / 12
feet
0      6.500000
1      6.583333
2      6.083333
3      6.916667
4      6.583333
5      6.583333
6      6.583333
7      6.666667
8      6.833333
9      6.916667
10     6.250000
11     6.166667
12     6.250000
13     6.500000
14     6.833333
15     6.666667
16     6.750000
17     6.416667
18     6.500000
19     6.083333
20     6.083333
21     6.583333
22     6.583333
23     6.083333
24     6.750000
25     6.583333
26     6.916667
27     6.833333
28     6.250000
29     6.833333
         ...   
498    6.000000
499    6.166667
500    6.000000
501    6.916667
502    7.083333
503    6.500000
504    6.250000
505    5.750000
506    5.750000
507    6.500000
508    6.500000
509    6.500000
510    6.833333
511    6.583333
512    6.250000
513    6.666667
514    6.916667
515    6.750000
516    6.750000
517    6.583333
518    6.750000
519    6.416667
520    6.250000
521    6.416667
522    6.916667
523    6.833333
524    6.583333
525    6.250000
526    6.750000
527    6.333333
Name: Ht (In.), dtype: float64
# Can we get statistics on those?
feet.describe()
count    528.000000
mean       6.593277
std        0.285957
min        5.750000
25%        6.416667
50%        6.666667
75%        6.833333
max        7.250000
Name: Ht (In.), dtype: float64
# Let's look at our original data again
df.head(2)
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No

Okay that was nice but unfortunately we can’t do anything with it. It’s just sitting there, separate from our data. If this were normal code we could do blahblah['feet'] = blahblah['Ht (In.)'] / 12, but since this is pandas, we can’t. Right? Right?

# Store a new column
df['feet'] = df['Ht (In.)'] / 12
df.head()
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No 6.500000
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No 6.583333
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No 6.916667
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No 6.583333
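
Since 6.583333 feet is still a little hard to picture, here is one possible follow-up: splitting the height into whole feet and leftover inches. It's only a sketch on the first five heights from the data, and the column names ft and in are invented for this example.

```python
import pandas as pd

heights = pd.DataFrame({"Ht (In.)": [78, 79, 73, 83, 79]})  # first five heights above

# Arithmetic on a column applies to every row at once, no loop needed.
heights["feet"] = heights["Ht (In.)"] / 12

# Floor division gives whole feet, the remainder gives the leftover inches.
heights["ft"] = heights["Ht (In.)"] // 12
heights["in"] = heights["Ht (In.)"] % 12

print(heights)
```

The same pattern works for any new column: compute a Series from existing columns, then assign it to a new name in brackets.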

That’s cool, maybe we could do the same thing with their salary? Take out the $ and the , and convert it to an integer?

# Can't just use .replace
df['2013 $'].head().replace("$","")
0     $3,250,000
1    $10,105,855
2     $2,652,000
3       $762,195
4    $11,046,000
Name: 2013 $, dtype: object
# Need to use this weird .str thing
df['2013 $'].head().str.replace("$","")
0     3,250,000
1    10,105,855
2     2,652,000
3       762,195
4    11,046,000
Name: 2013 $, dtype: object
# Can't just immediately replace the , either
df['2013 $'].head().str.replace("$","").replace(",","")
0     3,250,000
1    10,105,855
2     2,652,000
3       762,195
4    11,046,000
Name: 2013 $, dtype: object
# Need to use the .str thing before EVERY string method
df['2013 $'].head().str.replace("$","").str.replace(",","")
0     3250000
1    10105855
2     2652000
3      762195
4    11046000
Name: 2013 $, dtype: object
# Describe still doesn't work.
df['2013 $'].head().str.replace("$","").str.replace(",","").describe()
count           5
unique          5
top       2652000
freq            1
Name: 2013 $, dtype: object
# Let's convert it to an integer using .astype(int) before we describe it
df['2013 $'].head().str.replace("$","").str.replace(",","").astype(int).describe()
count    5.000000e+00
mean     5.563210e+06
std      4.679007e+06
min      7.621950e+05
25%      2.652000e+06
50%      3.250000e+06
75%      1.010586e+07
max      1.104600e+07
Name: 2013 $, dtype: float64
df['2013 $'].head().str.replace("$","").str.replace(",","").astype(int)
0     3250000
1    10105855
2     2652000
3      762195
4    11046000
Name: 2013 $, dtype: int64
# Maybe we can just make them millions?
df['2013 $'].head().str.replace("$","").str.replace(",","").astype(int) / 1000000
0     3.250000
1    10.105855
2     2.652000
3     0.762195
4    11.046000
Name: 2013 $, dtype: float64
# Unfortunately some salaries are "n/a", which would break astype(int), so we turn n/a into 0
df['2013 $'].str.replace("$","").str.replace(",","").str.replace("n/a", "0").astype(int) / 1000000
0       3.250000
1      10.105855
2       2.652000
3       0.762195
4      11.046000
5       1.246680
6       5.000000
7       5.016960
8       3.563600
9       6.500000
10      3.135000
11      5.625313
12      0.000000
13     12.868632
14      0.000000
15      0.792377
16      1.750000
17     13.701250
18      0.510000
19      0.884293
20      0.788872
21     21.466718
22      1.186459
23      0.884293
24      0.788872
25      0.788872
26      1.082520
27      4.469548
28      0.490180
29      1.399507
         ...    
498    18.668431
499     8.000000
500     0.951463
501    10.361446
502     6.500000
503     1.160040
504     0.000000
505     0.884293
506     2.016000
507     2.678640
508     0.225479
509     0.788872
510     0.788872
511     2.317920
512     3.202920
513     0.854389
514     0.000000
515     0.884293
516     6.500000
517     6.500000
518     1.660257
519     5.225000
520     0.490180
521     0.490180
522     2.676000
523     0.900000
524     0.788872
525     0.854389
526    12.000000
527     2.162419
Name: 2013 $, dtype: float64
# Remove the .head() piece and save it back into the dataframe
df['millions'] = df['2013 $'].str.replace("$","").str.replace(",","").str.replace("n/a","0").astype(int) / 1000000
df.head()
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No 6.500000 3.250000
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No 6.583333 10.105855
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333 2.652000
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No 6.916667 0.762195
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No 6.583333 11.046000
df.describe()
Age Ht (In.) WT EXP 1st Year feet millions
count 528.000000 528.000000 528.000000 528.000000 528.000000 528.000000 528.000000
mean 26.242424 79.119318 221.206439 4.772727 2008.227273 6.593277 3.818379
std 4.178868 3.431488 27.943169 4.325628 4.325628 0.285957 4.728437
min 18.000000 69.000000 20.000000 0.000000 1995.000000 5.750000 0.000000
25% 23.000000 77.000000 200.000000 1.000000 2005.000000 6.416667 0.816844
50% 25.000000 80.000000 220.000000 4.000000 2009.000000 6.666667 1.711620
75% 29.000000 82.000000 240.000000 8.000000 2012.000000 6.833333 5.000000
max 39.000000 87.000000 290.000000 18.000000 2013.000000 7.250000 30.453805

The average basketball player makes 3.8 million dollars (counting every n/a salary as zero, which pulls the average down) and is a little over six and a half feet tall.
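
One caveat about that salary number: turning "n/a" into 0 treats unknown salaries as if those players were paid nothing. Here is a sketch of an alternative, not what the notebook above does, that marks them as missing instead. It uses pd.to_numeric with errors="coerce" on four hand-picked values, and passes regex=False so "$" is treated as a plain character.

```python
import pandas as pd

raw = pd.Series(["$3,250,000", "$10,105,855", "n/a", "$762,195"], name="2013 $")

# regex=False treats "$" as a literal character rather than a regex anchor.
cleaned = raw.str.replace("$", "", regex=False).str.replace(",", "", regex=False)

# errors="coerce" turns anything unparseable ("n/a") into NaN instead of raising.
millions = pd.to_numeric(cleaned, errors="coerce") / 1_000_000
print(millions)

# mean() skips NaN, so this averages only the three known salaries.
print(millions.mean())

# Treating n/a as 0 drags the average down.
print(millions.fillna(0).mean())
```

Which version is right depends on what "n/a" means in your data, so it's worth deciding on purpose rather than by accident.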

But who cares about those guys? I don’t care about those guys. They’re boring. I want the real rich guys!

Sorting and sub-selecting

# This is just the first few guys in the dataset. Can we order it?
df.head(3)
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No 6.500000 3.250000
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No 6.583333 10.105855
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333 2.652000
# Let's try to sort them
df.sort_values(by='millions').head(3)
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
496 Johnson, James 26 Hawks F 13 n/a 81 248 4 2009 2/20/1987 Wake Forest Cheyene, WY Wyoming US Black No 6.750000 0.0
33 Davies, Brandon 22 Clippers F 23 n/a 81 235 0 2013 7/25/1991 Brigham Young Provo, UT Utah US Black No 6.750000 0.0
465 Drew, Larry 23 Heat G 0 n/a 74 180 0 2013 3/5/1990 UCLA Encino, CA California US Black No 6.166667 0.0

Those guys are making nothing! If only there were a way to sort from high to low, a.k.a. descending instead of ascending.

# It isn't descending = True, unfortunately
df.sort_values(by='millions', ascending=False).head(3)
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
203 Bryant, Kobe 35 Lakers G 24 $30,453,805 78 205 7 2006 8/23/1978 Lower Merion HS (PA) Philadelphia, PA Pennsylvania US Black Yes 6.500000 30.453805
282 Nowitzki, Dirk 35 Mavericks F 41 $22,721,381 84 245 15 1998 6/19/1978 n/a Wurzburg, BA Bavaria Germany White No 7.000000 22.721381
68 Stoudemire, Amar'e† 30 Knicks F/C 1 $21,679,893 83 245 11 2002 11/16/1982 Cypress Creek HS (FL) Lake Wales, FL Florida US Black Yes 6.916667 21.679893
# We can use this to find the oldest guys in the league
df.sort_values(by='Age', ascending=False).head(3)
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
392 Nash, Steve 39 Lakers G 10 $9,300,500 75 178 7 2006 2/7/1974 Santa Clara Johannesburg, SA n/a South Africa White No 6.250000 9.300500
225 Camby, Marcus 39 Rockets F/C 21 $884,293 83 240 17 1996 3/22/1974 Massachusetts Hartford, CT Connecticut US Black No 6.916667 0.884293
23 Fisher, Derek 39 Thunder G 6 $884,293 73 210 17 1996 8/9/1974 Arkansas-Little Rock Little Rock, AR Arkansas US Black No 6.083333 0.884293
# Or the youngest, by taking out 'ascending=False'
df.sort_values(by='Age').head(3)
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
285 Antetokounmpo, Giannis 18 Bucks G/F 34 $1,792,560 81 205 1 2012 12/16/1994 n/a Athens n/a Greece Black No 6.750000 1.79256
174 Noel, Nerlens 19 76ers C 4 $3,171,320 83 228 0 2013 4/10/1994 Kentucky Malden, MA Massachussetts US Black No 6.916667 3.17132
191 Goodwin, Archie 19 Suns G 20 $1,064,400 77 198 0 2013 8/17/1994 Kentucky Little Rock, AR Arkansas US Black No 6.416667 1.06440
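
Two more sorting tricks that might come in handy, sketched on five players copied from the tables above: sorting by more than one column to break ties, and nlargest as a shortcut for "sort descending, then take the top few".

```python
import pandas as pd

# Five rows taken from the tables above (name, age, salary in millions).
players = pd.DataFrame({
    "Name": ["Bryant, Kobe", "Nowitzki, Dirk", "Nash, Steve", "Fisher, Derek", "Noel, Nerlens"],
    "Age": [35, 35, 39, 39, 19],
    "millions": [30.453805, 22.721381, 9.300500, 0.884293, 3.171320],
})

# Oldest first; players with the same Age are ordered by the higher salary.
by_age_then_pay = players.sort_values(by=["Age", "millions"], ascending=[False, False])
print(by_age_then_pay)

# nlargest picks the top n rows by a column without a separate head() call.
top_two = players.nlargest(2, "millions")
print(top_two)
```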

But sometimes instead of just looking at them, I want to do stuff with them. Play some games with them! Dunk on them! Describe them! And we don’t want to dunk on everyone, only the players above 7 feet tall.

First, we need to check out boolean things.

# Get a big long list of True and False for every single row.
df['feet'] > 7
0      False
1      False
2      False
3      False
4      False
5      False
6      False
7      False
8      False
9      False
10     False
11     False
12     False
13     False
14     False
15     False
16     False
17     False
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26     False
27     False
28     False
29     False
       ...  
498    False
499    False
500    False
501    False
502     True
503    False
504    False
505    False
506    False
507    False
508    False
509    False
510    False
511    False
512    False
513    False
514    False
515    False
516    False
517    False
518    False
519    False
520    False
521    False
522    False
523    False
524    False
525    False
526    False
527    False
Name: feet, dtype: bool
# We could use value counts if we wanted
above_seven_feet = df['feet'] > 7
above_seven_feet.value_counts()
False    518
True      10
Name: feet, dtype: int64
# But we can also apply this to every single row to say whether YES we want it or NO we don't
df['feet'].head() > 7
0    False
1    False
2    False
3    False
4    False
Name: feet, dtype: bool
# Instead of putting column names inside of the brackets, we instead
# put the True/False statements. It will only return the players above 
# seven feet tall
df[df['feet'] > 7]
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
54 Thabeet, Hasheem 26 Thunder C 34 $1,200,000 87 263 4 2009 2/16/1987 Connecticut Dar es Salaam n/a Tanzania Black No 7.250000 1.200000
76 Chandler, Tyson 31 Knicks C 6 $14,100,538 85 240 12 2001 10/2/1982 Dominguez HS (CA) Hanford, CA California US Black Yes 7.083333 14.100538
120 Hibbert, Roy 26 Pacers C 55 $14,283,844 86 280 5 2008 12/11/1986 Georgetown New York City, NY New York US Black No 7.166667 14.283844
145 Leonard, Meyers 21 Trail Blazers C 11 $2,222,160 85 245 1 2012 2/27/1992 Illinois Robinson, IIL Illinois US White No 7.083333 2.222160
221 Len, Alex 20 Suns C 21 $3,492,720 85 255 0 2013 6/16/1993 Maryland Antratsy n/a Ukraine White No 7.083333 3.492720
274 Gobert, Rudy 21 Jazz C 27 $1,078,800 85 235 0 2013 6/26/1992 n/a Saint-Quentin Aisne France Mixed No 7.083333 1.078800
297 Mozgov, Timofey 27 Nuggets C 25 $4,400,000 85 250 3 2010 7/16/1986 n/a St. Petersburg n/a Russia White No 7.083333 4.400000
303 Gasol, Marc 28 Grizzlies C 33 $14,860,524 85 265 5 2008 1/29/1985 n/a Barcelona n/a Spain Hispanic No 7.083333 14.860524
316 Kuzmi?, Ognjen 23 Warriors C 1 $490,180 85 231 0 2013 5/16/1990 n/a Doboj n/a Yugoslavia White No 7.083333 0.490180
502 Hawes, Spencer 25 76ers C 0 $6,500,000 85 245 6 2007 4/28/1988 Washington Seattle, WA Washington US White No 7.083333 6.500000
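To see the True/False filtering in slow motion, here is a small sketch on made-up data (the `toy` table is hypothetical, not the NBA file). It also shows `.loc`, which accepts the same True/False list plus a list of columns, in case you only want part of each row:

```python
import pandas as pd

# Toy data, invented for illustration (not from the NBA file)
toy = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "POS": ["C", "G", "F"],
    "feet": [7.25, 6.0, 6.5],
})

mask = toy["feet"] > 7      # one True/False per row
tall = toy[mask]            # keeps only the rows where mask is True

# .loc takes the same mask, plus an optional list of columns to keep
tall_names = toy.loc[mask, ["Name", "feet"]]

print(tall)
print(tall_names)
```

Notice that the kept rows hold on to their original index numbers, which is why the filtered tables in this notebook have gaps in the left-hand column.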
# Or only the guards
df[df['POS'] == 'G']
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333 2.652000
10 Bayless, Jerryd 25 Grizzlies G 7 $3,135,000 75 200 5 2008 8/20/1988 Arizona Phoenix, AZ Arizona US Black No 6.250000 3.135000
11 Terry, Jason 36 Nets G 31 $5,625,313 74 180 14 1999 9/15/1977 Arizona Seattle, WA Washington US Black No 6.166667 5.625313
12 Fogg, Kyle 23 Nuggets G 6 n/a 75 183 0 2013 1/27/1990 Arizona Brea, CA California US Black No 6.250000 0.000000
17 Harden, James 24 Rockets G 13 $13,701,250 77 220 4 2009 8/26/1989 Arizona State Los Angeles, CA California US Black No 6.416667 13.701250
19 Pargo, Jannero 33 Bobcats G 5 $884,293 73 185 11 2002 10/22/1979 Arkansas Chicago, IL Illinois US Black No 6.083333 0.884293
20 Beverley, Patrick 25 Rockets G 2 $788,872 73 185 5 2008 7/12/1988 Arkansas Chicago, IL Illinois US Black No 6.083333 0.788872
23 Fisher, Derek 39 Thunder G 6 $884,293 73 210 17 1996 8/9/1974 Arkansas-Little Rock Little Rock, AR Arkansas US Black No 6.083333 0.884293
28 Clark, Ian 22 Jazz G 21 $490,180 75 175 0 2013 3/7/1991 Belmont Memphis, TN Tennessee US Black No 6.250000 0.490180
30 Jackson, Reggie 23 Thunder G 15 $1,260,360 75 208 2 2011 4/16/1990 Boston College Pordenone n/a Italy Black No 6.250000 1.260360
34 Fredette, Jimmer 24 Kings G 7 $2,439,840 74 195 2 2011 2/25/1989 Brigham Young Glens Falls, NY New York US White No 6.166667 2.439840
35 Mack, Shelvin 23 Hawks G 8 $884,293 75 215 2 2011 4/22/1990 Butler Lexington, KY Kentucky US Black No 6.250000 0.884293
38 Crabbe, Allen 21 Trail Blazers G 23 $825,000 78 210 0 2013 4/4/1992 California Los Angeles, CA California US Black No 6.500000 0.825000
40 Taylor, Jermaine 26 Cavaliers G 8 $780,871 77 20 4 2009 12/8/1986 Central Florida Tavares, FL Florida US Black No 6.416667 0.780871
44 Stephenson, Lance 23 Pacers G 1 $1,005,000 77 228 3 2010 9/5/1990 Cincinnati New York City, NY New York US Black No 6.416667 1.005000
46 Cole, Norris 25 Heat G 30 $1,129,200 74 175 2 2011 10/13/1988 Cleveland State Dayton, OH Ohio US Black No 6.166667 1.129200
49 Burks, Alec 22 Jazz G 10 $2,202,000 78 205 2 2011 7/20/1991 Colorado Grandview, MO Missouri US Black No 6.500000 2.202000
50 Billups, Chauncey 37 Pistons G 1 $2,500,000 75 210 16 1997 9/25/1976 Colorado Denver, CO Colorado US Black No 6.250000 2.500000
53 Gordon, Ben 30 Bobcats G 8 $13,200,000 75 200 9 2004 4/4/1983 Connecticut London, ENG n/a England Black No 6.250000 13.200000
62 Walker, Kemba 23 Bobcats G 15 $2,568,360 73 184 2 2011 5/8/1990 Connecticut New York City, NY New York US Black No 6.083333 2.568360
63 Allen, Ray 38 Heat G 34 $3,229,050 77 205 17 1996 7/20/1975 Connecticut Merced, CA California US Black No 6.416667 3.229050
64 Price, A.J. 27 Timberwolves G 22 n/a 74 185 4 2009 10/7/1986 Connecticut Orange, NJ New Jersey Us Black No 6.166667 0.000000
69 Curry, Stephen 25 Warriors G 30 $9,887,640 75 185 4 2009 3/14/1988 Davidson Akron, OH Ohio US Mixed No 6.250000 9.887640
71 Roberts, Brian 27 Pelicans G 22 $788,872 73 173 1 2012 12/3/1985 Dayton Toledo, OH Ohio US Black No 6.083333 0.788872
74 Green, Willie 32 Clippers G 34 $1,399,507 75 201 10 2003 7/28/1981 Detroit Detroit, MI Michigan US Black No 6.250000 1.399507
75 McCallum, Ray 22 Kings G 3 $524,616 75 190 0 2013 6/12/1991 Detroit Detroit, MI Michigan US Black No 6.250000 0.524616
77 Irving, Kyrie 21 Cavaliers G 2 $5,607,240 75 191 2 2011 3/23/1992 Duke Melbourne Victoria Australia Black No 6.250000 5.607240
88 Redick, J. J. 29 Clippers G 4 $6,500,000 76 190 7 2006 6/24/1984 Duke Cookeville, TN Tennessee US White No 6.333333 6.500000
89 Rivers, Austin 21 Pelicans G 25 $2,339,040 76 200 1 2012 8/1/1992 Duke Santa Monica, CA California US Mixed No 6.333333 2.339040
90 Curry, Seth 23 Warriors G 3 $490,180 74 185 0 2013 8/23/1990 Duke Charlotte, NC North Carolina US Black No 6.166667 0.490180
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
457 Johnson, Orlando 24 Pacers G 11 $788,872 77 220 1 2012 3/11/1989 UC Santa Barbara Monterey, CA California US Black No 6.416667 0.788872
464 Collison, Darren 26 Clippers G 2 $1,900,000 72 175 4 2009 8/23/1987 UCLA Rancho Cucamonga, CA California US Black No 6.000000 1.900000
465 Drew, Larry 23 Heat G 0 n/a 74 180 0 2013 3/5/1990 UCLA Encino, CA California US Black No 6.166667 0.000000
466 Farmar, Jordan 26 Lakers G 1 $884,293 74 180 12 2001 11/30/1986 UCLA Los Angeles, CA California US Mixed No 6.166667 0.884293
467 Holiday, Jrue 23 Pelicans G 11 $9,713,484 76 205 4 2009 6/12/1990 UCLA Chatsworth, CA California US Black No 6.333333 9.713484
468 Lee, Malcolm 23 Suns G 30 $884,293 77 200 2 2011 5/22/1990 UCLA Riverside, CA California US Black No 6.416667 0.884293
469 Westbrook, Russell 24 Thunder G 0 $14,693,906 75 187 5 2008 11/12/1988 UCLA Long Beach, CA California US Black No 6.250000 14.693906
470 Watson, Earl 34 Trail Blazers G 17 $884,293 73 199 12 2001 6/12/1979 UCLA Kansas City, KA Kansas US Black No 6.083333 0.884293
480 Miller, Andre 37 Nuggets G 24 $5,000,000 74 200 14 1999 3/19/1976 Utah Los Angeles, CA California US Black No 6.166667 5.000000
481 Price, Ronnie 30 Magic G 10 $1,146,337 74 190 8 2005 6/21/1983 Utah Valley Friendswood, Texas Texas US Black No 6.166667 1.146337
485 Jenkins, John 22 Hawks G 12 $1,258,800 76 215 1 2012 3/6/1991 Vanderbilt Hendersonville, TN Tennessee US Black No 6.333333 1.258800
487 Wayns, Maalik 22 Clippers G 5 $788,872 73 195 1 2012 5/2/1991 Villanova Philadelphia, PA Pennsylvania US Black No 6.083333 0.788872
488 Foye, Randy 30 Nuggets G 4 $3,000,000 76 213 7 2006 9/24/1983 Villanova Newark, NJ New Jersey US Black No 6.333333 3.000000
489 Lowry, Kyle 27 Raptors G 7 $6,210,000 72 205 7 2006 3/25/1986 Villanova Philadelphia, PA Pennsylvania US Black No 6.000000 6.210000
491 Mason, Jr., Roger 33 Heat G 21 $854,389 77 205 11 2002 9/10/1980 Virginia Washington, DC DC US Black No 6.416667 0.854389
493 Daniels, Troy 22 Bobcats G 30 n/a 76 200 0 2013 7/15/1991 Virginia Commonwealth Roanoke, VA Virginia US Black No 6.333333 0.000000
494 Maynor, Eric 26 Wizards G 6 $13,000,000 75 175 4 2009 6/11/1987 Virginia Commonwealth Raeford, NC North Carolina US Black No 6.250000 13.000000
498 Paul, Chris 28 Clippers G 3 $18,668,431 72 175 8 2005 5/6/1985 Wake Forest Forsyth County, NC North Carolina US Black No 6.000000 18.668431
499 Teague, Jeff 25 Hawks G 0 $8,000,000 74 181 4 2009 6/10/1988 Wake Forest Indianapolis, IN Indiana US Black No 6.166667 8.000000
500 Smith, Ish 25 Suns G 30 $951,463 72 175 3 2010 7/5/1988 Wake Forest Charlotte, NC North Carolina US Black No 6.000000 0.951463
503 Wroten, Tony 20 76ers G 8 $1,160,040 78 205 1 2012 4/13/1993 Washington Renton, WA Washington US Black No 6.500000 1.160040
504 Gaddy, Abdul 21 Bobcats G 10 n/a 75 185 0 2013 1/26/1992 Washington Tacoma, WA Washington US Black No 6.250000 0.000000
505 Thomas, Isaiah 24 Kings G 22 $884,293 69 185 2 2011 2/7/1989 Washington Tacoma, WA Washington US Black No 5.750000 0.884293
506 Robinson, Nate 29 Nuggets G 10 $2,016,000 69 180 8 2005 5/31/1984 Washington Seattle, WA Washington US Black No 5.750000 2.016000
507 Ross, Terrence 22 Raptors G 31 $2,678,640 78 195 1 2012 2/5/1991 Washington Portland, OR Oregon US Black No 6.500000 2.678640
512 Lillard, Damian 23 Trail Blazers G 0 $3,202,920 75 195 1 2012 7/15/1990 Weber State Oakland, CA California US Black No 6.250000 3.202920
517 Martin, Kevin 30 Timberwolves G 23 $6,500,000 79 185 9 2004 2/1/1983 Western Carolina Zanesville, OH Ohio US Mixed No 6.583333 6.500000
520 Mekel, Gal 25 Mavericks G 33 $490,180 75 191 5 2008 3/4/1988 Wichita State Petah Tikva n/a Israel White No 6.250000 0.490180
525 Harris, Devin 30 Mavericks G 20 $854,389 75 192 9 2004 2/27/1983 Wisconsin Milwaukee, WI Wisconsin US Black No 6.250000 0.854389
527 Crawford, Jordan 24 Celtics G 27 $2,162,419 76 195 3 2010 10/23/1988 Xavier Detroit, MI Michigan US Black No 6.333333 2.162419

175 rows × 19 columns

# Or only the guards who make more than 15 million
df[(df['POS'] == 'G') & (df['millions'] > 15)]
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
147 Williams, Deron 29 Nets G 8 $18,466,130 75 209 8 2005 6/26/1984 Illinois Parkersburg, WV West Virginia US Black No 6.250000 18.466130
203 Bryant, Kobe 35 Lakers G 24 $30,453,805 78 205 7 2006 8/23/1978 Lower Merion HS (PA) Philadelphia, PA Pennsylvania US Black Yes 6.500000 30.453805
214 Wade, Dwyane 31 Heat G 3 $18,673,000 76 220 10 2003 1/17/1982 Marquette Chicago, IL Illinois US Black No 6.333333 18.673000
227 Rose, Derrick 25 Bulls G 1 $17,632,688 75 190 5 2008 10/4/1988 Memphis Chicago, IL Illinois US Black No 6.250000 17.632688
498 Paul, Chris 28 Clippers G 3 $18,668,431 72 175 8 2005 5/6/1985 Wake Forest Forsyth County, NC North Carolina US Black No 6.000000 18.668431
# It might be easier to break down the booleans into separate variables
is_guard = df['POS'] == 'G'
more_than_fifteen_million = df['millions'] > 15
df[is_guard & more_than_fifteen_million]
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
147 Williams, Deron 29 Nets G 8 $18,466,130 75 209 8 2005 6/26/1984 Illinois Parkersburg, WV West Virginia US Black No 6.250000 18.466130
203 Bryant, Kobe 35 Lakers G 24 $30,453,805 78 205 7 2006 8/23/1978 Lower Merion HS (PA) Philadelphia, PA Pennsylvania US Black Yes 6.500000 30.453805
214 Wade, Dwyane 31 Heat G 3 $18,673,000 76 220 10 2003 1/17/1982 Marquette Chicago, IL Illinois US Black No 6.333333 18.673000
227 Rose, Derrick 25 Bulls G 1 $17,632,688 75 190 5 2008 10/4/1988 Memphis Chicago, IL Illinois US Black No 6.250000 17.632688
498 Paul, Chris 28 Clippers G 3 $18,668,431 72 175 8 2005 5/6/1985 Wake Forest Forsyth County, NC North Carolina US Black No 6.000000 18.668431
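Besides `&` (and), pandas also understands `|` (or) and `~` (not), and `.isin()` checks against a list of values. The sketch below tries each one on a made-up table (the `toy` data and variable names are invented for illustration). The parentheses in the one-line version above matter: `&` is evaluated before `==` and `>`, so leaving them out usually ends in an error.

```python
import pandas as pd

# Toy data, invented for illustration (not from the NBA file)
toy = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "POS": ["G", "G", "F", "C"],
    "millions": [18.0, 2.5, 16.0, 1.0],
})

is_guard = toy["POS"] == "G"
is_rich = toy["millions"] > 15

rich_guards = toy[is_guard & is_rich]      # both must be True
guard_or_rich = toy[is_guard | is_rich]    # either one may be True
not_guards = toy[~is_guard]                # flips True and False
bigs = toy[toy["POS"].isin(["F", "C"])]    # matches any value in the list

print(rich_guards["Name"].tolist())
```

Python's own `and`, `or` and `not` do not work here, because they expect a single True/False and not a whole column of them.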
# We can save the filtered rows in a variable and reuse them later
short_players = df[df['feet'] < 6.5]
short_players
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333 2.652000
10 Bayless, Jerryd 25 Grizzlies G 7 $3,135,000 75 200 5 2008 8/20/1988 Arizona Phoenix, AZ Arizona US Black No 6.250000 3.135000
11 Terry, Jason 36 Nets G 31 $5,625,313 74 180 14 1999 9/15/1977 Arizona Seattle, WA Washington US Black No 6.166667 5.625313
12 Fogg, Kyle 23 Nuggets G 6 n/a 75 183 0 2013 1/27/1990 Arizona Brea, CA California US Black No 6.250000 0.000000
17 Harden, James 24 Rockets G 13 $13,701,250 77 220 4 2009 8/26/1989 Arizona State Los Angeles, CA California US Black No 6.416667 13.701250
19 Pargo, Jannero 33 Bobcats G 5 $884,293 73 185 11 2002 10/22/1979 Arkansas Chicago, IL Illinois US Black No 6.083333 0.884293
20 Beverley, Patrick 25 Rockets G 2 $788,872 73 185 5 2008 7/12/1988 Arkansas Chicago, IL Illinois US Black No 6.083333 0.788872
23 Fisher, Derek 39 Thunder G 6 $884,293 73 210 17 1996 8/9/1974 Arkansas-Little Rock Little Rock, AR Arkansas US Black No 6.083333 0.884293
28 Clark, Ian 22 Jazz G 21 $490,180 75 175 0 2013 3/7/1991 Belmont Memphis, TN Tennessee US Black No 6.250000 0.490180
30 Jackson, Reggie 23 Thunder G 15 $1,260,360 75 208 2 2011 4/16/1990 Boston College Pordenone n/a Italy Black No 6.250000 1.260360
34 Fredette, Jimmer 24 Kings G 7 $2,439,840 74 195 2 2011 2/25/1989 Brigham Young Glens Falls, NY New York US White No 6.166667 2.439840
35 Mack, Shelvin 23 Hawks G 8 $884,293 75 215 2 2011 4/22/1990 Butler Lexington, KY Kentucky US Black No 6.250000 0.884293
40 Taylor, Jermaine 26 Cavaliers G 8 $780,871 77 20 4 2009 12/8/1986 Central Florida Tavares, FL Florida US Black No 6.416667 0.780871
44 Stephenson, Lance 23 Pacers G 1 $1,005,000 77 228 3 2010 9/5/1990 Cincinnati New York City, NY New York US Black No 6.416667 1.005000
46 Cole, Norris 25 Heat G 30 $1,129,200 74 175 2 2011 10/13/1988 Cleveland State Dayton, OH Ohio US Black No 6.166667 1.129200
50 Billups, Chauncey 37 Pistons G 1 $2,500,000 75 210 16 1997 9/25/1976 Colorado Denver, CO Colorado US Black No 6.250000 2.500000
53 Gordon, Ben 30 Bobcats G 8 $13,200,000 75 200 9 2004 4/4/1983 Connecticut London, ENG n/a England Black No 6.250000 13.200000
62 Walker, Kemba 23 Bobcats G 15 $2,568,360 73 184 2 2011 5/8/1990 Connecticut New York City, NY New York US Black No 6.083333 2.568360
63 Allen, Ray 38 Heat G 34 $3,229,050 77 205 17 1996 7/20/1975 Connecticut Merced, CA California US Black No 6.416667 3.229050
64 Price, A.J. 27 Timberwolves G 22 n/a 74 185 4 2009 10/7/1986 Connecticut Orange, NJ New Jersey Us Black No 6.166667 0.000000
65 Lamb, Jeremy 21 Thunder G/F 11 $2,111,160 77 180 1 2012 5/30/1992 Connecticut Norcross, GA Georgia US Black No 6.416667 2.111160
69 Curry, Stephen 25 Warriors G 30 $9,887,640 75 185 4 2009 3/14/1988 Davidson Akron, OH Ohio US Mixed No 6.250000 9.887640
71 Roberts, Brian 27 Pelicans G 22 $788,872 73 173 1 2012 12/3/1985 Dayton Toledo, OH Ohio US Black No 6.083333 0.788872
74 Green, Willie 32 Clippers G 34 $1,399,507 75 201 10 2003 7/28/1981 Detroit Detroit, MI Michigan US Black No 6.250000 1.399507
75 McCallum, Ray 22 Kings G 3 $524,616 75 190 0 2013 6/12/1991 Detroit Detroit, MI Michigan US Black No 6.250000 0.524616
77 Irving, Kyrie 21 Cavaliers G 2 $5,607,240 75 191 2 2011 3/23/1992 Duke Melbourne Victoria Australia Black No 6.250000 5.607240
88 Redick, J. J. 29 Clippers G 4 $6,500,000 76 190 7 2006 6/24/1984 Duke Cookeville, TN Tennessee US White No 6.333333 6.500000
89 Rivers, Austin 21 Pelicans G 25 $2,339,040 76 200 1 2012 8/1/1992 Duke Santa Monica, CA California US Mixed No 6.333333 2.339040
90 Curry, Seth 23 Warriors G 3 $490,180 74 185 0 2013 8/23/1990 Duke Charlotte, NC North Carolina US Black No 6.166667 0.490180
91 Henderson, Gerald 25 Bobcats G/F 9 $6,000,000 77 215 4 2009 12/9/1987 Duke Caldwell, NJ New Jersey US Black No 6.416667 6.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
464 Collison, Darren 26 Clippers G 2 $1,900,000 72 175 4 2009 8/23/1987 UCLA Rancho Cucamonga, CA California US Black No 6.000000 1.900000
465 Drew, Larry 23 Heat G 0 n/a 74 180 0 2013 3/5/1990 UCLA Encino, CA California US Black No 6.166667 0.000000
466 Farmar, Jordan 26 Lakers G 1 $884,293 74 180 12 2001 11/30/1986 UCLA Los Angeles, CA California US Mixed No 6.166667 0.884293
467 Holiday, Jrue 23 Pelicans G 11 $9,713,484 76 205 4 2009 6/12/1990 UCLA Chatsworth, CA California US Black No 6.333333 9.713484
468 Lee, Malcolm 23 Suns G 30 $884,293 77 200 2 2011 5/22/1990 UCLA Riverside, CA California US Black No 6.416667 0.884293
469 Westbrook, Russell 24 Thunder G 0 $14,693,906 75 187 5 2008 11/12/1988 UCLA Long Beach, CA California US Black No 6.250000 14.693906
470 Watson, Earl 34 Trail Blazers G 17 $884,293 73 199 12 2001 6/12/1979 UCLA Kansas City, KA Kansas US Black No 6.083333 0.884293
471 Afflalo, Arron 27 Magic G/F 4 $7,750,000 77 215 6 2007 10/15/1985 UCLA Los Angeles, CA California US Black No 6.416667 7.750000
480 Miller, Andre 37 Nuggets G 24 $5,000,000 74 200 14 1999 3/19/1976 Utah Los Angeles, CA California US Black No 6.166667 5.000000
481 Price, Ronnie 30 Magic G 10 $1,146,337 74 190 8 2005 6/21/1983 Utah Valley Friendswood, Texas Texas US Black No 6.166667 1.146337
482 Howard, Ron 31 Pacers G/F 19 n/a 77 200 0 2013 1/14/1982 Valparaiso Chicago, IL Illinois US Black No 6.416667 0.000000
485 Jenkins, John 22 Hawks G 12 $1,258,800 76 215 1 2012 3/6/1991 Vanderbilt Hendersonville, TN Tennessee US Black No 6.333333 1.258800
487 Wayns, Maalik 22 Clippers G 5 $788,872 73 195 1 2012 5/2/1991 Villanova Philadelphia, PA Pennsylvania US Black No 6.083333 0.788872
488 Foye, Randy 30 Nuggets G 4 $3,000,000 76 213 7 2006 9/24/1983 Villanova Newark, NJ New Jersey US Black No 6.333333 3.000000
489 Lowry, Kyle 27 Raptors G 7 $6,210,000 72 205 7 2006 3/25/1986 Villanova Philadelphia, PA Pennsylvania US Black No 6.000000 6.210000
491 Mason, Jr., Roger 33 Heat G 21 $854,389 77 205 11 2002 9/10/1980 Virginia Washington, DC DC US Black No 6.416667 0.854389
493 Daniels, Troy 22 Bobcats G 30 n/a 76 200 0 2013 7/15/1991 Virginia Commonwealth Roanoke, VA Virginia US Black No 6.333333 0.000000
494 Maynor, Eric 26 Wizards G 6 $13,000,000 75 175 4 2009 6/11/1987 Virginia Commonwealth Raeford, NC North Carolina US Black No 6.250000 13.000000
498 Paul, Chris 28 Clippers G 3 $18,668,431 72 175 8 2005 5/6/1985 Wake Forest Forsyth County, NC North Carolina US Black No 6.000000 18.668431
499 Teague, Jeff 25 Hawks G 0 $8,000,000 74 181 4 2009 6/10/1988 Wake Forest Indianapolis, IN Indiana US Black No 6.166667 8.000000
500 Smith, Ish 25 Suns G 30 $951,463 72 175 3 2010 7/5/1988 Wake Forest Charlotte, NC North Carolina US Black No 6.000000 0.951463
504 Gaddy, Abdul 21 Bobcats G 10 n/a 75 185 0 2013 1/26/1992 Washington Tacoma, WA Washington US Black No 6.250000 0.000000
505 Thomas, Isaiah 24 Kings G 22 $884,293 69 185 2 2011 2/7/1989 Washington Tacoma, WA Washington US Black No 5.750000 0.884293
506 Robinson, Nate 29 Nuggets G 10 $2,016,000 69 180 8 2005 5/31/1984 Washington Seattle, WA Washington US Black No 5.750000 2.016000
512 Lillard, Damian 23 Trail Blazers G 0 $3,202,920 75 195 1 2012 7/15/1990 Weber State Oakland, CA California US Black No 6.250000 3.202920
519 Lee, Courtney 28 Celtics G/F 11 $5,225,000 77 200 5 2008 10/3/1985 Western Kentucky Indianapolis, IN Indiana US Black No 6.416667 5.225000
520 Mekel, Gal 25 Mavericks G 33 $490,180 75 191 5 2008 3/4/1988 Wichita State Petah Tikva n/a Israel White No 6.250000 0.490180
521 Murry, Toure' 23 Knicks G/F 23 $490,180 77 195 0 2013 11/8/1989 Wichita State Houston, TX Texas US Black No 6.416667 0.490180
525 Harris, Devin 30 Mavericks G 20 $854,389 75 192 9 2004 2/27/1983 Wisconsin Milwaukee, WI Wisconsin US Black No 6.250000 0.854389
527 Crawford, Jordan 24 Celtics G 27 $2,162,419 76 195 3 2010 10/23/1988 Xavier Detroit, MI Michigan US Black No 6.333333 2.162419

166 rows × 19 columns

short_players.describe()
Age Ht (In.) WT EXP 1st Year feet millions
count 166.000000 166.000000 166.000000 166.000000 166.000000 166.000000 166.000000
mean 25.933735 74.909639 193.530120 4.168675 2008.831325 6.242470 3.423839
std 4.286887 1.778056 19.085668 4.059614 4.059614 0.148171 4.122675
min 19.000000 69.000000 20.000000 0.000000 1996.000000 5.750000 0.000000
25% 23.000000 74.000000 185.000000 1.000000 2006.000000 6.166667 0.788872
50% 25.000000 75.000000 195.000000 3.000000 2010.000000 6.250000 1.595675
75% 28.000000 76.000000 205.000000 7.000000 2012.000000 6.333333 4.940940
max 39.000000 77.000000 228.000000 17.000000 2013.000000 6.416667 18.673000
# Maybe we can compare them to taller players?
df[df['feet'] >= 6.5].describe()
Age Ht (In.) WT EXP 1st Year feet millions
count 362.000000 362.000000 362.000000 362.000000 362.000000 362.000000 362.000000
mean 26.383978 81.049724 233.897790 5.049724 2007.950276 6.754144 3.999301
std 4.126674 1.964438 21.439163 4.420146 4.420146 0.163703 4.976573
min 18.000000 78.000000 155.000000 0.000000 1995.000000 6.500000 0.000000
25% 23.000000 79.000000 220.000000 1.000000 2005.000000 6.583333 0.854389
50% 26.000000 81.000000 235.000000 4.000000 2009.000000 6.750000 1.750000
75% 29.000000 83.000000 250.000000 8.000000 2012.000000 6.916667 5.012720
max 39.000000 87.000000 290.000000 18.000000 2013.000000 7.250000 30.453805
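Running describe() twice works, but you have to compare the two tables by eye. One alternative is to label every row and let groupby build a single side-by-side summary. This is only a sketch on made-up numbers; the `toy` table, the `height_group` column and its labels are all invented for illustration:

```python
import pandas as pd

# Toy data, invented for illustration (not from the NBA file)
toy = pd.DataFrame({
    "feet": [6.0, 6.25, 6.5, 7.0],
    "millions": [1.0, 3.0, 2.0, 6.0],
})

# Label each row using the same 6.5-foot cutoff as above
toy["height_group"] = (toy["feet"] >= 6.5).map(
    {True: "6.5 ft and up", False: "under 6.5 ft"}
)

# One row of summary numbers per group
summary = toy.groupby("height_group")["millions"].agg(["count", "mean", "median"])
print(summary)
```

On the real data the same pattern would use `df` in place of `toy`.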

Drawing pictures

Okay okay enough code and enough stupid numbers. I’m visual. I want graphics. Okay????? Okay.

df['Age'].head()
0    26
1    31
2    30
3    27
4    33
Name: Age, dtype: int64
# This will scream (with an ImportError) if we don't have matplotlib installed.
df['Age'].hist()
---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

<ipython-input-124-694adadca099> in <module>()
----> 1 df['Age'].hist()


/Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages/pandas/tools/plotting.py in hist_series(self, by, ax, grid, xlabelsize, xrot, ylabelsize, yrot, figsize, bins, **kwds)
   2941 
   2942     """
-> 2943     import matplotlib.pyplot as plt
   2944 
   2945     if by is None:


ImportError: No module named 'matplotlib'

matplotlib is a graphing library. It’s the standard Python way to make graphs, and pandas uses it behind the scenes when you call .hist() or .plot().

!pip install matplotlib
Collecting matplotlib
  Using cached matplotlib-1.5.1-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting cycler (from matplotlib)
  Using cached cycler-0.10.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.6 in /Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): pytz in /Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages (from matplotlib)
Collecting pyparsing!=2.0.0,!=2.0.4,>=1.5.6 (from matplotlib)
  Using cached pyparsing-2.1.4-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): six in /Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages (from cycler->matplotlib)
Installing collected packages: cycler, pyparsing, matplotlib
Successfully installed cycler-0.10.0 matplotlib-1.5.1 pyparsing-2.1.4
# this will open up a weird window that won't do anything
df['Age'].hist()
<matplotlib.axes._subplots.AxesSubplot at 0x108d38780>
# So instead you run this code
%matplotlib inline
df['Age'].hist()
<matplotlib.axes._subplots.AxesSubplot at 0x10d724dd8>

[Figure: histogram of player ages in the default matplotlib style]

But that’s ugly. There’s a thing called ggplot for R that looks nice. We want to look nice. We want to look like ggplot.

import matplotlib.pyplot as plt
plt.style.available
['grayscale',
 'seaborn-muted',
 'seaborn-paper',
 'classic',
 'seaborn-notebook',
 'seaborn-white',
 'seaborn-pastel',
 'fivethirtyeight',
 'seaborn-dark-palette',
 'seaborn-ticks',
 'seaborn-poster',
 'seaborn-talk',
 'seaborn-whitegrid',
 'seaborn-deep',
 'ggplot',
 'dark_background',
 'seaborn-bright',
 'bmh',
 'seaborn-darkgrid',
 'seaborn-dark',
 'seaborn-colorblind']
plt.style.use('ggplot')
df['Age'].hist()
<matplotlib.axes._subplots.AxesSubplot at 0x10d73f5c0>

[Figure: histogram of player ages in the ggplot style]

plt.style.use('seaborn-deep')
df['Age'].hist()
<matplotlib.axes._subplots.AxesSubplot at 0x108e5fef0>

[Figure: histogram of player ages in the seaborn-deep style]

plt.style.use('fivethirtyeight')
df['Age'].hist()
<matplotlib.axes._subplots.AxesSubplot at 0x108f48978>

[Figure: histogram of player ages in the fivethirtyeight style]
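
plt.style.use() changes the look of every graph you draw afterwards. If you only want to try a style on one graph, matplotlib also has plt.style.context(), which switches back when the block ends. Below is a minimal sketch with made-up ages (the `ages` Series is invented, not the real Age column). Style names depend on your matplotlib version: in newer releases the seaborn styles were renamed to names like 'seaborn-v0_8-deep', so the sketch checks plt.style.available before picking one.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ages, invented for illustration (not the real Age column)
ages = pd.Series([19, 22, 23, 25, 25, 26, 28, 31, 35, 39], name="Age")

# Only ask for a style this matplotlib install actually has
style = "ggplot" if "ggplot" in plt.style.available else "default"

# The style applies inside the with-block only, then the old settings return
with plt.style.context(style):
    fig, ax = plt.subplots()
    ages.hist(ax=ax)
```

In a notebook with %matplotlib inline the graph shows up on its own; the "Agg" line is only there so the sketch does not try to open a window.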

That might look better with a little more customization. So let’s customize it.

# Pass in all sorts of stuff!
# Most from http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.hist.html
# range= is a matplotlib thing: pandas passes it along to matplotlib's hist()
df['Age'].hist(bins=20, xlabelsize=10, ylabelsize=10, range=(0,40))
<matplotlib.axes._subplots.AxesSubplot at 0x10e73e358>

[Figure: histogram of player ages with 20 bins over the range 0 to 40]
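
That `<matplotlib.axes...>` line in the output is the thing .hist() hands back: the matplotlib Axes it drew on. You can keep it in a variable and add labels and a title to it. A small sketch with made-up ages (the `ages` Series and the label text are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ages, invented for illustration (not the real Age column)
ages = pd.Series([19, 22, 23, 25, 25, 26, 28, 31, 35, 39], name="Age")

fig, ax = plt.subplots()

# bins and range are handed on to matplotlib: 20 bars covering ages 0 to 40
ages.hist(ax=ax, bins=20, range=(0, 40))

# The Axes object has its own methods for labels and titles
ax.set_xlabel("Age")
ax.set_ylabel("Number of players")
ax.set_title("Player ages")
```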

I want more graphics! Do tall people make more money?!?!

df.plot(kind='scatter', x='feet', y='millions')
<matplotlib.axes._subplots.AxesSubplot at 0x110193320>

[Figure: scatter plot of salary in millions against height in feet]
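
A scatter plot answers the question by eye. If you also want a single number, Series.corr() gives the Pearson correlation between two columns: close to 1 means they rise together, close to 0 means no straight-line relationship. The sketch below uses made-up numbers (the `toy` table is invented, so its result says nothing about real NBA salaries):

```python
import pandas as pd

# Toy data, invented for illustration (not from the NBA file)
toy = pd.DataFrame({
    "feet": [6.0, 6.25, 6.5, 6.75, 7.0],
    "millions": [1.0, 4.0, 2.0, 3.0, 5.0],
})

# Pearson correlation between the two columns, always between -1 and 1
r = toy["feet"].corr(toy["millions"])
print(round(r, 2))  # about 0.7 for this made-up data
```

On the real data you would swap in `df['feet'].corr(df['millions'])`.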

df.head()
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet millions
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No 6.500000 3.250000
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No 6.583333 10.105855
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333 2.652000
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No 6.916667 0.762195
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No 6.583333 11.046000
# How does experience relate to the amount of money they're making?
df.plot(kind='scatter', x='EXP', y='millions')
<matplotlib.axes._subplots.AxesSubplot at 0x11111e048>

[Figure: scatter plot of salary in millions against years of experience (EXP)]

# At least we can assume height and weight are related
df.plot(kind='scatter', x='WT', y='feet')
<matplotlib.axes._subplots.AxesSubplot at 0x110f31278>

[Figure: scatter plot of height in feet against weight (WT)]

# Same plot, but with xlim and ylim to set the range of each axis
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html
df.plot(kind='scatter', x='WT', y='feet', xlim=(100,300), ylim=(5.5, 8))
<matplotlib.axes._subplots.AxesSubplot at 0x1121c4208>

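[Figure: scatter plot of height in feet against weight (WT), with the x axis from 100 to 300 and the y axis from 5.5 to 8]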

plt.style.use('ggplot')
df.plot(kind='scatter', x='WT', y='feet', xlim=(100,300), ylim=(5.5, 8))
<matplotlib.axes._subplots.AxesSubplot at 0x112755518>

[Figure: the same height against weight scatter plot in the ggplot style]

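# We can also call matplotlib's plt functions directly instead of df.plot
# It's SIMILAR but TOTALLY DIFFERENT: one scatter call per group, each with its own color and marker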
centers = df[df['POS'] == 'C']
guards = df[df['POS'] == 'G']
forwards = df[df['POS'] == 'F']
plt.scatter(y=centers["feet"], x=centers["WT"], c='c', alpha=0.75, marker='x')
plt.scatter(y=guards["feet"], x=guards["WT"], c='y', alpha=0.75, marker='o')
plt.scatter(y=forwards["feet"], x=forwards["WT"], c='m', alpha=0.75, marker='v')
plt.xlim(100,300)
plt.ylim(5.5,8)
(5.5, 8)

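[Figure: scatter plot of height in feet against weight (WT), with centers, guards and forwards in different colors and markers]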
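
The three plt.scatter lines differ only in which group, color and marker they use, so they can be folded into a loop, with a legend so you can tell the groups apart. This is a sketch under assumptions: the `toy` table is made up, and the `styles` dictionary is just a name chosen here to hold the same color and marker codes as above.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data, invented for illustration (not from the NBA file)
toy = pd.DataFrame({
    "POS": ["C", "G", "F", "G", "C", "F"],
    "WT": [265, 185, 230, 195, 250, 220],
    "feet": [7.0, 6.1, 6.7, 6.25, 6.9, 6.6],
})

# Color and marker per position, mirroring the three scatter calls above
styles = {"C": ("c", "x"), "G": ("y", "o"), "F": ("m", "v")}

fig, ax = plt.subplots()
for pos, group in toy.groupby("POS"):
    color, marker = styles[pos]
    ax.scatter(group["WT"], group["feet"], c=color, marker=marker,
               alpha=0.75, label=pos)

ax.set_xlim(100, 300)
ax.set_ylim(5.5, 8)
ax.set_xlabel("WT")
ax.set_ylabel("feet")
ax.legend(title="POS")
```

With the real data, positions such as G/F are not in the dictionary, so you would either add entries for them or filter them out first.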