# Read in the dataframe
times_df = pd.read_csv("times-and-serials.csv")
times_df.head()
      Owner     Time  Serial Number
0  Daniella  8:35 AM           7754
1   Carleen  7:46 AM           6881
2     Daron  1:35 AM           4509
3    Cherly  1:35 AM           2310
4     Manda  1:35 AM           4362
# Try out a way of parsing one of the dates
datetime.datetime.strptime("8:35 AM", "%I:%M %p")
datetime.datetime(1900, 1, 1, 8, 35)
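import numpy as np

# Build a function using that method
def time_to_datetime(str_time):
    try:
        #print("Trying to convert", str_time, "into a time")
        if str_time == '-999':
            #print("It's -999")
            return np.nan
        #print("It's not -999")
        return datetime.datetime.strptime(str_time.strip(), "%I:%M %p")
    except (ValueError, AttributeError):
        # ValueError: the string doesn't match the format
        # AttributeError: the value isn't a string, so it has no .strip()
        return np.nan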
# Apply that method to the 'Time' column of the dataframe
times_df['Time'].apply(time_to_datetime)
# Apply it again, this time saving the result into a new 'converted_time' column
times_df['converted_time'] = times_df['Time'].apply(time_to_datetime)
# Let's take a peek at our new column
times_df.head(10)
      Owner      Time  Serial Number       converted_time
0  Daniella   8:35 AM           7754  1900-01-01 08:35:00
1   Carleen   7:46 AM           6881  1900-01-01 07:46:00
2     Daron   1:35 AM           4509  1900-01-01 01:35:00
3    Cherly   1:35 AM           2310  1900-01-01 01:35:00
4     Manda   1:35 AM           4362  1900-01-01 01:35:00
5      Keri  12:57 PM           3360  1900-01-01 12:57:00
6     Frank   3:49 AM           5901  1900-01-01 03:49:00
7  Berneice      -999           6995                  NaT
8     Janis  12:36 AM           4788  1900-01-01 00:36:00
9     Tosha   5:19 PM           2585  1900-01-01 17:19:00
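As an alternative to writing a function and using .apply, the same conversion can be sketched with pd.to_datetime, which turns anything that doesn't match the format into NaT when errors="coerce" is passed. This is only a sketch: the small Series below is a stand-in built from a few of the values shown above, not the real dataframe.

```python
import pandas as pd

# Stand-in for the 'Time' column, using values that appear in the data above
times = pd.Series(["8:35 AM", "5:19 PM", "-999", "GERTRUDE", "45:18 PM"])

# Strip whitespace, then let pandas parse the whole column at once.
# Values that don't match "%I:%M %p" become NaT instead of raising an error.
converted = pd.to_datetime(times.str.strip(), format="%I:%M %p", errors="coerce")
print(converted)
```

The function-based version is still handy when you need special cases (like printing what went wrong), while this version is shorter when "anything unparseable is missing" is good enough.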
# Let's look at all of the rows where converted time
# didn't end up working out
times_df[pd.isnull(times_df['converted_time'])]
         Owner      Time  Serial Number  converted_time
7     Berneice      -999           6995             NaT
15      Renato  GERTRUDE           3226             NaT
23  Monserrate  45:18 PM           5634             NaT
29     Brianne       527              0             NaT
36      Meggan   0:17 AM           5241             NaT
# Don't do this, it won't work:
#   if whatever == 'NaN'
# Do this instead (isnull lives in pandas, not numpy):
#   pd.isnull(whatever)
import numpy as np
nan
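Here is a quick sketch of why the equality check fails. NaN is a float value that is not equal to anything, including itself, so the only reliable test is a function like pd.isnull. The variable name `whatever` is just a placeholder.

```python
import numpy as np
import pandas as pd

whatever = np.nan

# Comparing against the string 'NaN' compares a float to text
print(whatever == 'NaN')    # False

# NaN is not even equal to itself
print(whatever == np.nan)   # False

# pd.isnull is the check that actually works
print(pd.isnull(whatever))  # True
```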
%pdb on
Automatic pdb calling has been turned ON
Walkthrough #4
1. I want to make sure my Plate ID is a string. Can’t lose the leading zeroes!
2. I don’t think anyone’s car was built in 0AD. Discard the ‘0’s as NaN.
3. I want the dates to be dates! Read the read_csv documentation to find out how
   to make pandas automatically parse dates.
4. “Date first observed” is a pretty weird column, but it seems like it has a
   date hiding inside. Using a function with .apply, transform the string (e.g.
   “20140324”) into a Python date. Make the 0’s show up as NaN.
5. “Violation time” is… not a time. Make it a time.
6. There sure are a lot of colors of cars, too bad so many of them are the
   same. Make “BLK” and “BLACK” match, “WT” and “WHITE” match, and do the same
   for any other combinations that you notice.
7. Join the data with the Parking Violations Code dataset from the NYC Open Data
   site.
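Two of the string clean-up steps above (the “Date first observed” conversion and the color names) might be sketched like this. It is a minimal sketch, not the walkthrough's official answer: the two-row dataframe is made up, `parse_observed_date` is a hypothetical helper name, and "Vehicle Color" is an assumed column name.

```python
import datetime

import numpy as np
import pandas as pd

def parse_observed_date(value):
    """Turn a string like '20140324' into a datetime.date; '0' or junk becomes NaN."""
    text = str(value).strip()
    if text == "0":
        return np.nan
    try:
        return datetime.datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        return np.nan

# Made-up sample rows; the real data comes from the parking violations file
df = pd.DataFrame({
    "Date first observed": ["20140324", "0"],
    "Vehicle Color": ["BLK", "WHITE"],  # assumed column name
})

# Same pattern as time_to_datetime: write a function, then .apply it
df["observed_date"] = df["Date first observed"].apply(parse_observed_date)

# Map each abbreviation onto one spelling; add more pairs as you notice them
color_fixes = {"BLK": "BLACK", "WT": "WHITE"}
df["color_clean"] = df["Vehicle Color"].replace(color_fixes)

print(df)
```

Keeping the color pairs in a dictionary means each newly spotted variant is a one-line addition.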
The read_csv documentation can be found at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
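Three read_csv options from that documentation cover the first few walkthrough items: dtype, na_values, and parse_dates. The sketch below runs against a tiny in-memory CSV so it is self-contained. "Plate ID" comes from the walkthrough, but "Vehicle Year" and "Issue Date" are assumed column names and the rows are invented.

```python
import io

import pandas as pd

# Invented two-row stand-in for the parking violations CSV
csv_text = """Plate ID,Vehicle Year,Issue Date
00123,2008,03/24/2014
ABC999,0,03/25/2014
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Plate ID": str},            # read as text so leading zeroes survive
    na_values={"Vehicle Year": ["0"]},  # treat a year of 0 as missing
    parse_dates=["Issue Date"],         # parse this column as dates
)

print(df)
print(df.dtypes)
```

With a real file you would pass the filename in place of the io.StringIO object and adjust the column names to match the file's header.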