import pandas as pd

What bad columns looks like

Sometimes columns have extra spaces or are just plain odd, even if they look normal.

df = pd.read_csv("../Civil_List_2014.csv").head(3)
df
DPT NAME ADDRESS TTL # PC SAL-RATE
0 868 B J SANDIFORD DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
1 868 C A WIGFALL DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
2 69 A E A-AWOSOGBA HRA/DEPARTMENT OF SOCIAL S 52311 A $51955.00
df.columns
Index(['DPT     ', 'NAME    ', 'ADDRESS ', 'TTL #   ', 'PC      ', 'SAL-RATE'], dtype='object')

Overwriting columns

In order to fix them, you have a few options. Once thing you can do is just overwrite them with new ones.

df = pd.read_csv("../Civil_List_2014.csv").head(3)
df
DPT NAME ADDRESS TTL # PC SAL-RATE
0 868 B J SANDIFORD DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
1 868 C A WIGFALL DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
2 69 A E A-AWOSOGBA HRA/DEPARTMENT OF SOCIAL S 52311 A $51955.00
df.columns
Index(['DPT     ', 'NAME    ', 'ADDRESS ', 'TTL #   ', 'PC      ', 'SAL-RATE'], dtype='object')

In this case it might make sense to use a list comprehension to strip all of the extra spaces.

df.columns = [col.strip() for col in df.columns]
df.columns
Index(['DPT', 'NAME', 'ADDRESS', 'TTL #', 'PC', 'SAL-RATE'], dtype='object')

Mass renaming

You can also just pass in a new list of columns if you don’t like what they come in as.

df = pd.read_csv("../Civil_List_2014.csv").head(3)
df
DPT NAME ADDRESS TTL # PC SAL-RATE
0 868 B J SANDIFORD DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
1 868 C A WIGFALL DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
2 69 A E A-AWOSOGBA HRA/DEPARTMENT OF SOCIAL S 52311 A $51955.00
# Rename all of the columns, keeping them in order
df.columns = ['Department', 'Name', 'Address', 'Title', 'Pay Class', 'Salary Rate']
df
Department Name Address Title Pay Class Salary Rate
0 868 B J SANDIFORD DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
1 868 C A WIGFALL DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
2 69 A E A-AWOSOGBA HRA/DEPARTMENT OF SOCIAL S 52311 A $51955.00

Rename on import

You also set their names when you’re reading in the csv.

# Read in the csv, passing names= to set the column names
df = pd.read_csv("../Civil_List_2014.csv", names=["Department", "Name", "Address", "Title", "Pay Class", "Salary Rate"]).head(3)
df
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/IPython/core/interactiveshell.py:2723: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
Department Name Address Title Pay Class Salary Rate
0 DPT NAME ADDRESS TTL # PC SAL-RATE
1 868 B J SANDIFORD DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
2 868 C A WIGFALL DEPARTMENT OF CITYWIDE ADM 12702 X $5.00

Unfortunately this leaves you with the first row of actual headers inside of your data. When usings names= in read_csv, add skiprows=1 to skip the first row (the header row).

# Pass in names=, but also skiprows=1
df = pd.read_csv("../Civil_List_2014.csv", skiprows=1, names=["Department", "Name", "Address", "Title", "Pay Class", "Salary Rate"]).head(3)
df
Department Name Address Title Pay Class Salary Rate
0 868 B J SANDIFORD DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
1 868 C A WIGFALL DEPARTMENT OF CITYWIDE ADM 12702 X $5.00
2 69 A E A-AWOSOGBA HRA/DEPARTMENT OF SOCIAL S 52311 A $51955.00