What bad columns looks like
Sometimes columns have extra spaces or are just plain odd, even if they look normal.
DPT | NAME | ADDRESS | TTL # | PC | SAL-RATE | |
---|---|---|---|---|---|---|
0 | 868 | B J SANDIFORD | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
1 | 868 | C A WIGFALL | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
2 | 69 | A E A-AWOSOGBA | HRA/DEPARTMENT OF SOCIAL S | 52311 | A | $51955.00 |
Index(['DPT ', 'NAME ', 'ADDRESS ', 'TTL # ', 'PC ', 'SAL-RATE'], dtype='object')
Overwriting columns
In order to fix them, you have a few options. Once thing you can do is just overwrite them with new ones.
DPT | NAME | ADDRESS | TTL # | PC | SAL-RATE | |
---|---|---|---|---|---|---|
0 | 868 | B J SANDIFORD | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
1 | 868 | C A WIGFALL | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
2 | 69 | A E A-AWOSOGBA | HRA/DEPARTMENT OF SOCIAL S | 52311 | A | $51955.00 |
Index(['DPT ', 'NAME ', 'ADDRESS ', 'TTL # ', 'PC ', 'SAL-RATE'], dtype='object')
In this case it might make sense to use a list comprehension to strip all of the extra spaces.
Index(['DPT', 'NAME', 'ADDRESS', 'TTL #', 'PC', 'SAL-RATE'], dtype='object')
Mass renaming
You can also just pass in a new list of columns if you don’t like what they come in as.
DPT | NAME | ADDRESS | TTL # | PC | SAL-RATE | |
---|---|---|---|---|---|---|
0 | 868 | B J SANDIFORD | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
1 | 868 | C A WIGFALL | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
2 | 69 | A E A-AWOSOGBA | HRA/DEPARTMENT OF SOCIAL S | 52311 | A | $51955.00 |
Department | Name | Address | Title | Pay Class | Salary Rate | |
---|---|---|---|---|---|---|
0 | 868 | B J SANDIFORD | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
1 | 868 | C A WIGFALL | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
2 | 69 | A E A-AWOSOGBA | HRA/DEPARTMENT OF SOCIAL S | 52311 | A | $51955.00 |
Rename on import
You also set their names when you’re reading in the csv.
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/IPython/core/interactiveshell.py:2723: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
Department | Name | Address | Title | Pay Class | Salary Rate | |
---|---|---|---|---|---|---|
0 | DPT | NAME | ADDRESS | TTL # | PC | SAL-RATE |
1 | 868 | B J SANDIFORD | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
2 | 868 | C A WIGFALL | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
Unfortunately this leaves you with the first row of actual headers inside of
your data. When usings names=
in read_csv
, add skiprows=1
to skip the
first row (the header row).
Department | Name | Address | Title | Pay Class | Salary Rate | |
---|---|---|---|---|---|---|
0 | 868 | B J SANDIFORD | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
1 | 868 | C A WIGFALL | DEPARTMENT OF CITYWIDE ADM | 12702 | X | $5.00 |
2 | 69 | A E A-AWOSOGBA | HRA/DEPARTMENT OF SOCIAL S | 52311 | A | $51955.00 |