import pandas as pd
import seaborn as sns

Wide data

Wide data is a dataframe where several columns hold the same measurement, one column per category. In the example below, the measurement is the number of accidents and the categories are months, so each month gets its own column.

wide_df = pd.DataFrame({
    'category': ['car', 'bus', 'plane', 'horse', 'submarine', 'train', 'subway', 'spaceship'],
    'JAN': [1, 4, 3, 2, 6, 2, 4, 5],
    'FEB': [3, 5, 2, 2, 6, 3, 2, 4],
    'MAR': [5, 3, 3, 5, 2, 5, 6, 4]
})
wide_df

	category	JAN	FEB	MAR
0	car	1	3	5
1	bus	4	5	3
2	plane	3	2	3
3	horse	2	2	5
4	submarine	6	6	2
5	train	2	3	5
6	subway	4	2	6
7	spaceship	5	4	4

# I can only plot one month at a time!
sns.catplot(data=wide_df, y='JAN', x='category')

[Figure: seaborn categorical plot of JAN accidents for each category]

Wide data is great for stacked bar graphs, but for everything else it can be a real pain (especially with seaborn). Most of the software you’ll use to graph loves long data instead.

Long data

Long data is similar to the idea of tidy data, which is very popular in current-day R programming. Basically speaking, each row is a single measurement.

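To convert from wide data to long data, you use .melt. id_vars lists the columns to keep as identifiers, value_vars lists the columns to unpivot, and var_name and value_name name the two new columns that hold the old column names and their values. Compare the code with the output down below!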

long_df = wide_df.melt(id_vars=['category'],
                       value_vars=['JAN', 'FEB', 'MAR'],
                       var_name='month',
                       value_name='accidents')
long_df

	category	month	accidents
0	car	JAN	1
1	bus	JAN	4
2	plane	JAN	3
3	horse	JAN	2
4	submarine	JAN	6
5	train	JAN	2
6	subway	JAN	4
7	spaceship	JAN	5
8	car	FEB	3
9	bus	FEB	5
10	plane	FEB	2
11	horse	FEB	2
12	submarine	FEB	6
13	train	FEB	3
14	subway	FEB	2
15	spaceship	FEB	4
16	car	MAR	5
17	bus	MAR	3
18	plane	MAR	3
19	horse	MAR	5
20	submarine	MAR	2
21	train	MAR	5
22	subway	MAR	6
23	spaceship	MAR	4
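If you're curious what those arguments are actually doing, here's a minimal sketch of .melt with most of them left out. The three-row wide_df and the name long_default are just made up for illustration: value_vars defaults to every column not in id_vars, and the new columns fall back to the names 'variable' and 'value'.

```python
import pandas as pd

# A smaller, illustrative version of the wide dataframe
wide_df = pd.DataFrame({
    'category': ['car', 'bus', 'plane'],
    'JAN': [1, 4, 3],
    'FEB': [3, 5, 2],
    'MAR': [5, 3, 3],
})

# id_vars: columns kept as identifiers and repeated on every row.
# value_vars is omitted, so every column not in id_vars gets melted.
long_default = wide_df.melt(id_vars=['category'])

# Without var_name/value_name, the new columns get default names
print(long_default.columns.tolist())

# One row per (category, month) pair: 3 categories x 3 months
print(len(long_default))
```

Passing var_name and value_name, like the cell above does, just swaps those default names for something readable.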

# Now I can plot whatever I want!
sns.catplot(data=long_df, y='month', x='accidents', hue='category')

[Figure: seaborn categorical plot of accidents by month, colored by category]

Transposing my data

What if I wanted every one of my categories to be a column?

wide_df

	category	JAN	FEB	MAR
0	car	1	3	5
1	bus	4	5	3
2	plane	3	2	3
3	horse	2	2	5
4	submarine	6	6	2
5	train	2	3	5
6	subway	4	2	6
7	spaceship	5	4	4

You can transpose with .T and it’s close, but the columns end up labeled 0 through 7 instead of with the category names.

wide_df.T

	0	1	2	3	4	5	6	7
category	car	bus	plane	horse	submarine	train	subway	spaceship
JAN	1	4	3	2	6	2	4	5
FEB	3	5	2	2	6	3	2	4
MAR	5	3	3	5	2	5	6	4

But notice how the column names are the index from the original dataframe? Turns out you just need to tell it what the index should be, then transpose.

wide_df.set_index('category').T

category	car	bus	plane	horse	submarine	train	subway	spaceship
JAN	1	4	3	2	6	2	4	5
FEB	3	5	2	2	6	3	2	4
MAR	5	3	3	5	2	5	6	4
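If you only have the long version lying around, you can probably get to the same categories-as-columns shape with .pivot instead. This is a sketch of an alternative route, not what the cells above do, and the tiny long_df and the name by_category are made up for illustration. One thing to watch: .pivot sorts the row and column labels, so I put them back in order by hand.

```python
import pandas as pd

# A tiny, illustrative long dataframe: one row per measurement
long_df = pd.DataFrame({
    'category': ['car', 'bus', 'car', 'bus'],
    'month': ['JAN', 'JAN', 'FEB', 'FEB'],
    'accidents': [1, 4, 3, 5],
})

# Months become the index, categories become the columns
by_category = long_df.pivot(index='month', columns='category', values='accidents')

# pivot sorts labels alphabetically (FEB before JAN, bus before car),
# so select rows and columns in the order we actually want
by_category = by_category.loc[['JAN', 'FEB'], ['car', 'bus']]
print(by_category)
```

If you started with wide data, set_index('category').T is shorter and keeps the original order for free.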

And then you’re all set to do a nice stacked bar (actually a horrible stacked bar).

wide_df.set_index('category').T.plot(kind='barh', stacked=True)

[Figure: horizontal stacked bar chart with one bar per month, segmented by category]