You could create a Seaborn `countplot()` as follows. Using `Gender` for `x` places the genders on the x-axis. Using `Satisfied?` as the `hue` divides the bar for each gender into smaller bars and creates an accompanying legend. To fix the order of these values, either pass `hue_order` or make the column categorical.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# random test data
N = 500
data = pd.DataFrame({'City': np.random.choice(['Test City', 'Other City'], N),
                     'Gender': np.random.choice(['Male', 'Female'], N),
                     'Satisfied?': np.random.choice(['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'], N)})
# count plot for one city: one group of bars per gender, one bar per satisfaction level
sns.countplot(data=data[data['City'] == 'Test City'], x='Gender', palette='plasma',
              hue='Satisfied?', hue_order=['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'])
plt.show()
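The categorical alternative to `hue_order` could look roughly like the sketch below. The names `levels`, `rng` and `df` are illustrative only, and the seed is just there to make the sketch repeatable.

```python
import numpy as np
import pandas as pd

levels = ['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good']
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame({'Gender': rng.choice(['Male', 'Female'], 200),
                   'Satisfied?': rng.choice(levels, 200)})

# An ordered categorical stores its own level order, so the order is fixed
# on the column itself instead of being passed to every plotting call.
df['Satisfied?'] = pd.Categorical(df['Satisfied?'], categories=levels, ordered=True)

print(list(df['Satisfied?'].cat.categories))
```

With the column prepared this way, the `hue_order` argument should no longer be needed in the `countplot()` call.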
From here, further refinements can be made:
- Change the bar heights so that they sum to one per gender. This converts the counts to fractions that can be shown as percentages.
- Change the formatting of the y-axis to show percentages.
- While changing the heights, also reduce the widths of the bars, leaving a little gap between them.
- Put the legend at the bottom, without a frame and with square markers.
- Add the percentage as text above the bars.
- Add horizontal grid lines.
- Hide the spines.
- ...
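To see what "heights that sum to one per gender" means numerically, the fractions can be computed directly with pandas. This is only a sketch on a tiny hand-made sample (hypothetical data, not the random data used in this answer); `pd.crosstab` with `normalize='index'` is one possible way to get the per-gender fractions.

```python
import pandas as pd

# Tiny hand-made sample (hypothetical data, chosen so the fractions are easy to check)
sample = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Male', 'Male', 'Female', 'Female'],
    'Satisfied?': ['4 - good', '4 - good', '2 - bad', '5 - very good', '2 - bad', '4 - good'],
})

# normalize='index' divides each row by its row total, so every gender sums to 1
fractions = pd.crosstab(sample['Gender'], sample['Satisfied?'], normalize='index')
print(fractions)
```

Each row of `fractions` corresponds to one gender, and the values in a row are the bar heights that gender's group would get after rescaling.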
Seaborn has a myriad of ways to choose colors. The simplest way is to give a list of named colors. But note that existing palettes have been designed so that their colors go well together. The Colorbrewer website can be used to experiment and find colors for many situations.
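As a sketch of using an existing palette instead of hand-picked names: `sns.color_palette()` accepts the name of a ColorBrewer scheme and the number of colors wanted. Picking `'RdYlGn'` for a bad-to-good scale is an assumption of this sketch, not a recommendation from the answer.

```python
import seaborn as sns

# 'RdYlGn' is a ColorBrewer diverging scheme (red -> yellow -> green);
# using it here is an assumption made for illustration.
palette = sns.color_palette('RdYlGn', 5)  # one color per satisfaction level
print(palette.as_hex())
```

The resulting list could be passed as `palette=` to `countplot()` in place of a list of named colors.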
The variable `width_scale` in the code can be used to set the gaps. In the old version, `0.8` was set, leaving a gap of `0.2`. The new example has a gap of `1.0 - 0.6 = 0.4`.
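The arithmetic behind narrowing a bar while keeping it centered can be sketched in isolation. The helper name `shrink_and_recenter` is hypothetical and only mirrors the two lines that set the width and x-position in the full example.

```python
def shrink_and_recenter(x, width, width_scale):
    """Return the new (x, width) of a bar narrowed to width_scale of its slot."""
    new_width = width * width_scale
    new_x = x + width * (1 - width_scale) / 2  # shift right by half of the freed space
    return new_x, new_width

# A bar occupying the slot from 0.0 to 0.2, narrowed to 60% of that slot
new_x, new_width = shrink_and_recenter(0.0, 0.2, 0.6)
old_center = 0.0 + 0.2 / 2
new_center = new_x + new_width / 2
print(new_x, new_width, old_center, new_center)
```

The center stays where it was, while the freed space (40% of the slot here) is split evenly between the two sides of the bar.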
Here is an example:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.ticker import PercentFormatter
N = 500
data = pd.DataFrame({'City': np.random.choice(['Test City', 'Other City'], N),
                     'Gender': np.random.choice(['Male', 'Female'], N, p=[0.3, 0.7]),
                     'Satisfied?': np.random.choice(['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'], N)})
city_data = data[data['City'] == 'Test City']
fig, ax = plt.subplots(figsize=(14, 4))
sns.countplot(data=city_data, x='Gender', order=['Male', 'Female'], ax=ax,
              palette=['turquoise', 'tomato', 'deepskyblue', 'gold', 'limegreen'],
              hue='Satisfied?', hue_order=['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'])
width_scale = 0.6  # the relative width of the bars, 1.0 means bars touching; the gap will be 1-width_scale
# number of respondents per gender, in the same order as the x-axis
totals_per_gender = [sum(city_data['Gender'] == 'Male'), sum(city_data['Gender'] == 'Female')]
for bars in ax.containers:  # one container per hue level, holding one bar per gender
    for bar, total_per_gender in zip(bars, totals_per_gender):
        new_height = bar.get_height() / total_per_gender  # count -> fraction within this gender
        bar.set_height(new_height)
        width = bar.get_width()
        x = bar.get_x()
        bar.set_width(width * width_scale)
        bar.set_x(x + width * (1 - width_scale) / 2)  # recenter
        if np.isnan(new_height):
            new_height = 0
        ax.text(x + width / 2, new_height, f' {new_height * 100:.1f}%',
                ha='center', va='bottom', rotation=90)
ax.set_xlabel('') # remove superfluous x-label
ax.set_ylabel('')
ax.tick_params(axis='x', length=0, labelsize=14) # remove tick marks, larger text
ax.yaxis.set_major_formatter(PercentFormatter(1))
ax.grid(axis='y', ls=':', clip_on=False)
sns.despine(fig, ax, top=True, right=True, left=True, bottom=True)
ax.legend(ncol=5, bbox_to_anchor=(0.5, -0.1), loc='upper center', frameon=False, handlelength=1, handleheight=1)
ax.relim()  # recompute the data limits from the resized bars
ax.autoscale()  # needed to recalculate the axis limits after changing the heights
ax.margins(y=0.15, x=0.02) # some space for the text on top of the bars
plt.tight_layout()
plt.show()
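The `PercentFormatter(1)` call in the example tells Matplotlib that a data value of 1.0 corresponds to 100%. A minimal sketch of that mapping, outside any plot, is shown below. Setting `decimals=0` is an assumption made here for a predictable label; the example above leaves the number of decimals automatic.

```python
from matplotlib.ticker import PercentFormatter

# xmax=1 tells the formatter that a data value of 1.0 corresponds to 100%
formatter = PercentFormatter(xmax=1, decimals=0)
label = formatter.format_pct(0.25, 1.0)  # second argument is the displayed axis range
print(label)
```

This is why the bar heights are rescaled to fractions between 0 and 1 rather than multiplied by 100 before plotting.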