Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to create a percentage bar plot with grouped bars?

I have a dataframe with the columns 'City', 'Gender', 'Education level' and 'How satisfied are you about something'. Here is how my dataframe looks: [image]

So I am trying to plot it as a bar chart:

# Here I select the neighbourhood "FATİH"
fatih_ilcesi = data.loc[data['A.01.İstanbul’un hangi ilçesinde oturuyorsunuz?'] == 'FATİH']
# Then I group it by gender and try to plot the answers to the question of how satisfied you are about something
fatih_ilcesi.groupby('Cinsiyeti')['A.04. Genel olarak düşündüğünüzde İlçe Belediyenizin hizmetlerinden ne derece memnunsunuz?'].value_counts(normalize=True).plot(kind='bar')

This is what I got: [image]

But I'd like to get something like this: [image]

I could not figure out how to give the bars the same color for each answer to the question 'How satisfied are you about something'.

I would also like to add the percentages at the top of the bars. If someone can guide me, I would be really grateful. Thank you.

question from:https://stackoverflow.com/questions/65852610/how-to-create-a-percentage-bar-plot-with-grouped-bars


1 Reply


You could create a Seaborn countplot() as follows. Passing the gender column as x places the genders on the x-axis. Passing Satisfied? as hue splits each gender's bar into one smaller bar per answer and adds a matching legend. To fix the order of these values, either use hue_order or make the column categorical.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

N = 500
data = pd.DataFrame({'City': np.random.choice(['Test City', 'Other City'], N),
                     'Gender': np.random.choice(['Male', 'Female'], N),
                     'Satisfied?': np.random.choice(['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'], N)})
sns.countplot(data=data[data['City'] == 'Test City'], x='Gender', palette='plasma',
              hue='Satisfied?', hue_order=['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'])
plt.show()

example plot
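
The answer mentions making the column categorical as an alternative to hue_order. The following is a minimal sketch of that idea using only pandas; the toy data frame is made up for illustration, and the claim that seaborn picks up the category order for the hue levels is an assumption based on its documented handling of categorical columns.

```python
import pandas as pd

levels = ['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good']
# Hypothetical toy data; 'Satisfied?' is the column name from the example above
df = pd.DataFrame({'Satisfied?': ['4 - good', '2 - bad', '5 - very good', '2 - bad']})

# An ordered categorical remembers the full list of levels, including unused ones
df['Satisfied?'] = pd.Categorical(df['Satisfied?'], categories=levels, ordered=True)

# With sort=False the counts follow the category order instead of the frequency
counts = df['Satisfied?'].value_counts(sort=False)
print(counts)
```

With the column prepared this way, the hue_order argument should no longer be needed, and levels that do not occur in a subset still keep their place and color.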

From here, further refinements can be made:

  • Change the bar heights so that they sum to one per gender. This converts the counts to proportions, which can be shown as percentages.
  • Change the formatting of the y-axis to show percentages.
  • While changing the heights, also change the widths of the bars, leaving a little gap between them.
  • Put the legend at the bottom, without a frame and with square markers.
  • Add the percentage as text above each bar.
  • Add horizontal grid lines.
  • Hide the spines.
  • ...
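
The first refinement (heights that sum to one per gender) can be checked numerically before touching the plot. This is only a sketch using pandas' crosstab on a made-up toy data frame; it is not the method used in the answer, which rescales the bars of the countplot directly.

```python
import pandas as pd

# Hypothetical toy data, using the column names from the example above
df = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Male', 'Male', 'Female', 'Female'],
    'Satisfied?': ['4 - good', '4 - good', '2 - bad', '3 - neutral', '4 - good', '2 - bad'],
})

# Share of each answer within each gender; normalize='index' makes every row sum to 1
shares = pd.crosstab(df['Gender'], df['Satisfied?'], normalize='index')
print((shares * 100).round(1))
```

Each row of shares holds the heights that the bars of one gender should end up with.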

Seaborn has a myriad of ways to choose colors. The simplest is to pass a list of named colors. But note that the existing palettes have been designed so that their colors go well together. The Colorbrewer website can be used to experiment and find colors for many situations.
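
As a small illustration of the two options, the sketch below builds one palette from a list of named colors and one from an existing palette. The choice of 'Set2' is only an example, not something the answer prescribes.

```python
import seaborn as sns

# Five named colors, converted by seaborn to RGB tuples
named = sns.color_palette(['turquoise', 'tomato', 'deepskyblue', 'gold', 'limegreen'])

# Five colors from 'Set2', a qualitative ColorBrewer palette that ships with seaborn
brewer = sns.color_palette('Set2', 5)

print(len(named), len(brewer))
```

Either result, or simply the palette name, can be passed as the palette argument of countplot().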

The variable width_scale in the code sets the gaps. An earlier version of this answer used 0.8, leaving a gap of 0.2. The current example uses 0.6, leaving a gap of 1.0 - 0.6 = 0.4.
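
The arithmetic behind width_scale can be sketched in isolation. The helper name shrink_bar is hypothetical and only mirrors the width and x updates done in the plotting code; the numbers are made up.

```python
def shrink_bar(x, width, width_scale):
    """Return (new_x, new_width) for a bar narrowed around its own center."""
    new_width = width * width_scale
    # Shift the left edge by half of the removed width so the center stays put
    new_x = x + width * (1 - width_scale) / 2
    return new_x, new_width


old_x, old_width = 0.0, 0.16
new_x, new_width = shrink_bar(old_x, old_width, width_scale=0.6)

old_center = old_x + old_width / 2
new_center = new_x + new_width / 2
print(old_center, new_center)  # both centers coincide up to rounding
```

Because the center is unchanged, the percentage text can still be placed at the original x + width / 2.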

Here is an example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.ticker import PercentFormatter

N = 500
data = pd.DataFrame({'City': np.random.choice(['Test City', 'Other City'], N),
                     'Gender': np.random.choice(['Male', 'Female'], N, p=[0.3, 0.7]),
                     'Satisfied?': np.random.choice(['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'], N)})
city_data = data[data['City'] == 'Test City']
fig, ax = plt.subplots(figsize=(14, 4))
sns.countplot(data=city_data, x='Gender', order=['Male', 'Female'], ax=ax,
              palette=['turquoise', 'tomato', 'deepskyblue', 'gold', 'limegreen'],
              hue='Satisfied?', hue_order=['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'])

width_scale = 0.6  # the relative width of the bars, 1.0 means bars touching; the gap will be 1-width_scale
for bars in ax.containers:
    for bar, total_per_gender in zip(bars, [sum(city_data['Gender'] == 'Male'), sum(city_data['Gender'] == 'Female')]):
        new_height = bar.get_height() / total_per_gender
        bar.set_height(new_height)
        width = bar.get_width()
        x = bar.get_x()
        bar.set_width(width * width_scale)
        bar.set_x(x + width * (1 - width_scale) / 2)  # recenter
        if np.isnan(new_height):
            new_height = 0
        ax.text(x + width / 2, new_height, f' {new_height * 100:.1f}%\n', ha='center', va='bottom', rotation=90)
ax.set_xlabel('')  # remove superfluous x-label
ax.set_ylabel('')
ax.tick_params(axis='x', length=0, labelsize=14)  # remove tick marks, larger text
ax.yaxis.set_major_formatter(PercentFormatter(1))
ax.grid(axis='y', ls=':', clip_on=False)
sns.despine(fig, ax, top=True, right=True, left=True, bottom=True)
ax.legend(ncol=5, bbox_to_anchor=(0.5, -0.1), loc='upper center', frameon=False, handlelength=1, handleheight=1)
ax.relim()  # recompute the data limits after changing the heights
ax.autoscale()  # then rescale the view to the new limits
ax.margins(y=0.15, x=0.02)  # some space for the text on top of the bars
plt.tight_layout()
plt.show()

example percentage plot
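
A possible alternative, not part of the answer above, is to compute the percentages with pandas first and let matplotlib label the bars. This is a rough sketch under some assumptions: it needs matplotlib 3.4 or newer for ax.bar_label, the toy data is made up, and it does not reproduce the legend and styling refinements of the full example.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere; remove it to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data with the same column names as the example
df = pd.DataFrame({
    'Gender': ['Male'] * 4 + ['Female'] * 2,
    'Satisfied?': ['4 - good', '4 - good', '2 - bad', '3 - neutral', '4 - good', '2 - bad'],
})
# Percentages per gender (each row sums to 100)
pct = pd.crosstab(df['Gender'], df['Satisfied?'], normalize='index') * 100

fig, ax = plt.subplots(figsize=(8, 4))
pct.plot(kind='bar', ax=ax, rot=0)  # one group of bars per gender, one color per answer
for container in ax.containers:
    ax.bar_label(container, fmt='%.1f%%', padding=2)  # percentage text above each bar
ax.margins(y=0.15)  # some headroom for the labels
# Call plt.show() or fig.savefig(...) here to view the result
```

Here every answer keeps the same color across genders because each answer is one column of pct.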

