Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to annotate a stacked bar chart with word count and column name

My question is about plotting word frequencies in a stacked bar plot, with each bar segment labelled with the word and not only a number. Suppose I have these words:

Date        Text       Count
01/01/2020  cura          25
            destra        24
            fino          18
            guerra        13
            americani     13
02/01/2020  italia       137
            turismo      112
            nuovi        109
            pizza         84
            moda          79

The table was created by grouping by Date, counting the values of Text, and keeping the top 5 per date with head(5).

Attempt:

This generates a stacked plot, but the colours and labels are not what I expect:

data.groupby('Date').agg({'Text': 'value_counts'}).rename(columns={'Text': 'Count'}).groupby('Date').head(5).unstack().plot(kind='bar', stacked=True)

Request: My expected output is a bar chart with the dates on the x-axis and the word frequencies on the y-axis. Each word on the same date should have a different colour, as in a stacked plot, and each bar segment should show the word and its frequency.

Example: Please see below an example of a stacked plot that illustrates what I would like to do, if it is possible. In the bars, instead of the numbers (340, 226, ...), I would like the top words selected by the code above and their frequencies. The x-axis would show the dates listed above, not the year (I could not find a better plot on the web). The first bar shows the top 4 words (there should be 5, but I only found a bar chart with 4 groups) and how I would like to visualise the results. For the size of the chart, please keep in mind that I have 200 dates, since that matters for visualising it.

If you could show me how to do it, even using another dataset, that would be great. Thank you in advance for the time you spend helping me.

[image: example stacked bar chart]


1 Reply


Create dataframe

import pandas as pd
import matplotlib.pyplot as plt

# data and dataframe
data = {'Date': ['01/01/2020', '01/01/2020', '01/01/2020', '02/01/2020', '02/01/2020', '02/01/2020'],
        'Text': [['cura']*25, ['destra']*24, ['fino']*18, ['italia']*137, ['turismo']*112, ['nuovi']*109]}

df = pd.DataFrame(data)

df = df.explode('Text')

df.Date = pd.to_datetime(df.Date)
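
One thing worth checking at this step is how the date strings are read. The df_gb output shown later in this answer has 2020-02-01 in its index, so '02/01/2020' was parsed month-first. If the real data are day-first (dd/mm/yyyy), which is only an assumption here, the two readings differ. The sketch below uses the standard library to show the difference; with pandas, passing format='%d/%m/%Y' or dayfirst=True to pd.to_datetime should select the day-first reading.

```python
from datetime import datetime

raw = "02/01/2020"

# Month-first reading: matches the index printed for df_gb in this answer.
month_first = datetime.strptime(raw, "%m/%d/%Y").date()

# Day-first reading: applies only if the data are dd/mm/yyyy (an assumption).
day_first = datetime.strptime(raw, "%d/%m/%Y").date()

print(month_first)
print(day_first)
```

Being explicit about the format avoids relying on the parser's default for ambiguous strings.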

groupby and plot

  • To plot the words, note that after unstacking, each date row has every word as a column.
  • Even where a word has no count for a date (NaN in the output below), the plotting API still includes an entry for it.
  • The API plots the first column for all dates, then the next column for all dates, and so on.
  • As such, the cols list used for the text annotations must repeat each word once for every date in df_gb.
  • If you need to use head(), swap the following line in for df_gb:
    • df_gb = df.groupby('Date').agg({'Text': 'value_counts'}).rename(columns={'Text': 'Count'}).groupby('Date').head(2).unstack()
df_gb = df.groupby(['Date']).agg({'Text': 'value_counts'}).rename(columns={'Text': 'Count'}).unstack('Text')

print(df_gb)

           Count                                   
Text        cura destra  fino italia  nuovi turismo
Date                                               
2020-01-01  25.0   24.0  18.0    NaN    NaN     NaN
2020-02-01   NaN    NaN   NaN  137.0  109.0   112.0
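
The ordering described in the bullets above can be sketched without pandas. This is a toy illustration only: dates and columns are made-up stand-ins for the index and columns of df_gb, and patch_order is a hypothetical name used to show which word and date the i-th bar patch would correspond to under that column-by-column ordering.

```python
# Toy stand-ins for df_gb: two dates (rows) and three word columns.
dates = ["2020-01-01", "2020-02-01"]
columns = [("Count", "cura"), ("Count", "destra"), ("Count", "italia")]

# Same comprehension as used for cols in this answer:
# each word is repeated once per date.
cols = [col[1] for col in columns for _ in range(len(dates))]

# Under column-by-column plotting, patch i pairs word cols[i]
# with date number i % len(dates).
patch_order = [(cols[i], dates[i % len(dates)]) for i in range(len(cols))]

print(cols)
print(patch_order)
```

This is why indexing cols with the enumerate counter over ax.patches yields the matching word for each bar segment.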

# create list of words of appropriate length; all words repeat for each date
cols = [x[1] for x in df_gb.columns for _ in range(len(df_gb))]

# plot df_gb
ax = df_gb.plot.bar(stacked=True)

# annotate the bars
for i, rect in enumerate(ax.patches):
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()

    # The height of the bar is the count value and can be used as the label
    label_text = f'{height:.0f}: {cols[i]}'

    label_x = x + width / 2
    label_y = y + height / 2

    # skip the label if the height is effectively 0 (a NaN height also fails this test)
    if height > 0.001:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)

# rename xtick labels; remove time
ticks, labels = plt.xticks(rotation=90)
labels = [label.get_text()[:10] for label in labels]
plt.xticks(ticks=ticks, labels=labels)

ax.get_legend().remove()
plt.show()
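
The question mentions about 200 dates, which the default figure size will not show legibly. One possible approach, sketched here under assumptions, is to scale the figure width with the number of bars. The helper suggested_figsize and its default values are hypothetical and not part of pandas or matplotlib; the resulting tuple could be passed as figsize to df_gb.plot.bar.

```python
def suggested_figsize(n_dates, inches_per_bar=0.25, height=8.0, min_width=6.0):
    """Return a (width, height) tuple in inches that grows with the number of bars.

    Hypothetical helper: the default numbers are guesses and need tuning.
    """
    width = max(min_width, n_dates * inches_per_bar)
    return (width, height)


size = suggested_figsize(200)
print(size)

# Possible use with the code above:
# ax = df_gb.plot.bar(stacked=True, figsize=size)
```

With bars this narrow, the annotation text may need rotation=90 in ax.text, or a smaller fontsize, to stay inside each segment.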
