Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Plotting categorical variable against numeric variable in matplotlib

My DataFrame's structure

trx.columns
Index(['dest', 'orig', 'timestamp', 'transcode', 'amount'], dtype='object')

I'm trying to plot transcode (transaction code) against amount to see how much money is spent per transaction. I made sure to convert transcode to a categorical type, as seen below.

trx['transcode']
...
Name: transcode, Length: 21893, dtype: category
Categories (3, int64): [1, 17, 99]

The result I get from doing plt.scatter(trx['transcode'], trx['amount']) is

Scatter plot

While the above plot is not entirely wrong, I would like the X axis to contain just the three possible values of transcode [1, 17, 99] instead of the entire [1, 100] range.

Thanks!


1 Reply


In matplotlib 2.1 and later you can plot categorical variables by passing strings. That is, if you provide the x values as strings, matplotlib treats them as categories and places them at evenly spaced positions instead of at their numeric values.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

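# Sample data: x takes one of three codes, y is a random amount
df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
                   "y" : np.random.rand(100)*100})

# Converting x to strings makes matplotlib treat the values as categories
plt.scatter(df["x"].astype(str), df["y"])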
plt.margins(x=0.5)
plt.show()
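
One caveat with the string approach: matplotlib places string categories along the axis in the order they are first encountered, not in numeric order, so with unsorted data the ticks may come out as, say, 17, 1, 99. A minimal sketch of one way around this, assuming it is acceptable to sort the rows before plotting (the seeded generator and the name df_sorted are only for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data with a fixed seed so the example is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.choice([1, 17, 99], size=100),
                   "y": rng.random(100) * 100})

# Sort numerically BEFORE converting to strings, so the categories
# are first seen in the order 1, 17, 99
df_sorted = df.sort_values("x")

fig, ax = plt.subplots()
ax.scatter(df_sorted["x"].astype(str), df_sorted["y"])
ax.margins(x=0.5)

# Render once so the tick labels are populated, then read them back
fig.canvas.draw()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Call plt.show() afterwards to display the figure. Sorting on the numeric column matters here: sorting the strings themselves would order them lexicographically, which is not the same thing in general.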


To obtain the same result in matplotlib 2.0 or earlier, plot against an integer index instead and relabel the ticks with the original values.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
                   "y" : np.random.rand(100)*100})

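# u holds the sorted unique values; inv maps each row to its position in u
u, inv = np.unique(df["x"], return_inverse=True)
plt.scatter(inv, df["y"])
# Label the integer positions 0..len(u)-1 with the original values
plt.xticks(range(len(u)), u)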
plt.margins(x=0.5)
plt.show()
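
Since the column in the question already has dtype category, the same index trick can probably be done without np.unique, using the integer codes pandas stores for a categorical column. The following is a sketch under that assumption; the trx frame built here is a hypothetical stand-in with random data, not the asker's actual DataFrame.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical stand-in for the question's trx DataFrame
trx = pd.DataFrame({
    "transcode": pd.Categorical(rng.choice([1, 17, 99], size=100),
                                categories=[1, 17, 99]),
    "amount": rng.random(100) * 100,
})

# Integer position (0, 1 or 2) of each row's category
codes = trx["transcode"].cat.codes
# The category values themselves, in category order
categories = trx["transcode"].cat.categories
labels = [str(c) for c in categories]

fig, ax = plt.subplots()
ax.scatter(codes, trx["amount"])
# Put one tick per category and label it with the original code
ax.set_xticks(range(len(categories)))
ax.set_xticklabels(labels)
ax.margins(x=0.5)
```

Call plt.show() afterwards to display the figure. Because the tick order follows the category order, the axis order can be controlled by how the categories are defined, independent of the row order.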

The same plot can be obtained using seaborn's stripplot:

import seaborn as sns

sns.stripplot(x="x", y="y", data=df)

A potentially nicer representation, in which the points are spread out so they do not overlap, is seaborn's swarmplot:

sns.swarmplot(x="x", y="y", data=df)
