Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - finding probabilities in a column for a dataframe

I have a dataframe like this:

state   class
   A      0
   B      1
   C      1
   A      0
   A      1
   B      1
   A      0
   A      1
   C      1
   C      0

and I'm trying to find, for each unique value in state, the probability of each class (0 or 1) within that state, so that the resulting output looks like:

State_0   State_1    Class
  3/5       2/5         0
  0/2       2/2         1
  1/3       2/3         1
  3/5       2/5         0
  3/5       2/5         1
  0/2       2/2         1
  3/5       2/5         0
  3/5       2/5         1
  1/3       2/3         1
  1/3       2/3         0

Logic used to find these values:
A, B and C are the unique values in state. A occurs 5 times in total: 3 times with class 0 and 2 times with class 1, so State_0 is 3/5 and State_1 is 2/5 on every row where state is A. I can compute the State_0 and State_1 values for a single state such as A, B or C, but I can't work out how to apply this to the whole dataset.

Could anyone please help or suggest an approach?

question from:https://stackoverflow.com/questions/65881973/finding-probabilities-in-a-column-for-a-dataframe


1 Reply


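Use crosstab with normalize=0 (the same as normalize='index'), which divides each state's row of class counts by that row's total. Then prefix the column names with DataFrame.add_prefix and attach the result to the original DataFrame with DataFrame.join on the state column: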

import pandas as pd

# One row per state, one column per class; each row sums to 1
probs = pd.crosstab(df['state'], df['class'], normalize=0).add_prefix('State_')
df1 = df.join(probs, on='state')
print(df1)
  state  class   State_0   State_1
0     A      0  0.600000  0.400000
1     B      1  0.000000  1.000000
2     C      1  0.333333  0.666667
3     A      0  0.600000  0.400000
4     A      1  0.600000  0.400000
5     B      1  0.000000  1.000000
6     A      0  0.600000  0.400000
7     A      1  0.600000  0.400000
8     C      1  0.333333  0.666667
9     C      0  0.333333  0.666667
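
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data is typed in by hand from the question's table; the intermediate name probs is only for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data copied by hand from the table in the question
df = pd.DataFrame({
    "state": list("ABCAABAACC"),
    "class": [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],
})

# Row-normalized contingency table: one row per state, one column per class.
# normalize="index" is the same as normalize=0.
probs = pd.crosstab(df["state"], df["class"], normalize="index").add_prefix("State_")

# Look up each row's state in probs and attach the matching probabilities
df1 = df.join(probs, on="state")
print(df1)
```

Because probs is indexed by state, join(on='state') repeats the same pair of probabilities on every row that shares a state, which is what the question's expected output shows.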

Finally, if you only need some of the columns, or want them in a different order:

df2 = df1.reindex(['State_0','State_1','class'], axis=1)
print(df2)
    State_0   State_1  class
0  0.600000  0.400000      0
1  0.000000  1.000000      1
2  0.333333  0.666667      1
3  0.600000  0.400000      0
4  0.600000  0.400000      1
5  0.000000  1.000000      1
6  0.600000  0.400000      0
7  0.600000  0.400000      1
8  0.333333  0.666667      1
9  0.333333  0.666667      0
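
The question shows the values as fractions such as 3/5, not decimals. If that text form is what you want for display, one possible sketch is to build the strings from the raw counts. This is a variation on the approach above, not part of the original answer, and the names counts, totals, fractions and out are made up for the example.

```python
import pandas as pd

# Sample data copied by hand from the table in the question
df = pd.DataFrame({
    "state": list("ABCAABAACC"),
    "class": [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],
})

counts = pd.crosstab(df["state"], df["class"])  # raw counts per state and class
totals = counts.sum(axis=1)                     # number of rows per state

# Build "count/total" strings one class column at a time
fractions = counts.apply(lambda col: col.astype(str) + "/" + totals.astype(str))
fractions = fractions.add_prefix("State_")

out = df.join(fractions, on="state")[["State_0", "State_1", "class"]]
print(out)
```

The resulting columns hold strings, so they are only suitable for display; keep the normalized crosstab if you need to compute with the probabilities.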
