Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Is there a way to get correlation with string data and a numerical value in pandas?

I'm trying to compute a correlation in pandas that's giving me a bit of difficulty. Essentially I want to answer the following question: given a dataframe of sentences and scores, which word correlates best with a higher score? And which correlates worst?

Trivial example:

Sentence      | Score
"hello there" | 100
"hello kid"   | 95
"there kid"   | 5

I'm expecting to see a high correlation value here for the word "hello" and score. Hopefully this makes sense -- if this is possible natively in Pandas I'd really appreciate knowing!

If anything is unclear please let me know.


1 Reply


I'm not sure pandas is what you're looking for, but yes, you can:

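import pandas as pd

df = pd.DataFrame([["hello there", 100],
                   ["hello kid",    95],
                   ["there kid",     5]],
                  columns=['Sentence', 'Score'])

# One 0/1 indicator column per word, then correlate each column with the score.
# (Dividing by the max only rescales Score; Pearson correlation is unaffected.)
s_corr = df.Sentence.str.get_dummies(sep=' ').corrwith(df.Score / df.Score.max())
print(s_corr)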

This prints:

hello    0.998906
kid     -0.539949
there   -0.458957

For details, see the pandas documentation for:

  1. str.get_dummies()
  2. corrwith()
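To answer the "best" and "worst" part of the question directly, here is a minimal sketch built on the same two calls. The variable names (word_flags, word_corr, best_word, worst_word) are my own, and it assumes words are separated by single spaces with consistent casing and no punctuation, since get_dummies(sep=' ') splits on the literal separator only.

```python
import pandas as pd

df = pd.DataFrame(
    [["hello there", 100], ["hello kid", 95], ["there kid", 5]],
    columns=["Sentence", "Score"],
)

# One 0/1 indicator column per distinct word (split on single spaces).
word_flags = df["Sentence"].str.get_dummies(sep=" ")

# Pearson correlation of each word column with the raw score.
# No rescaling is needed because the coefficient is scale-invariant.
word_corr = word_flags.corrwith(df["Score"])

# idxmax/idxmin return the index label (the word) of the extreme values.
best_word = word_corr.idxmax()
worst_word = word_corr.idxmin()

print(word_corr.sort_values(ascending=False))
print("best:", best_word, "worst:", worst_word)
```

With only three rows the coefficients are not statistically meaningful; on real data it is probably worth dropping words that appear in very few sentences before ranking.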
