Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
956 views
in Technique[技术] by (71.8m points)

python 3.x - Hash each row of pandas dataframe column using apply

I'm trying to hash each value of a python 3.6 pandas dataframe column with the following algorithm on the dataframe-column ORIG:

HK_ORIG = base64.b64encode(hashlib.sha1(str(df.ORIG).encode("UTF-8")).digest())

However, the above mentioned code does not hash each value of the column, so, in order to hash each value of the df-column ORIG, I need to use the apply function. Unfortunatelly, I don't seem to be good enough to get this done.

I imagine it to look like the following code:

df["HK_ORIG"] = str(df['ORIG']).encode("UTF-8")).apply(hashlib.sha1)

I'm looking very much forward to your answers! Many thanks in advance!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can either create a named function and apply it - or apply a lambda function. In either case, do as much processing as possible withing the dataframe.

A lambda-based solution:

df['ORIG'].astype(str).str.encode('UTF-8')
          .apply(lambda x: base64.b64encode(hashlib.sha1(x).digest()))

A named function solution:

def hashme(x):
    return base64.b64encode(hashlib.sha1(x).digest())
df['ORIG'].astype(str).str.encode('UTF-8')
          .apply(hashme)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...