Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Getting all unique elements from a pandas DataFrame column of numpy arrays of strings

I have a pandas DataFrame df in which each element of one column, col, is a numpy.ndarray of str values. For example:

col
['I like tea', 'cricket ']
['basket ball', 'I like coffee', 'cricket ']
['I like tea', 'cricket ']
['basket ball', 'cricket ']

Now I want to find the unique numpy.ndarray values in col, so that I can convert it into a categorical column: a new column containing a positive integer for each unique numpy.ndarray. When I call df['col'].unique(), it raises the following error:

TypeError: unhashable type: 'numpy.ndarray'

How can I find the number of unique elements in this numpy.ndarray column?

  • edit: The output I'm expecting is:

    ['I like tea', 'cricket '], ['basket ball', 'I like coffee', 'cricket '], ['basket ball', 'cricket ']

    These are the unique lists in the column col. I want these to be returned.

  • edit 2: When I converted each list in col into a tuple, I got the required result. Why is this happening?



1 Reply

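You should convert the NumPy arrays to a hashable type. unique() relies on hashing its values, and a numpy.ndarray is mutable and therefore unhashable, while a tuple of strings is hashable. That is also why converting to tuples gave you the required result.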
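To see the difference in isolation, here is a minimal sketch. The variable names are only illustrative, and the sample strings are taken from the question.

```python
import numpy as np

arr = np.array(["I like tea", "cricket "])

# ndarray is mutable and defines no hash, so hash(arr) raises TypeError.
try:
    hash(arr)
    array_is_hashable = True
except TypeError:
    array_is_hashable = False

# A tuple of strings is immutable and hashable; equal tuples hash equally.
tup = tuple(arr)
same_tup = tuple(np.array(["I like tea", "cricket "]))

print(array_is_hashable)            # False
print(tup == same_tup)              # True
print(hash(tup) == hash(same_tup))  # True
```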

Try this:

df['col'].apply(tuple).unique()
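As a rough end-to-end sketch, assuming the column looks like the sample in the question: the DataFrame below is rebuilt by hand, and the names as_tuples, unique_rows, code_of and col_id are made up for illustration. The integer codes start at 1 because the question asks for positive integers.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (note the trailing space in 'cricket ').
df = pd.DataFrame({
    "col": [
        np.array(["I like tea", "cricket "]),
        np.array(["basket ball", "I like coffee", "cricket "]),
        np.array(["I like tea", "cricket "]),
        np.array(["basket ball", "cricket "]),
    ]
})

# Tuples are hashable, so unique() works on them.
as_tuples = df["col"].apply(tuple)
unique_rows = as_tuples.unique()
print(len(unique_rows))  # 3

# Assign a positive integer to each unique tuple, in order of first appearance.
code_of = {row: i + 1 for i, row in enumerate(unique_rows)}
df["col_id"] = [code_of[row] for row in as_tuples]
print(df["col_id"].tolist())  # [1, 2, 1, 3]
```

The codes are looked up with a plain list comprehension rather than Series.map with a dict, since tuple keys in a dict can be interpreted by pandas as a multi-level index.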

Or, if you want the unique individual strings inside the lists rather than the unique lists themselves:

df['col'].apply(tuple).explode().unique()
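A small sketch of that second variant on the sample data from the question (the DataFrame is rebuilt by hand, and unique_strings is just an illustrative name). explode() turns each element of every tuple into its own row, and unique() then keeps the first occurrence of each string.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col": [
        np.array(["I like tea", "cricket "]),
        np.array(["basket ball", "I like coffee", "cricket "]),
        np.array(["I like tea", "cricket "]),
        np.array(["basket ball", "cricket "]),
    ]
})

# One row per string, then the distinct strings in order of first appearance.
unique_strings = df["col"].apply(tuple).explode().unique()
print(list(unique_strings))
# ['I like tea', 'cricket ', 'basket ball', 'I like coffee']
```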
