Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
585 views
in Technique[技术] by (71.8m points)

python - How to get unique values from multiple columns in a pandas groupby

Starting from this dataframe df:

df = pd.DataFrame({'c':[1,1,1,2,2,2],'l1':['a','a','b','c','c','b'],'l2':['b','d','d','f','e','f']})

   c l1 l2
0  1  a  b
1  1  a  d
2  1  b  d
3  2  c  f
4  2  c  e
5  2  b  f

I would like to perform a groupby over the c column to get unique values of the l1 and l2 columns. For one columns I can do:

g = df.groupby('c')['l1'].unique()

that correctly returns:

c
1    [a, b]
2    [c, b]
Name: l1, dtype: object

but using:

g = df.groupby('c')['l1','l2'].unique()

returns:

AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'

I know I can get the unique values for the two columns with (among others):

In [12]: np.unique(df[['l1','l2']])
Out[12]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype=object)

Is there a way to apply this method to the groupby in order to get something like:

c
1    [a, b, d]
2    [c, b, e, f]
Name: l1, dtype: object
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can do it with apply:

import numpy as np
g = df.groupby('c')['l1','l2'].apply(lambda x: list(np.unique(x)))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...