Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Pandas Group By multiple Columns and return sorted list

In a DataFrame df, group by multiple columns and, for each group, collect the elements of a third column into a sorted list, then attach that list to every row of the original DataFrame.

Example Given

import pandas as pd
df = pd.DataFrame({'c':[1,1,2,2,3,3],'l1':['a','a','a','a','b','b'],'l3':['b','a','b','a','a','a'],'l4':[1,2,3,4,5,6]})

df

   c l1 l3  l4
0  1  a  b   1
1  1  a  a   2
2  2  a  b   3
3  2  a  a   4
4  3  b  a   5
5  3  b  a   6

What I tried:

def makePair(l3):
    # Sort the group's l3 values and return them as a plain list
    k = l3.sort_values()
    k = k.to_list()
    print(k)  # prints the expected sorted list for each group
    return k

df['pair'] = df.groupby(['c','l1'])['l3'].transform(makePair).copy()
df

Printed output:

['a', 'b']
['a', 'b']
['a', 'a']

So far so good, but the resulting df is:

   c l1 l3  l4 pair
0  1  a  b   1    a
1  1  a  a   2    b
2  2  a  b   3    a
3  2  a  a   4    b
4  3  b  a   5    a
5  3  b  a   6    a

Expected:

df.pair = [ ['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'a'], ['a', 'a'] ]

Question from: https://stackoverflow.com/questions/65848854/pandas-group-by-multiple-columns-and-return-sorted-list


1 Reply


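With your function, you can aggregate once per group with agg (instead of transform) and let assign align the result on the group keys: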

cols = ['c','l1']
out = (df.set_index(cols)
         .assign(pair=df.groupby(cols)['l3'].agg(makePair))
         .reset_index()
         .reindex(df.columns.union(['pair'], sort=False), axis=1))

Full code:

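# df is the DataFrame defined in the question
def makePair(l3):
    # Return the group's l3 values as a sorted list
    k = l3.sort_values()
    k = k.to_list()
    return k

cols = ['c','l1']
# One sorted list per (c, l1) group, aligned back to the rows through the shared index
out = (df.set_index(cols)
         .assign(pair=df.groupby(cols)['l3'].agg(makePair))
         .reset_index()
         .reindex(df.columns.union(['pair'], sort=False), axis=1))
print(out)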
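
To see why the assign step lines up, it may help to look at the intermediate result on its own. The sketch below rebuilds the example df from the question and uses a hypothetical helper name, make_pair, standing in for makePair. It only illustrates that agg yields one list per (c, l1) group, indexed by those keys.

```python
import pandas as pd

df = pd.DataFrame({
    'c':  [1, 1, 2, 2, 3, 3],
    'l1': ['a', 'a', 'a', 'a', 'b', 'b'],
    'l3': ['b', 'a', 'b', 'a', 'a', 'a'],
    'l4': [1, 2, 3, 4, 5, 6],
})

def make_pair(l3):
    # Hypothetical stand-in for makePair: sorted values of one group as a list
    return l3.sort_values().to_list()

cols = ['c', 'l1']

# agg calls make_pair once per group and keeps one result per group,
# so the Series has 3 rows (one per (c, l1) pair), not 6
per_group = df.groupby(cols)['l3'].agg(make_pair)
print(per_group)
```

Since df.set_index(cols) carries the same (c, l1) keys in its index, assign can match each row to its group's list by label.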

Alternatively, sort the rows first and aggregate with list:

cols = ['c','l1']
out = (df.set_index(cols).assign(pair=
      df.sort_values(cols+['l3']).groupby(cols)['l3'].agg(list)).reset_index())
print(out)

   c l1 l3  l4    pair
0  1  a  b   1  [a, b]
1  1  a  a   2  [a, b]
2  2  a  b   3  [a, b]
3  2  a  a   4  [a, b]
4  3  b  a   5  [a, a]
5  3  b  a   6  [a, a]
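
Another option, not taken from the answer above and offered only as a sketch: build the per-group lists and attach them with DataFrame.join on the key columns. This should keep the original row index and column order without set_index/reset_index. The helper name sorted_list is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'c':  [1, 1, 2, 2, 3, 3],
    'l1': ['a', 'a', 'a', 'a', 'b', 'b'],
    'l3': ['b', 'a', 'b', 'a', 'a', 'a'],
    'l4': [1, 2, 3, 4, 5, 6],
})

def sorted_list(values):
    # Hypothetical helper: sorted Python list of one group's values
    return sorted(values)

cols = ['c', 'l1']

# One sorted list per (c, l1) group, as a Series named 'pair'
pairs = df.groupby(cols)['l3'].agg(sorted_list).rename('pair')

# join matches the key columns of df against the MultiIndex of pairs
out = df.join(pairs, on=cols)
print(out)
```

Each row in a group receives the same list object here, so copy the lists if you plan to mutate them per row.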
