In a DataFrame df, group by multiple columns and, for each group, collect the elements of a third column into a sorted list, then attach that list to every row of the group in the original DataFrame.
Example. Given:
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 3, 3], 'l1': ['a', 'a', 'a', 'a', 'b', 'b'], 'l3': ['b', 'a', 'b', 'a', 'a', 'a'], 'l4': [1, 2, 3, 4, 5, 6]})
df
   c l1 l3  l4
0  1  a  b   1
1  1  a  a   2
2  2  a  b   3
3  2  a  a   4
4  3  b  a   5
5  3  b  a   6
I tried:
def makePair(l3):
    k = l3.sort_values()
    k = k.to_list()
    print(k)  # prints the sorted list of each group correctly
    return k

df['pair'] = df.groupby(['c', 'l1'])['l3'].transform(makePair).copy()
df
Printed output:
['a', 'b']
['a', 'b']
['a', 'a']
So far so good, but the resulting df is:
   c l1 l3  l4 pair
0  1  a  b   1    a
1  1  a  a   2    b
2  2  a  b   3    a
3  2  a  a   4    b
4  3  b  a   5    a
5  3  b  a   6    a
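A likely explanation, stated as an assumption about how `SeriesGroupBy.transform` handles a callable: pandas wraps each group's return value in a Series indexed by that group's rows, so a returned list whose length equals the group size is spread one element per row instead of being stored whole. The loop below is a hand-written approximation of that behaviour (not pandas' actual internals; `pieces`, `manual` and `via_transform` are illustrative names) and should reproduce the `a, b, a, b, a, a` column above.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 3, 3],
                   'l1': ['a', 'a', 'a', 'a', 'b', 'b'],
                   'l3': ['b', 'a', 'b', 'a', 'a', 'a'],
                   'l4': [1, 2, 3, 4, 5, 6]})

# Hand-written approximation of transform with a list-returning function:
# each group's result is wrapped in a Series indexed by the group's rows,
# so a list of length len(group) becomes one element per row.
pieces = []
for _, group in df.groupby(['c', 'l1'])['l3']:
    sorted_vals = sorted(group)  # e.g. ['a', 'b'] for group (1, 'a')
    pieces.append(pd.Series(sorted_vals, index=group.index))
manual = pd.concat(pieces).sort_index()

# The real transform call, doing the same thing as makePair in the question.
via_transform = df.groupby(['c', 'l1'])['l3'].transform(lambda s: sorted(s))

print(manual.tolist())         # ['a', 'b', 'a', 'b', 'a', 'a']
print(via_transform.tolist())  # ['a', 'b', 'a', 'b', 'a', 'a']
```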
Expected: every row should hold the whole sorted list of its group, i.e. df['pair'] should be
[['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'a'], ['a', 'a']]
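Two possible fixes, sketched under the same assumption about `transform`: either return one item per row where each item is the full sorted list, or aggregate one sorted list per group with `apply(sorted)` and join it back onto the rows using the group keys. The names `keys`, `out1`, `out2` and `pairs` are illustrative, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 3, 3],
                   'l1': ['a', 'a', 'a', 'a', 'b', 'b'],
                   'l3': ['b', 'a', 'b', 'a', 'a', 'a'],
                   'l4': [1, 2, 3, 4, 5, 6]})
keys = ['c', 'l1']  # grouping columns from the question

# Option 1: transform returns len(group) items, each being the full sorted
# list, so every row of the group receives the whole list (a fresh copy per row).
out1 = df.copy()
out1['pair'] = out1.groupby(keys)['l3'].transform(
    lambda s: [sorted(s) for _ in range(len(s))]
)

# Option 2: build one sorted list per (c, l1) group, giving a Series indexed
# by the group keys, then left-join it back onto the original rows.
pairs = df.groupby(keys)['l3'].apply(sorted).rename('pair')
out2 = df.join(pairs, on=keys)

print(out1['pair'].tolist())
# [['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'a'], ['a', 'a']]
print(out2['pair'].tolist())
# [['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'a'], ['a', 'a']]
```

Option 2 sorts each group only once, while Option 1 re-sorts the group once per row; on small groups the difference is negligible, but Option 2 is likely the better fit for large groups.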
Question from:
https://stackoverflow.com/questions/65848854/pandas-group-by-multiple-columns-and-return-sorted-list