transform is not that well documented, but it appears that the function you pass to it receives not the entire group as a DataFrame, but a single column of a single group. I don't think it's really meant for what you're trying to do, and your solution with apply is fine.
Suppose you call tips.groupby('smoker').transform(func). There will be two groups; call them group1 and group2. transform does not call func(group1) and func(group2). Instead, it calls func(group1['total_bill']), then func(group1['tip']), etc., and then func(group2['total_bill']), func(group2['tip']), and so on. Here's an example:
>>> print(d)
   A  B  C
0 -2  5  4
1  1 -1  2
2  0  2  1
3 -3  1  2
4  5  0  2
>>> def foo(df):
...     print(">>>")
...     print(df)
...     print("<<<")
...     return df
>>> print(d.groupby('C').transform(foo))
>>>
2 0
Name: A
<<<
>>>
2 2
Name: B
<<<
>>>
1 1
3 -3
4 5
Name: A
<<<
>>>
1 -1
3 1
4 0
Name: B
# etc.
You can see that foo is first called with just the A column of the C=1 group of the original DataFrame, then the B column of that group, then the A column of the C=2 group, etc.
This makes sense if you think about what transform is for. It's meant for applying transformation functions to the groups. But in general, these functions won't make sense when applied to the entire group, only to a given column. For instance, the example in the pandas docs is about z-standardizing using transform. If you have a DataFrame with columns for age and weight, it wouldn't make sense to z-standardize with respect to the overall mean of both these variables. It doesn't even mean anything to take the overall mean of a bunch of numbers, some of which are ages and some of which are weights. You have to z-standardize the age with respect to the mean age and the weight with respect to the mean weight, which means you want to transform each column separately.
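To make the z-standardizing case concrete, here is a minimal sketch. The DataFrame `people`, its column names, and the numbers in it are made up for illustration, and `zscore` is just a helper name I picked; none of this comes from the pandas docs example itself.

```python
import pandas as pd

# Hypothetical data: two groups, two columns measured in unrelated units.
people = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "age": [20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
    "weight": [55.0, 65.0, 90.0, 60.0, 80.0, 85.0],
})

def zscore(x):
    # Subtract the mean and divide by the sample standard deviation.
    # Means and standard deviations are taken per column, so ages are
    # never mixed with weights.
    return (x - x.mean()) / x.std()

# Standardize each column within each group; the result keeps the
# original row index and has one column per selected input column.
standardized = people.groupby("group")[["age", "weight"]].transform(zscore)
print(standardized)
```

Each age ends up standardized against the mean age of its own group, and each weight against the mean weight of its own group, which is the column-by-column behaviour described above.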
So basically, you don't need to use transform here. apply is the appropriate function, because apply really does operate on each group as a single DataFrame, while transform operates on each column of each group.
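As a rough illustration of apply seeing each group as a whole, here is a sketch that rebuilds the small frame d from the example above and runs a function that needs two columns at once. The function `sum_of_products` and the `seen_types` set are hypothetical names I made up for the demonstration, and selecting `[["A", "B"]]` explicitly is my own choice to keep the grouping column out of what the function receives.

```python
import pandas as pd

# The same small frame as in the example above.
d = pd.DataFrame({
    "A": [-2, 1, 0, -3, 5],
    "B": [5, -1, 2, 1, 0],
    "C": [4, 2, 1, 2, 2],
})

seen_types = set()

def sum_of_products(group):
    # Record what kind of object apply hands to the function.
    seen_types.add(type(group).__name__)
    # This combines two columns row by row, so it only works if the
    # function receives the whole group rather than a single column.
    return (group["A"] * group["B"]).sum()

result = d.groupby("C")[["A", "B"]].apply(sum_of_products)
print(result)
print(seen_types)
```

A function like this, which combines several columns of the same group, is the kind of thing apply is suited for.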