Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
499 views
in Technique[技术] by (71.8m points)

python - Dynamically accessing a pandas dataframe column

Consider this simple example

import pandas as pd

df = pd.DataFrame({'one' : [1,2,3],
                   'two' : [1,0,0]})

df 
Out[9]: 
   one  two
0    1    1
1    2    0
2    3    0

I want to write a function that takes as inputs a dataframe df and a column mycol.

Now this works:

df.groupby('one').two.sum()
Out[10]: 
one
1    1
2    0
3    0
Name: two, dtype: int64

this works too:

 def okidoki(df,mycol):
    return df.groupby('one')[mycol].sum()

okidoki(df, 'two')
Out[11]: 
one
1    1
2    0
3    0
Name: two, dtype: int64

but this FAILS

def megabug(df,mycol):
    return df.groupby('one').mycol.sum()

megabug(df, 'two')
 AttributeError: 'DataFrameGroupBy' object has no attribute 'mycol'

What is wrong here?

I am worried that okidoki uses some chaining that might create some subtle bugs (https://pandas.pydata.org/pandas-docs/stable/indexing.html#why-does-assignment-fail-when-using-chained-indexing).

How can I still keep the syntax groupby('one').mycol? Can the mycol string be converted to something that might work that way? Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You pass a string as the second argument. In effect, you're trying to do something like:

df.'two'

Which is invalid syntax. If you're trying to dynamically access a column, you'll need to use the index notation, [...] because the dot/attribute accessor notation doesn't work for dynamic access.


Dynamic access on its own is possible. For example, you can use getattr (but I don't recommend this, it's an antipattern):

In [674]: df
Out[674]: 
   one  two
0    1    1
1    2    0
2    3    0

In [675]: getattr(df, 'one')
Out[675]: 
0    1
1    2
2    3
Name: one, dtype: int64

Dynamically selecting by attribute from a groupby call can be done, something like:

In [677]: getattr(df.groupby('one'), mycol).sum() 
Out[677]: 
one
1    1
2    0
3    0
Name: two, dtype: int64

But don't do it. It is a horrid anti pattern, and much more unreadable than df.groupby('one')[mycol].sum().


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...