Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - filter a dataframe based on a specific value for each category in pandas

I have a dataframe

df = url   browser     loadtime
      A     safari      1500
      A     safari      1650
      A     Chrome      2800
      B     IE          3150
      B     safari      3300
      C     Chrome      2650
      .      .            .
      .      .            .            

I need to compute the upper outlier threshold of the load time per app using the 3×IQR rule of thumb (Q3 + 3 × (Q3 − Q1)), and then filter df, keeping only the rows whose loadtime is below the upper outlier threshold for that row's app.

This is how I proceed.

  1. I compute the upper outlier threshold using the 3×IQR rule of thumb
import numpy as np
import pandas as pd

def upper_outlier(x):
    # Q3 + 3 * IQR, where IQR = Q3 - Q1
    q1, q3 = np.percentile(x, [25, 75])
    return q3 + 3 * (q3 - q1)

## Find the upper outlier threshold per app
df_grouped = df.groupby("app")['loadtime'].agg([('upper_outlier', upper_outlier)])

This way for each app I have the corresponding upper outlier
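To make the threshold concrete, here is a small sketch that applies the same formula to the six sample rows shown above. It assumes the grouping column is named `app` (the table header says `url`, but the code groups by `app`), and the name `sample` is only for illustration.

```python
import numpy as np
import pandas as pd

def upper_outlier(x):
    # Q3 + 3 * IQR, using NumPy's default (linear) percentile interpolation
    q1, q3 = np.percentile(x, [25, 75])
    return q3 + 3 * (q3 - q1)

# The six visible rows from the question; the column name "app" is an assumption
sample = pd.DataFrame({
    "app": ["A", "A", "A", "B", "B", "C"],
    "browser": ["safari", "safari", "Chrome", "IE", "safari", "Chrome"],
    "loadtime": [1500, 1650, 2800, 3150, 3300, 2650],
})

# One threshold per app, returned as a Series indexed by app
thresholds = sample.groupby("app")["loadtime"].agg(upper_outlier)
print(thresholds)
```

Note that a group with a single row (like C here) has Q1 = Q3, so its threshold equals its only loadtime, and a strict `<` filter would drop that row.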

  2. I filter df using df_grouped
df_new = pd.DataFrame()
for app in df.app.unique():
    df_new = pd.concat([df_new,df.loc[(df.app==app)&(df.loadtime<df_grouped.loc[app, 'upper_outlier'])]], axis = 0).reset_index(drop=True)

The for loop takes long as I have a lot of data. Is there a cleaner pythonic way of doing this?

question from:https://stackoverflow.com/questions/66049765/filter-a-dataframe-based-on-a-specific-value-for-each-category-in-pandas


1 Reply


You can try merging the per-app thresholds back onto the original dataframe:

df_grouped = df.groupby("app")['loadtime'].agg([('upper_outlier', lambda x : upper_outlier(x))]).reset_index()

dfmerged = df.merge(df_grouped, on = 'app', how = 'left')

and then filter

dfmerged[dfmerged.loadtime<dfmerged.upper_outlier]

I'm not sure whether this is more efficient, but it seems more straightforward.
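Another option, not part of the answer above, is `groupby(...).transform`, which broadcasts each group's threshold back to every row so no merge or loop is needed. This is only a sketch: the toy data and the column name `app` are assumptions, and I haven't benchmarked it on the asker's data.

```python
import numpy as np
import pandas as pd

def upper_outlier(x):
    # Q3 + 3 * IQR, same rule as in the question
    q1, q3 = np.percentile(x, [25, 75])
    return q3 + 3 * (q3 - q1)

# Toy data (made up for illustration): app A contains one obvious outlier
df = pd.DataFrame({
    "app": ["A"] * 5 + ["B"] * 5,
    "loadtime": [100, 110, 120, 130, 5000, 200, 210, 220, 230, 240],
})

# transform returns a Series aligned with df: each row gets its own app's threshold
threshold = df.groupby("app")["loadtime"].transform(upper_outlier)

# Keep only the rows strictly below their app's threshold
df_new = df[df["loadtime"] < threshold].reset_index(drop=True)
print(df_new)
```

Because `threshold` shares the index of `df`, the comparison is row-aligned and the result keeps the original columns without an extra `upper_outlier` column.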

