I have a dataframe df:

app   browser   loadtime
A     safari        1500
A     safari        1650
A     Chrome        2800
B     IE            3150
B     safari        3300
C     Chrome        2650
...   ...            ...
I need to compute the upper-outlier threshold of the load time per app using the Q3 + 3*IQR rule of thumb, and then filter df, keeping only the rows where loadtime is less than the threshold for that app.
This is how I proceed:
- I compute the upper-outlier threshold using the Q3 + 3*IQR rule of thumb:
import numpy as np

def upper_outlier(x):
    # Q3 + 3*(Q3 - Q1)
    return np.percentile(x, 75) + 3 * (np.percentile(x, 75) - np.percentile(x, 25))
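As a quick sanity check (my own worked example, not part of the original post): for app A the sample load times are 1500, 1650 and 2800. With numpy's default linear interpolation of percentiles, Q1 = 1575 and Q3 = 2225, so the threshold is 2225 + 3*(2225 - 1575) = 4175.

upper_outlier([1500, 1650, 2800])   # -> 4175.0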
## Find the upper-outlier threshold per app
df_grouped = df.groupby("app")["loadtime"].agg([("upper_outlier", upper_outlier)])
This way, for each app, I have the corresponding upper-outlier threshold.
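With the six sample rows above, df_grouped would look roughly like this (app C has a single row, so its IQR is 0 and its threshold equals its only load time):

     upper_outlier
app
A           4175.0
B           3487.5
C           2650.0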
- I filter df using df_grouped:
df_new = pd.DataFrame()
for app in df.app.unique():
    df_new = pd.concat(
        [df_new, df.loc[(df.app == app) & (df.loadtime < df_grouped.loc[app, "upper_outlier"])]],
        axis=0,
    ).reset_index(drop=True)
The for loop takes a long time because I have a lot of data. Is there a cleaner, more pythonic way of doing this?
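For context, a minimal vectorized sketch (my own, not from the original post, and assuming df and upper_outlier are defined as above) would broadcast the per-app threshold back onto the rows with groupby().transform() and filter with a single boolean mask instead of looping:

# Sketch: one threshold value per row, aligned with df's index
threshold = df.groupby("app")["loadtime"].transform(upper_outlier)
df_new = df[df["loadtime"] < threshold].reset_index(drop=True)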
question from:
https://stackoverflow.com/questions/66049765/filter-a-dataframe-based-on-a-specific-value-for-each-category-in-pandas