python - how to use pandas filter with IQR

Question

Welcome To Ask or Share your Answers For Others

python - how to use pandas filter with IQR

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:29:35+0000

As far as I know, the most compact notation seems to be brought by the query method.

# Some test data
np.random.seed(33454)
df = (
    # A standard distribution
    pd.DataFrame({'nb': np.random.randint(0, 100, 20)})
        # Adding some outliers
        .append(pd.DataFrame({'nb': np.random.randint(100, 200, 2)}))
        # Reseting the index
        .reset_index(drop=True)
    )

# Computing IQR
Q1 = df['nb'].quantile(0.25)
Q3 = df['nb'].quantile(0.75)
IQR = Q3 - Q1

# Filtering Values between Q1-1.5IQR and Q3+1.5IQR
filtered = df.query('(@Q1 - 1.5 * @IQR) <= nb <= (@Q3 + 1.5 * @IQR)')

Then we can plot the result to check the difference. We observe that the outlier in the left boxplot (the cross at 183) does not appear anymore in the filtered series.

# Ploting the result to check the difference
df.join(filtered, rsuffix='_filtered').boxplot()

Since this answer I've written a post on this topic were you may find more information.

Categories

python - how to use pandas filter with IQR

python - how to use pandas filter with IQR

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags