calculating the outliers in R

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a data frame like this:

x

Team  01/01/2012  01/02/2012  01/03/2012  01/01/2012  01/04/2012  SD  Mean
A     100         50          40          NA          30          60  80

I would like to compare each cell against the row's mean and SD to find the outliers. For example,

abs(x-Mean) > 3*SD

x$count<-c(1) (increment this value if the above condition is met).

I am doing this to check for anomalies in my data set. If I knew the column names, the calculation would be easier, but the number of columns will vary. Some cells may contain NA.

I would like to subtract the mean from each cell, and I tried this:

x$diff<-sweep(x, 1, x$Mean, FUN='-')

This does not seem to work. Any ideas?

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:47:47+0000

Get your IQR (Interquartile range) and lower/upper quartile using:

lowerq = quantile(data, na.rm = TRUE)[2]  # 25% (lower quartile)
upperq = quantile(data, na.rm = TRUE)[4]  # 75% (upper quartile)
iqr = upperq - lowerq  # or use IQR(data, na.rm = TRUE)

Compute the bounds for a mild outlier:

mild.threshold.upper = (iqr * 1.5) + upperq
mild.threshold.lower = lowerq - (iqr * 1.5)

Any data point outside these bounds (> mild.threshold.upper or < mild.threshold.lower) is a mild outlier.

To detect extreme outliers do the same, but multiply by 3 instead:

extreme.threshold.upper = (iqr * 3) + upperq
extreme.threshold.lower = lowerq - (iqr * 3)

Any data point outside these bounds (> extreme.threshold.upper or < extreme.threshold.lower) is an extreme outlier.
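Putting the two sets of thresholds together, here is a minimal sketch that labels each value of a vector as "none", "mild" or "extreme". The sample vector and the helper name classify_outliers are made up for illustration, and na.rm = TRUE is an assumption added because the question mentions NA cells.

```r
# Hypothetical sample vector; the NA mimics the missing cells in the question.
data <- c(10:18, 30, 100, NA)

# Hypothetical helper: label each value using the 1.5 * IQR (mild)
# and 3 * IQR (extreme) bounds described above.
classify_outliers <- function(v) {
  # Lower and upper quartiles, ignoring NA values.
  q <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]

  mild    <- v < q[1] - 1.5 * iqr | v > q[2] + 1.5 * iqr
  extreme <- v < q[1] - 3 * iqr   | v > q[2] + 3 * iqr

  # Check "extreme" first, since every extreme outlier is also a mild one.
  # NA inputs stay NA in the result.
  ifelse(extreme, "extreme", ifelse(mild, "mild", "none"))
}

res <- classify_outliers(data)
print(res)
```

With this sample the quartiles are 12.5 and 17.5, so 30 falls past the mild bound (25) but not the extreme one (32.5), while 100 is past both.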

Hope this helps

edit: was accessing 50%, not 75%
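If you would rather keep the check from the question (abs(x - Mean) > 3*SD) instead of the IQR rule, something along these lines might work. It is only a sketch: the data frame below is invented, with shortened column names (d1 to d4) standing in for the date columns, and it assumes the only non-value columns are Team, SD and Mean.

```r
# Hypothetical data frame shaped like the one in the question;
# SD and Mean are taken as given, not recomputed.
x <- data.frame(
  Team = c("A", "B"),
  d1 = c(100, 10), d2 = c(50, 12), d3 = c(40, NA), d4 = c(30, 95),
  SD = c(60, 5), Mean = c(80, 20)
)

# Treat every column except Team, SD and Mean as a value column,
# so the code does not need to know the date column names in advance.
value_cols <- setdiff(names(x), c("Team", "SD", "Mean"))
vals <- as.matrix(x[value_cols])

# Subtract each row's mean from that row's cells. Only the numeric
# value columns are passed to sweep(); Team is left out.
diffs <- sweep(vals, 1, x$Mean, FUN = "-")

# Flag cells more than 3 SDs away from the row mean, then count the
# flagged cells per row. NA cells are skipped by na.rm = TRUE.
is_outlier <- abs(diffs) > 3 * x$SD
x$count <- rowSums(is_outlier, na.rm = TRUE)
print(x)
```

Here x$count holds the number of flagged cells per row rather than a counter that is incremented by hand. In the sample, row A has no flagged cells and row B has one (the value 95).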
