Here is a solution using data.table (while not specifically requested, it is an obvious complement to, or replacement for, aggregate or ddply). As well as being slightly long to code, repeatedly calling quantile will be inefficient, as each call sorts the data again.
library(data.table)
Tukeys_five <- c("Min","Q1","Med","Q3","Max")
IRIS <- data.table(iris)
# this will create the wide data.table
lengthBySpecies <- IRIS[,as.list(fivenum(Sepal.Length)), by = Species]
# and you can rename the columns from V1, ..., V5 to something nicer
setnames(lengthBySpecies, paste0('V',1:5), Tukeys_five)
lengthBySpecies
      Species Min  Q1 Med  Q3 Max
1:     setosa 4.3 4.8 5.0 5.2 5.8
2: versicolor 4.9 5.6 5.9 6.3 7.0
3:  virginica 4.9 6.2 6.5 6.9 7.9
Or, using a single call to quantile with the appropriate probs argument (abbreviated to prob below, which R resolves by partial argument matching):
IRIS[,as.list(quantile(Sepal.Length, prob = seq(0,1, by = 0.25))), by = Species]
      Species  0%   25% 50% 75% 100%
1:     setosa 4.3 4.800 5.0 5.2  5.8
2: versicolor 4.9 5.600 5.9 6.3  7.0
3:  virginica 4.9 6.225 6.5 6.9  7.9
Note that the names of the created columns are not syntactically valid, although you could rename them in the same way using setnames.
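As a minimal sketch of that renaming, assuming the same iris data and the Tukeys_five labels defined earlier (quantBySpecies is just an illustrative name, not something from the original code):

```r
library(data.table)

IRIS <- data.table(iris)
Tukeys_five <- c("Min", "Q1", "Med", "Q3", "Max")

# quantile() labels its result "0%", "25%", ..., and those labels
# become the (non-syntactic) column names of the grouped result
quantBySpecies <- IRIS[, as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25))),
                       by = Species]

# rename by reference; the old names are read from the table itself,
# so the percent labels do not have to be typed out
setnames(quantBySpecies, setdiff(names(quantBySpecies), "Species"), Tukeys_five)
quantBySpecies
```

After the rename the columns can be used without backticks, for example quantBySpecies[, Q3 - Q1].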
EDIT
Interestingly, quantile sets the names of the resulting vector when names = TRUE (the default), and this will copy (slowing down the number crunching and consuming memory - it even warns you in the help, fancy that!).
Thus, you should probably use:
IRIS[,as.list(quantile(Sepal.Length, prob = seq(0,1, by = 0.25), names = FALSE)), by = Species]
Or, if you wanted to return the named list without R copying internally:
IRIS[,{quant <- as.list(quantile(Sepal.Length, prob = seq(0,1, by = 0.25), names = FALSE))
setattr(quant, 'names', Tukeys_five)
quant}, by = Species]
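If you want to check how much the naming costs on your own machine, a rough timing sketch along these lines may help. The synthetic table DT, its columns g and x, and the group sizes are made up for illustration, and the timings will vary from run to run, so treat the result as indicative only.

```r
library(data.table)

# hypothetical synthetic data: 10,000 groups of 20 values each
set.seed(1)
DT <- data.table(g = rep(seq_len(1e4), each = 20), x = rnorm(2e5))
probs <- seq(0, 1, by = 0.25)
Tukeys_five <- c("Min", "Q1", "Med", "Q3", "Max")

# default names = TRUE: the "0%", "25%", ... labels are built in every group
t_named <- system.time(
  named <- DT[, as.list(quantile(x, probs = probs)), by = g]
)

# names = FALSE: no labels, columns come back as V1, ..., V5
t_unnamed <- system.time(
  unnamed <- DT[, as.list(quantile(x, probs = probs, names = FALSE)), by = g]
)

# names = FALSE plus setattr: labels attached by reference inside j
t_setattr <- system.time(
  relabelled <- DT[, {
    q <- as.list(quantile(x, probs = probs, names = FALSE))
    setattr(q, "names", Tukeys_five)
    q
  }, by = g]
)

# the numbers are the same in all three results; only the column names differ
stopifnot(isTRUE(all.equal(unname(as.matrix(named)[, -1]),
                           unname(as.matrix(unnamed)[, -1]))))

# elapsed seconds for each variant
c(named     = t_named[["elapsed"]],
  unnamed   = t_unnamed[["elapsed"]],
  relabelled = t_setattr[["elapsed"]])
```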