r - ddply + summarize for repeating same statistical function across large number of columns

Question

Welcome To Ask or Share your Answers For Others

r - ddply + summarize for repeating same statistical function across large number of columns

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - ddply + summarize for repeating same statistical function across large number of columns

Ok, second R question in quick succession.

My data:

           Timestamp    St_01  St_02 ...
1 2008-02-08 00:00:00  26.020 25.840 ...
2 2008-02-08 00:10:00  25.985 25.790 ...
3 2008-02-08 00:20:00  25.930 25.765 ...
4 2008-02-08 00:30:00  25.925 25.730 ...
5 2008-02-08 00:40:00  25.975 25.695 ...
...

Basically normally I would use a combination of ddply and summarize to calculate ensembles (e.g. mean for every hour across the whole year).

In the case above, I would create a category, e.g. hour (e.g. strptime(data$Timestamp,"%H") -> data$hour and then use that category in ddply, like ddply(data,"hour", summarize, St_01=mean(St_01), St_02=mean(St_02)...) to average by category across each of the columns.

but here is where it gets sticky. I have more than 40 columns to deal with and I'm not prepared to type them all one by one as parameters to the summarize function. I used to write a loop in shell to generate this code but that's not how programmers solve problems is it?

So pray tell, does anyone have a better way of achieving the same result but with less keystrokes?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:43:05+0000

You can use numcolwise() to run a summary over all numeric columns.

Here is an example using iris:

ddply(iris, .(Species), numcolwise(mean))
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026

Similarly, there is catcolwise() to summarise over all categorical columns.

See ?numcolwise for more help and examples.

EDIT

An alternative approach is to use reshape2 (proposed by @gsk3). This has more keystrokes in this example, but gives you enormous flexibility:

library(reshape2)

miris <- melt(iris, id.vars="Species")
x <- ddply(miris, .(Species, variable), summarize, mean=mean(value))

dcast(x, Species~variable, value.var="mean")
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026

Categories

r - ddply + summarize for repeating same statistical function across large number of columns

r - ddply + summarize for repeating same statistical function across large number of columns

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags