Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
495 views
in Technique[技术] by (71.8m points)

r - Apply a summarise condition to a range of columns when using dplyr group_by?

Suppose we want to group_by() and summarise a massive data.frame with very many columns, but that there are some large groups of consecutive columns that will have the same summarise condition (e.g. max, mean etc)

Is there a way to avoid having to specify the summarise condition for each and every column, and instead do it for ranges of columns?

Example

Suppose we want to do this:

iris %>% 
  group_by(Species) %>% 
  summarise(max(Sepal.Length), mean(Sepal.Width), mean(Petal.Length), mean(Petal.Width))

but note that 3 consecutive columns have the same summarise condition, mean(Sepal.Width), mean(Petal.Length), mean(Petal.Width)

Is there a way to use some method like mean(Sepal.Width:Petal.Width) to specify the condition for the range of columns, and hence a avoiding having to type out the summarise condition multiple times for all the columns in between)

Note

The iris example above is a small and manageable example that has a range of 3 consecutive columns, but actual use case has ~hundreds.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The upcoming version 1.0.0 of dplyr will have across() function that does what you wish for

Basic usage

across() has two primary arguments:

  • The first argument, .cols, selects the columns you want to operate on. It uses tidy selection (like select()) so you can pick variables by position, name, and type.
  • The second argument, .fns, is a function or list of functions to apply to each column. This can also be a purrr style formula (or list of formulas) like ~ .x / 2. (This argument is optional, and you can omit it if you just want to get the underlying data; you'll see that technique used in vignette("rowwise").)
### Install development version on GitHub first
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)

Control how the names are created with the .names argument which takes a glue spec:

iris %>% 
  group_by(Species) %>% 
  summarise(
    across(c(Sepal.Width:Petal.Width), ~ mean(.x, na.rm = TRUE), .names = "mean_{col}"),
    across(c(Sepal.Length), ~ max(.x, na.rm = TRUE), .names = "max_{col}")
    )
#> # A tibble: 3 x 5
#>   Species    mean_Sepal.Width mean_Petal.Leng~ mean_Petal.Width max_Sepal.Length
#> * <fct>                 <dbl>            <dbl>            <dbl>            <dbl>
#> 1 setosa                 3.43             1.46            0.246              5.8
#> 2 versicolor             2.77             4.26            1.33               7  
#> 3 virginica              2.97             5.55            2.03               7.9

Using multiple functions

my_func <- list(
  mean = ~ mean(., na.rm = TRUE),
  max  = ~ max(., na.rm = TRUE)
)

iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), my_func, .names = "{fn}.{col}"))
#> # A tibble: 3 x 9
#>   Species    mean.Sepal.Length max.Sepal.Length mean.Sepal.Width max.Sepal.Width
#> * <fct>                  <dbl>            <dbl>            <dbl>           <dbl>
#> 1 setosa                  5.01              5.8             3.43             4.4
#> 2 versicolor              5.94              7               2.77             3.4
#> 3 virginica               6.59              7.9             2.97             3.8
#>   mean.Petal.Length max.Petal.Length mean.Petal.Width max.Petal.Width
#> *             <dbl>            <dbl>            <dbl>           <dbl>
#> 1              1.46              1.9            0.246             0.6
#> 2              4.26              5.1            1.33              1.8
#> 3              5.55              6.9            2.03              2.5

Created on 2020-03-06 by the reprex package (v0.3.0)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...