Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - For-loop to summarize and join with dplyr

Here is my simplified df:

GP_A <- c(rep("a",3),rep("b",2),rep("c",2))
GP_B <- c(rep("d",2),rep("e",4),rep("f",1))
GENDER <- c(rep("M",4),rep("F",3))
LOC <- c(rep("HK",2),rep("UK",3),rep("JP",2))
SCORE <- c(50,70,80,20,30,80,90)
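# data.frame() keeps SCORE numeric; as.data.frame(cbind(...)) turns every column into character (or factor), so mean(SCORE) would return NA with a warning
df <- data.frame(GP_A, GP_B, GENDER, LOC, SCORE)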

> df
  GP_A GP_B GENDER LOC SCORE
1    a    d      M  HK    50
2    a    d      M  HK    70
3    a    e      M  UK    80
4    b    e      M  UK    20
5    b    e      F  UK    30
6    c    e      F  JP    80
7    c    f      F  JP    90

I want to summarize the score by GP_A, GP_B, or other grouping columns that are not shown in this example. Because there may be up to 50 grouping columns, I decided to use a for-loop to summarize the score.

My original method summarizes the score by one grouping column at a time:

GP_A_SCORE <- df %>% group_by(GP_A,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
GP_B_SCORE <- df %>% group_by(GP_B,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
...

What I want is to use a for-loop like this (it does not run):

GP_list <- c("GP_A","GP_B",...)
LOC_list <- c("HK","UK","JP",...)
SCORE <- list()
for (i in GP_list){
    for (j in LOC_list){
        SCORE[[paste0(i,j)]] <- df %>% group_by(i,j,GENDER) %>% summarize(SCORE=mean(SCORE))
    }
}

Because i and j only hold character strings, group_by() does not treat them as the column names they contain, and this error is shown:

Error: Column I, J is unknown

Is there any way to force R to recognize the variable's value as a column name?
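For reference, a minimal sketch of a loop that does run, assuming dplyr 1.0.0 or later (for `across()`, `all_of()` and the `.groups` argument). `all_of()` selects columns whose names are stored in a character vector, which is exactly what `i` holds. LOC is kept as an ordinary grouping column, because HK, UK and JP are values rather than column names; `results` is a hypothetical name chosen so that the `SCORE` vector is not overwritten.

```r
library(dplyr)

# Rebuild the example data; data.frame() keeps SCORE numeric
df <- data.frame(
  GP_A   = c(rep("a", 3), rep("b", 2), rep("c", 2)),
  GP_B   = c(rep("d", 2), rep("e", 4), "f"),
  GENDER = c(rep("M", 4), rep("F", 3)),
  LOC    = c(rep("HK", 2), rep("UK", 3), rep("JP", 2)),
  SCORE  = c(50, 70, 80, 20, 30, 80, 90),
  stringsAsFactors = FALSE
)

GP_list <- c("GP_A", "GP_B")  # grouping column names stored as strings
results <- list()             # hypothetical name; avoids overwriting SCORE

for (i in GP_list) {
  results[[i]] <- df %>%
    # all_of() looks the columns up by the names held in the character vector
    group_by(across(all_of(c(i, "GENDER", "LOC")))) %>%
    summarise(SCORE = mean(SCORE), .groups = "drop")
}

results[["GP_A"]]
```

If one table per location is really wanted, an inner loop over LOC_list with `filter(LOC == j)` before `group_by()` would be one way to get it.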

I am facing the same problem with dplyr's left_join().

An error is shown when I do something like left_join(x, y, by = c(i = i)) inside a loop.
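The likely culprit in `c(i = i)` is that names written inside `c()` are taken literally, so the key is named `"i"` instead of the column name stored in `i`. A minimal sketch using `setNames()` (from base R's stats package), which builds the named `by` vector from the value of `i`: names are the key columns in the left table, values the key columns in the right table. The `lookups` tables and `LABEL_*` columns are hypothetical, made up for illustration.

```r
library(dplyr)

df <- data.frame(
  GP_A  = c("a", "a", "b", "c"),
  GP_B  = c("d", "e", "e", "f"),
  SCORE = c(50, 80, 20, 90),
  stringsAsFactors = FALSE
)

# Hypothetical lookup tables, one per grouping column
lookups <- list(
  GP_A = data.frame(GP_A = c("a", "b", "c"), LABEL_A = c("A1", "B1", "C1"),
                    stringsAsFactors = FALSE),
  GP_B = data.frame(GP_B = c("d", "e", "f"), LABEL_B = c("D1", "E1", "F1"),
                    stringsAsFactors = FALSE)
)

out <- df
for (i in names(lookups)) {
  # setNames(i, i) gives c(GP_A = "GP_A") etc.; c(i = i) would give c(i = "GP_A")
  out <- left_join(out, lookups[[i]], by = setNames(i, i))
}
out
```

When the key has the same name in both tables, `by = i` on its own is enough; the `setNames()` form matters when the names differ, e.g. `setNames(y_col, x_col)`.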


Whoever fights dragons too long becomes a dragon; gaze too long into the abyss, and the abyss gazes back…

1 Reply


You could reshape the data into long format and then calculate the mean:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = starts_with('GP')) %>%
  group_by(GENDER ,LOC, name, value) %>%
  summarise(SCORE = mean(SCORE))

#   GENDER LOC   name  value SCORE
#   <fct>  <fct> <chr> <fct> <dbl>
# 1 F      JP    GP_A  c        85
# 2 F      JP    GP_B  e        80
# 3 F      JP    GP_B  f        90
# 4 F      UK    GP_A  b        30
# 5 F      UK    GP_B  e        30
# 6 M      HK    GP_A  a        60
# 7 M      HK    GP_B  d        60
# 8 M      UK    GP_A  a        80
# 9 M      UK    GP_A  b        20
#10 M      UK    GP_B  e        50
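If one table per grouping column is still wanted, as the loop in the question builds, a minimal sketch is to split the long result by `name`. `long_scores` and `score_list` are illustrative names, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  GP_A   = c(rep("a", 3), rep("b", 2), rep("c", 2)),
  GP_B   = c(rep("d", 2), rep("e", 4), "f"),
  GENDER = c(rep("M", 4), rep("F", 3)),
  LOC    = c(rep("HK", 2), rep("UK", 3), rep("JP", 2)),
  SCORE  = c(50, 70, 80, 20, 30, 80, 90),
  stringsAsFactors = FALSE
)

long_scores <- df %>%
  pivot_longer(cols = starts_with("GP")) %>%
  group_by(GENDER, LOC, name, value) %>%
  summarise(SCORE = mean(SCORE), .groups = "drop")

# One data frame per original grouping column, named "GP_A", "GP_B", ...
score_list <- split(long_scores, long_scores$name)
score_list[["GP_B"]]
```

Because every GP column is stacked into `name`/`value`, adding more grouping columns only requires that they match `starts_with("GP")`; no loop is needed.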


...