Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
175 views
in Technique[技术] by (71.8m points)

r - Summarize data based on unique ID column

I am trying to summarise multiple columns based on an ID column so I don't double count observations. I have managed to use tapply to get what I need for one variable at a time but can't do this for several variables at the same time.

In addition, the data frame I want to apply this to has +50,000 rows and I want to apply this to +10 different count variables. I was wondering if there is a better solution within dplyr as I ultimately want to create a Shiny Dashboard with this data.

I have replicated a small sample of the data and shown the existing cost.

#Creating data frame
df <- data.frame (ID = c(1, 1, 2, 3, 4, 4, 4),
                  Count = c(1, 1, 30, 15, 1, 1, 1),
                  Count2 = c(1, 1, 20, 10, 1, 1, 1),
                  Service = c("Service A", "Service B", "Service C", "Service D", 
                              "Service E", "Service F", "Service G"))

#Create object of variables to count
myvars <- c("Count", "Count2")

#Count number of unique frequencies for two groups
df %>% 
  group_by(ID) %>%
  summarise(value_sum = sum(tapply(myvars, ID, FUN = max))) %>% 
  summarise(value_sum = sum(value_sum))


#Count number of unique frequencies (code works for one variable at a time)
df %>% 
  group_by(ID) %>%
  summarise(value_sum = sum(tapply(Count, ID, FUN = max))) %>% 
  summarise(value_sum = sum(value_sum))

df %>% 
  group_by(ID) %>%
  summarise(value_sum = sum(tapply(Count2, ID, FUN = max))) %>% 
  summarise(value_sum = sum(value_sum))
question from:https://stackoverflow.com/questions/65884142/summarize-data-based-on-unique-id-column

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use across() to work on multiple variables at the same time within summarise(). In your case:

df %>% 
  group_by(ID) %>% 
  summarise(across(myvars, max)) %>% 
  summarise(across(myvars, sum))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...