I am trying to summarise multiple columns based on an ID column so I don't double count observations. I have managed to use tapply
to get what I need for one variable at a time but can't do this for several variables at the same time.
In addition, the data frame I want to apply this to has +50,000 rows and I want to apply this to +10 different count variables. I was wondering if there is a better solution within dplyr
as I ultimately want to create a Shiny Dashboard with this data.
I have replicated a small sample of the data and shown the existing cost.
#Creating data frame
df <- data.frame (ID = c(1, 1, 2, 3, 4, 4, 4),
Count = c(1, 1, 30, 15, 1, 1, 1),
Count2 = c(1, 1, 20, 10, 1, 1, 1),
Service = c("Service A", "Service B", "Service C", "Service D",
"Service E", "Service F", "Service G"))
#Create object of variables to count
myvars <- c("Count", "Count2")
#Count number of unique frequencies for two groups
df %>%
group_by(ID) %>%
summarise(value_sum = sum(tapply(myvars, ID, FUN = max))) %>%
summarise(value_sum = sum(value_sum))
#Count number of unique frequencies (code works for one variable at a time)
df %>%
group_by(ID) %>%
summarise(value_sum = sum(tapply(Count, ID, FUN = max))) %>%
summarise(value_sum = sum(value_sum))
df %>%
group_by(ID) %>%
summarise(value_sum = sum(tapply(Count2, ID, FUN = max))) %>%
summarise(value_sum = sum(value_sum))
question from:
https://stackoverflow.com/questions/65884142/summarize-data-based-on-unique-id-column 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…