Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
337 views
in Technique[技术] by (71.8m points)

r - Group by columns and summarize a column into a list

I have a dataframe like this:

sample_df<-data.frame(
   client=c('John', 'John','Mary','Mary'),
   date=c('2016-07-13','2016-07-13','2016-07-13','2016-07-13'),
   cluster=c('A','B','A','A'))

#sample data frame
   client date         cluster
1  John   2016-07-13    A 
2  John   2016-07-13    B 
3  Mary   2016-07-13    A 
4  Mary   2016-07-13    A             

I would like to transform it into different format, which will be like:

#ideal data frame
   client date         cluster
1  John   2016-07-13    c('A,'B') 
2  Mary   2016-07-13    A 

For the 'cluster' column, it will be a list if some client is belong to different cluster on the same date.

I thought I can do it with dplyr package with commend as below

library(dplyr)
ideal_df<-sample %>% 
    group_by(client, date) %>% 
    summarize( #some anonymous function)

However, I don't know how to write the anonymous function in this situation. Is there a way to transform the data into the ideal format?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

We can use toString to concat the unique elements in 'cluster' together after grouping by 'client'

r1 <- sample_df %>% 
         group_by(client, date) %>%
         summarise(cluster = toString(unique(cluster)))

Or another option would be to create a list column

r2 <- sample_df %>%
         group_by(client, date) %>% 
         summarise(cluster = list(unique(cluster)))

which we can unnest

library(tidyr)
r2 %>%
    ungroup %>%
     unnest()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...