Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

r - Flattening a delimited composite column

I have a data frame in R where one of the fields is composite (space-delimited). Here's an example:

users=c(1,2,3)
items=c("23 77 49", "10 18 28", "20 31 84")
df = data.frame(users,items)

(In practice I don't build the data frame myself; this is only for illustration.)

  users    items
  1        23 77 49
  2        10 18 28
  3        20 31 84

I want to flatten the second column in order to have a list of (non-unique) user IDs and an individual item per row. So I want to end up with:

user   item
1        23
1        77
1        49
2        10
2        18
2        28
3        20
3        31
3        84

I tried:

data.frame(user = df$users, item = unlist(strsplit(as.character(df$items), " "))) 

But I get "arguments imply differing number of rows". I understand why (three user IDs against nine items), but I can't find a solution that gives me the result I want. Any ideas?

Also, what is the most efficient way to do this? I have more than 20 million rows like these.

1 Reply

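# Split each row's string into its items (as.character() guards against a factor column)
items <- strsplit(as.character(df$items), " ")
# Repeat each user ID once per item so both columns have the same length
data.frame(user = rep(df$users, lengths(items)), item = unlist(items))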

##   user item
## 1    1   23
## 2    1   77
## 3    1   49
## 4    2   10
## 5    2   18
## 6    2   28
## 7    3   20
## 8    3   31
## 9    3   84
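
This works because `rep()` with a vector of counts repeats each user ID once per item in its row, so the two columns always come out the same length, even when rows hold different numbers of items. A minimal sketch of that idea follows; the helper name `flatten_items` and the sample data are hypothetical, and the conversion to integer assumes every item is a whole number:

```r
# Hypothetical helper: expand a delimited column into one row per item.
flatten_items <- function(users, items, sep = " ") {
  # as.character() guards against factor columns (the default in R < 4.0)
  parts <- strsplit(as.character(items), sep, fixed = TRUE)
  data.frame(
    user = rep(users, times = lengths(parts)),  # one copy of the ID per item
    item = as.integer(unlist(parts)),           # assumes items are integers
    stringsAsFactors = FALSE
  )
}

# Rows with different item counts are handled as well
res <- flatten_items(c(1, 2, 3), c("23 77 49", "10", "20 31"))
print(res)
```

`lengths()` needs R 3.2.0 or later; on older versions `sapply(parts, length)` does the same job.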

Alternatively, with data.table:

library(data.table)

DT <- data.table(df)
# Split per user; as.character() guards against a factor column
DT[, list(item = unlist(strsplit(as.character(items), " "))), by = users]

##    users item
## 1:     1   23
## 2:     1   77
## 3:     1   49
## 4:     2   10
## 5:     2   18
## 6:     2   28
## 7:     3   20
## 8:     3   31
## 9:     3   84
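
On the efficiency question: with roughly 20 million rows, grouping `by = users` calls `strsplit()` once per group, which may become the bottleneck. One possible sketch splits the whole column in a single vectorised call and builds the result directly. It has not been benchmarked here, `fixed = TRUE` assumes the delimiter is always a single literal space, and the name `flat` is only illustrative:

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table
DT <- data.table(users = c(1, 2, 3),
                 items = c("23 77 49", "10 18 28", "20 31 84"))

# Split the whole column once instead of once per group;
# fixed = TRUE skips regular-expression matching
parts <- strsplit(DT$items, " ", fixed = TRUE)

flat <- data.table(
  users = rep(DT$users, times = lengths(parts)),        # one ID per item
  item  = as.integer(unlist(parts, use.names = FALSE))  # assumes integer items
)
print(flat)
```

Storing `item` as integer rather than character also keeps the result smaller in memory, which matters at this scale.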
