Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
527 views
in Technique[技术] by (71.8m points)

r - paste two data.table columns

dt <- data.table(L=1:5,A=letters[7:11],B=letters[12:16])
   L A B
1: 1 g l
2: 2 h m
3: 3 i n
4: 4 j o
5: 5 k p

Now I want to paste columns "A" and "B" to get a new one, let's call it "new":

dt2
   L A B new
1: 1 g l  gl
2: 2 h m  hm
3: 3 i n  in
4: 4 j o  jo
5: 5 k p  kp
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I had a similar issue but had many columns, and didn't want to type them each manually.

New version

(based on comment from @mnel)

dt[, new:=do.call(paste0,.SD), .SDcols=-1]

This is roughly twice as fast as the old version, and seems to sidestep the quirks. Note the use of .SDcols to identify the columns to use in paste0. The -1 uses all columns but the first, since the OP wanted to paste columns A and B but not L.

If you would like to use a different separator:

dt[ , new := do.call(paste, c(.SD, sep = ":"))]

Old version

You can use .SD and by to handle multiple columns:

dt[,new:=paste0(.SD,collapse=""),by=seq_along(L)]

I added seq_along in case L was not unique. (You can check this using dt<-data.table(L=c(1:4,4),A=letters[7:11],B=letters[12:16])).

Also, in my actual instance for some reason I had to use t(.SD) in the paste0 part. There may be other similar quirks.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...