Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
340 views
in Technique[技术] by (71.8m points)

r - Join two data tables and use only one column from second dt

Let's say I have two data tables (dt1 and dt2), and I want to get dt3 using data tables. A,B,C,E,F,G,H are column names. dt1 key is column A, and dt2 key is column E. Data tables have different number of rows. I want to keep all the columns from DT1, and add only one column (H) from DT2 to the joined data table. Eventually, I will store this as DT1 (though I showed it as dt3 below).

How can I achieve it with data tables? I have an ugly solution with merge + data frames.

dt1 
A   B   C   
1   4   7   
2   5   8   
3   6   9   
2   20  21

dt2
E   F   G   H
1   10  13  16
3   12  15  18    
2   11  14  17


dt3
A   B   C   H
1   4   7   16
2   5   8   17
3   6   9   18
2   20  21  17          
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In order to perform a left join to df1 and add H column from df2, you can combine binary join with the update by reference operator (:=)

setkey(setDT(dt1), A) 
dt1[dt2, H := i.H]

See here and here for detailed explanation on how it works


With the devel version (v >= 1.9.5) we could make it even shorter by specifying the key within setDT (as pointed by @Arun)

setDT(dt1, key = "A")[dt2, H := i.H]

Edit 24/7/2015

You can now run a binary join using the new on parameter without setting keys

setDT(dt1)[dt2, H := i.H, on = c(A = "E")]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...