I am trying to understand a little more about the way rolling joins work and am having some confusion, I was hoping somebody could clarify this for me. To take a concrete example:
dt1 <- data.table(id=rep(1:5, 10), t=1:50, val1=1:50, key="id,t")
dt2 <- data.table(id=rep(1:5, 2), t=1:10, val2=1:10, key="id,t")
I expected this to produce a long data.table
where the values in dt2
are rolled:
dt1[dt2,roll=TRUE]
Instead, the correct way to do this seems to be:
dt2[dt1,roll=TRUE]
Could someone explain to me more about how joining in data.table
works as I am clearly not understanding it correctly. I thought that dt1[dt2,roll=TRUE]
corresponded to the sql equivalent of select * from dt1 right join dt2 on (dt1.id = dt2.id and dt1.t = dt2.t)
, except with the added functionality locf.
Additionally the documentation says:
X[Y] is a join, looking up X's rows using Y (or Y's key if it has one)
as an index.
This makes it seem that only things in X should be returned an the join being done is an inner join, not outer. What about in the case when roll=T
but that particular id
does not exist in dt1
? Playing around a bit more I can't understand what value is being placed into the column.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…