Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
414 views
in Technique[技术] by (71.8m points)

r - Data.table - left outer join on multiple tables

Suppose you have data like

fruits <- data.table(FruitID=c(1,2,3), Fruit=c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID=c(1,2,3,4,5), FruitID=c(1,1,1,2,3), Color=c("Red","Yellow","Green","Yellow","Red"))
tastes <- data.table(TasteID=c(1,2,3), FruitID=c(1,1,3), Taste=c("Sweeet", "Sour", "Sweet"))

setkey(fruits, "FruitID")
setkey(colors, "ColorID")
setkey(tastes, "TasteID")

fruits
   FruitID      Fruit
1:       1      Apple
2:       2     Banana
3:       3 Strawberry

colors
   ColorID FruitID  Color
1:       1       1    Red
2:       2       1 Yellow
3:       3       1  Green
4:       4       2 Yellow
5:       5       3    Red

tastes
   TasteID FruitID  Taste
1:       1       1 Sweeet
2:       2       1   Sour
3:       3       3  Sweet

I typically need to perform left-outer joins on data like this. For instance, "give me all fruits and their colors" requires me to write (and maybe there's a better way?)

setkey(colors, "FruitID")
result <- colors[fruits, allow.cartesian=TRUE]
setkey(colors, "ColorID")

Three lines of code for such a simple and frequent task seemed excessive, so I wrote a method myLeftJoin

myLeftJoin <- function(tbl1, tbl2){
  # Performs a left join using the key in tbl1 (i.e. keeps all rows from tbl1 and only matching rows from tbl2)

  oldkey <- key(tbl2)
  setkeyv(tbl2, key(tbl1))
  result <- tbl2[tbl1, allow.cartesian=TRUE]
  setkeyv(tbl2, oldkey)
  return(result)
}

which I can use like

myLeftJoin(fruits, colors)
   ColorID FruitID  Color      Fruit
1:       1       1    Red      Apple
2:       2       1 Yellow      Apple
3:       3       1  Green      Apple
4:       4       2 Yellow     Banana
5:       5       3    Red Strawberry

How can I extend this method so that I can pass any number of tables to it and get the chained left outer join of all of them? Something like myLeftJoin(tbl1, ...)

For instance, I'd like the result of myleftJoin(fruits, colors, tastes) to be equivalent to

setkey(colors, "FruitID")
setkey(tastes, "FruitID")
result <- tastes[colors[fruits, allow.cartesian=TRUE], allow.cartesian=TRUE]
setkey(tastes, "TasteID")
setkey(colors, "ColorID")

result
   TasteID FruitID  Taste ColorID  Color      Fruit
1:       1       1 Sweeet       1    Red      Apple
2:       2       1   Sour       1    Red      Apple
3:       1       1 Sweeet       2 Yellow      Apple
4:       2       1   Sour       2 Yellow      Apple
5:       1       1 Sweeet       3  Green      Apple
6:       2       1   Sour       3  Green      Apple
7:      NA       2     NA       4 Yellow     Banana
8:       3       3  Sweet       5    Red Strawberry

Perhaps there's an elegant solution using methods in the data.table package that I missed? Thanks

(EDIT: Fixed a mistake in my data)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I just committed a new feature in data.table, v1.9.5, with which we can join without setting keys (that is, specify the columns to join by directly, without having to use setkey() first):

With that, this is simply:

require(data.table) # v1.9.5+
fruits[tastes, on="FruitID"][colors, on="FruitID"] # no setkey required
#    FruitID      Fruit TasteID  Taste ColorID  Color
# 1:       1      Apple       1 Sweeet       1    Red
# 2:       1      Apple       2   Sour       1    Red
# 3:       1      Apple       1 Sweeet       2 Yellow
# 4:       1      Apple       2   Sour       2 Yellow
# 5:       1      Apple       1 Sweeet       3  Green
# 6:       1      Apple       2   Sour       3  Green
# 7:       2         NA      NA     NA       4 Yellow
# 8:       3 Strawberry       3  Sweet       5    Red

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...