Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Using anti_join() from dplyr on two tables from two different databases

I am working on an ETL testing project where I need to compare data between two tables from two different databases. To do this, I first downloaded the entire tables using a query like the one below.

  library(RODBC)  # assumption: sqlQuery() comes from RODBC and cn is an open connection
  query_table_a <- "SELECT * FROM MBR_MEAS (NOLOCK)"
  table_a <- as.data.frame(sqlQuery(cn, query_table_a))

Then I used anti_join() from dplyr. If the key column has the same name in both data frames, the result is good. For example, this returns the expected results:

mismatch_records <- anti_join(table_a, table_b, by="client_id")

But in another scenario the column name differs between the tables (table 'c' has the column client_id and table 'd' has clientid), and I couldn't figure out what to do. I tried the merge function, but that doesn't seem very promising.

merge(x = table_c, y = table_d, by.x = "CLIENT_ID", by.y = "ClientId", all.x = TRUE)

Any suggestions, please?

1 Reply

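Try this. The by argument accepts a named character vector, where each name is a column in the first table and each value is the matching column in the second: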

mismatch_records <- anti_join(table_c, table_d, by = c("CLIENT_ID" = "ClientId"))
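
To see what that call returns, here is a minimal sketch on toy data. The two data frames and the score column are hypothetical stand-ins for the downloaded tables, not the real MBR_MEAS data; only the CLIENT_ID and ClientId column names come from the question.

```r
library(dplyr)

# Hypothetical stand-ins for the two downloaded tables.
table_c <- data.frame(CLIENT_ID = c(1, 2, 3, 4), score = c(10, 20, 30, 40))
table_d <- data.frame(ClientId = c(2, 4, 5), score = c(20, 40, 50))

# Rows of table_c whose CLIENT_ID has no match in table_d$ClientId.
missing_in_d <- anti_join(table_c, table_d, by = c("CLIENT_ID" = "ClientId"))

# Swap the tables (and the names in `by`) to check the other direction.
missing_in_c <- anti_join(table_d, table_c, by = c("ClientId" = "CLIENT_ID"))

print(missing_in_d)
print(missing_in_c)
```

Since anti_join() only keeps rows from its first argument, an ETL comparison probably wants both directions, as above. With dplyr 1.1.0 or later, by = join_by(CLIENT_ID == ClientId) should express the same match.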
