I have two datasets that I would like to merge by a common ID variable. One file (df1) has 1.4 million observations, while the other (df2) has 20k. Both contain an ID number. What I would like to do is attach all additional variables (columns) found in df2 to the rows in df1 where the client IDs match, leaving everything else as is. Ideally, the remaining rows in my 1.4 million-row dataset would simply have missing values in those new columns. Something like this:
#> ID_Num Var1 ID_Num Var1 Var2
#> 1 23124 252 23124 252 3
#> 2 12312 161 12312 161 2
#> 3 1233 161 12333 161
#> 4 12345 161 12345 161 5
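The result described above (every df1 row kept, df2's columns filled in where the IDs match and left missing otherwise) is what a left join produces, not an inner join. A minimal sketch with `dplyr::left_join()`, assuming `ID_Num` is the shared key; the tiny `df1`/`df2` below are hypothetical stand-ins, not the real 1.4 million and 20k row files:

```r
library(dplyr)

# Hypothetical toy data standing in for the large df1 and the small df2
df1 <- data.frame(ID_Num = c(23124, 12312, 1233, 12345),
                  Var1   = c(252, 161, 161, 161))
df2 <- data.frame(ID_Num = c(23124, 12312, 12345),
                  Var2   = c(3, 2, 5))

# left_join keeps every row of df1 in its original order;
# IDs with no match in df2 (here 1233) get NA in Var2
df3 <- left_join(df1, df2, by = "ID_Num")
print(df3)
```

As long as `ID_Num` is unique in df2, the joined result has exactly as many rows as df1.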
I have tried dplyr's inner_join(), but the result always has many more rows than df1. For example, with df1 = 1.4 million rows and df2 = 20k rows:
library(dplyr)
df3 <- inner_join(df1, df2)
df3 ends up with 4.1 million rows. I am very confused by this.
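A join can only return more rows than df1 when the key is not unique: each df1 row is repeated once for every df2 row that shares its ID. The sketch below reproduces that on hypothetical toy data, shows one way to list the duplicated IDs with `count()`, and then, only under the assumption that duplicate df2 rows are redundant, keeps the first row per ID with `distinct()` before a left join. Note also that calling `inner_join(df1, df2)` without `by` joins on every column name the two tables share (in the example above, possibly both `ID_Num` and `Var1`), so passing `by = "ID_Num"` explicitly is safer.

```r
library(dplyr)

# Hypothetical data: ID_Num is unique in df1 but repeated in df2
df1 <- data.frame(ID_Num = c(1, 2), Var1 = c(10, 20))
df2 <- data.frame(ID_Num = c(1, 1, 1, 2), Var2 = c("a", "b", "c", "d"))

# ID 1 matches three df2 rows and ID 2 matches one: 3 + 1 = 4 rows from 2
joined <- inner_join(df1, df2, by = "ID_Num")
print(nrow(joined))

# Diagnose: which IDs occur more than once in df2, and how often?
dup_ids <- df2 %>% count(ID_Num) %>% filter(n > 1)
print(dup_ids)

# Only if one df2 row per ID is really wanted: keep the first occurrence
df2_unique <- distinct(df2, ID_Num, .keep_all = TRUE)
fixed <- left_join(df1, df2_unique, by = "ID_Num")
print(nrow(fixed))
```

If the duplicated df2 rows carry different values, dropping them silently loses data, so it is worth inspecting `dup_ids` first and deciding how to collapse them.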
If someone could help I would greatly appreciate it!
question from:
https://stackoverflow.com/questions/66067440/attaching-unequal-rows-of-2-datasets-in-r