I have two dataframes that I want to merge by a common variable. Some observations appear in one data but not in the other, and vice verse. When merging, I want to keep only the observations that appear in both dataframes, and therefore dplyr::inner_join()
is appropriate.
However, there's an exclusion. There are some observations that I'd like to include in the merged data no matter what. That is, even if they don't appear in both original dataframes. The indication to which ones are those "special" observations to keep is given as certain values in certain columns.
Example
I want to merge the following dataframes (df_population
and df_gdp
)
1. df_population
library(tibble)
library(dplyr)
## helper function
myFun <- function(n = 5000) {
a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
set.seed(2021)
df_population <-
tribble(~country, ~population,
"australia", 24.99,
"united_kingdom", 66.65,
"france", 66.99,
"spain", 46.94,
"canada", 37.59,
"brazil", 209.5) %>%
mutate(col_of_strings = c(myFun(5), "dont_leave_me_behind"))
## # A tibble: 6 x 3
## country population col_of_strings
## <chr> <dbl> <chr>
## 1 australia 25.0 GLNWN9968R
## 2 united_kingdom 66.6 FTELH3426F
## 3 france 67.0 NFOSZ6335V
## 4 spain 46.9 ZFGRD8875F
## 5 canada 37.6 GFICE2875O
## 6 brazil 210. dont_leave_me_behind
2. df_gdp
df_gdp <-
tribble(~country, ~gdp_growth,
"australia", 2.9,
"united_kingdom", 1.4,
"france", 1.7,
"spain", 2.4,
"canada", 1.9,
"greece", 1.9) %>%
mutate(col_of_strings = sample(c(myFun(5), "dont_leave_me_behind")))
## # A tibble: 6 x 3
## country gdp_growth col_of_strings
## <chr> <dbl> <chr>
## 1 australia 2.9 dont_leave_me_behind
## 2 united_kingdom 1.4 RQHHI9679V
## 3 france 1.7 PFSZX1552L
## 4 spain 2.4 BQTBY7537E
## 5 canada 1.9 OECIK9698V
## 6 greece 1.9 VXDQQ4718J
My Problem
Normally I'd go with
dplyr::inner_join(df_population, df_gdp, by = "country")
But:
Although I want only the countries that are common to both dataframes, I still want to include any country that has col_of_strings == dont_leave_me_behind
I hope there's a simple solution to this. Thanks!