Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
434 views
in Technique[技术] by (71.8m points)

Select Rows from Different Table where String from 1st table column is present in R

I am trying to match tables if a string is fully present in the other tables' column. However, I have managed to join it partially and then I am applying Levenstein distance to get close matches. This approach has limited use and accuracy. Approach:

checkg <- check %>% 
  fuzzy_inner_join(LOCATIONS, by = c("STRING" = "STRING"), match_fun = str_detect) %>%
  rowwise() %>%
  mutate(DIST = adist(x=STRING, y=LOCATION, ignore.case = TRUE)) 

is there any way to map it in the following way? The STATUS column in the output table is just given to make it clear that partial string matching is not the objective. It is not required in the output. Thanks

TABLE 1

**STRING** 
BATANGAS
QINGDAO

TABLE2

**STRING**
BATNAGAS LUZON
QINGDAO PT

OUTPUT TABLE checkg

TABLE1.STRING   TABLE2.STRING    STATUS
BATANGAS        BATNAGAS LUZON   Accept
QINGDAO         QINGDAO PT       Accept
BATANGAS        TANGA            Reject
question from:https://stackoverflow.com/questions/65901438/select-rows-from-different-table-where-string-from-1st-table-column-is-present-i

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can reverse the syntax to avoid partial matching from LOCATIONS table.

library(fuzzyjoin)

check <- data.frame(STRING = c("BATANGAS", "QINGDAO"))
LOCATIONS <- data.frame(STRING = c("BATANGAS LUZON", "QINGDAO PT", "TANGA"))

LOCATIONS %>% 
  fuzzy_right_join(check, by = c("STRING" = "STRING"), match_fun = str_detect)

        STRING.x STRING.y
1 BATANGAS LUZON BATANGAS
2     QINGDAO PT  QINGDAO

To check further for full words only, you can do this..

check <- structure(list(To_check = c("BATANGAS", "QINGDAO", "ABC", "DEF"
), id = 1:4), class = "data.frame", row.names = c(NA, -4L))

check
> check
  To_check id
1 BATANGAS  1
2  QINGDAO  2
3      ABC  3
4      DEF  4

> LOCATIONS
          STRING
1 BATANGAS LUZON
2     QINGDAO PT
3          TANGA
4           ABCD

LOCATIONS %>% 
  fuzzy_right_join(check %>% mutate(dummy = paste0('\b', To_check, '\b')), 
                   by = c("STRING" = "dummy"), match_fun = str_detect) %>%
  select(-dummy)

          STRING To_check id
1 BATANGAS LUZON BATANGAS  1
2     QINGDAO PT  QINGDAO  2
3           <NA>      ABC  3
4           <NA>      DEF  4

needless to say you can use fuzzy_inner_join for having matched results only


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...