strsplit - R: I have to do Softmatch in String

Question

Welcome To Ask or Share your Answers For Others

strsplit - R: I have to do Softmatch in String

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

strsplit - R: I have to do Softmatch in String

I have to do softmatch in one column of data frame with the given input string, like

col <- c("John Collingson","J Collingson","Dummy Name1","Dummy Name2")

inputText <- "J Collingson"
#Vice-Versa
inputText <- "John Collingson"

I want to retrieve both "John Collingson" & "J Collingson" from the provided colname "col"

Kindly help

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:25:02+0000

agrep is definitely a quick and easy base R solution if you have just a bit of data. If this is just a toy example of a larger data frame, you may be interested in a more durable tool. In the past month, learning about the Levenshtein distance noted by @PaulHiemstra (also in these different questions) led me to the RecordLinkage package. The vignettes leave me wanting more examples of the "soft" or fuzzy" matches, particularly across more than 1 field, but the basic answer to your question could be somthing like:

library(RecordLinkage)
col <- data.frame(names1 = c("John Collingson","J Collingson","Dummy Name1","Dummy Name2"))
inputText <- data.frame(names2 = c("J Collingson"))
g1 <- compare.linkage(inputText, col, strcmp = T)
g2 <- epiWeights(g1)
getPairs(g2, min.weight=0.6) 
# id          names2 Weight
# 1  1    J Collingson       
# 2  2    J Collingson  1.000
# 3                          
# 4  1    J Collingson       
# 5  1 John Collingson  0.815

inputText2 <- data.frame(names2 = c("Jon Collinson"))
g1 <- compare.linkage(inputText2, col, strcmp = T)
g2 <- epiWeights(g1)
getPairs(g2, min.weight=0.6)
# id          names2    Weight
# 1  1   Jon Collinson          
# 2  1 John Collingson 0.9644444
# 3                             
# 4  1   Jon Collinson          
# 5  2    J Collingson 0.7924825

Please start with compare.linkage() or compare.dedup()-- RLBigDataLinkage() or RLBigDataDedup() for large data sets. Hope this helps.

Categories

strsplit - R: I have to do Softmatch in String

strsplit - R: I have to do Softmatch in String

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags