I have been working on this since last night and cannot figure out how to do it.
What I want to do is match the strings in df1 against the strings in df2 and pull out the ones that match.
This is what I do so far:
# split each delimited string into its pieces and tag every piece with its row number (ID)
normalize <- function(x, delim) {
  # drop parentheses so they do not end up inside the pieces
  x <- gsub(")", "", x, fixed=TRUE)
  x <- gsub("(", "", x, fixed=TRUE)
  # repeat each row number once per piece (number of delimiters + 1)
  idx <- rep(seq_len(length(x)), times=nchar(gsub(sprintf("[^%s]",delim), "", as.character(x)))+1)
  names <- unlist(strsplit(as.character(x), delim))
  return(setNames(idx, names))
}
# build the lookup from the first column of df2 (piece -> df2 row number)
lookup <- normalize(df2[,1], ",")
# match the pieces of s against the lookup and return the IDs as "df1 row-df2 row"
process <- function(s) {
  lookup_try <- lookup[names(s)]
  found <- which(!is.na(lookup_try))
  pos <- lookup_try[names(s)[found]]
  # change the next line to "return(as.character(pos))" to get only the result as in the comment
  return(paste(s[found], pos, sep="-"))
}
Then I get the results like this:
res <- lapply(colnames(df1), function(x) process(normalize(df1[,x], ";")))
names(res) <- colnames(df1)
For each column of df1, this gives me the row number of each string in df1 that matched, followed by the row number of the matching string in df2. The output looks like this:
> res
$s1
[1] "3-4" "4-1" "5-4"
$s2
[1] "2-4" "3-15" "7-16"
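To make the ID pairs easier to read, here is a minimal toy run of the same steps. The two data frames are made up purely for illustration (they are not the real df1 and df2), and the functions are repeated only so the snippet runs on its own.

```r
# Hypothetical toy data, NOT the real df1/df2
df2 <- data.frame(v = c("P1,P2", "P3", "P4,P5"), stringsAsFactors = FALSE)
df1 <- data.frame(s1 = c("P3;P9", "P1", "(P5)"),
                  s2 = c("P8", "P2;P4", "P7"),
                  stringsAsFactors = FALSE)

# Same helper as above: one named entry per piece, value = row number
normalize <- function(x, delim) {
  x <- gsub(")", "", x, fixed = TRUE)
  x <- gsub("(", "", x, fixed = TRUE)
  idx <- rep(seq_len(length(x)),
             times = nchar(gsub(sprintf("[^%s]", delim), "", as.character(x))) + 1)
  pieces <- unlist(strsplit(as.character(x), delim))
  setNames(idx, pieces)
}

# piece -> df2 row number: P1=1, P2=1, P3=2, P4=3, P5=3
lookup <- normalize(df2[, 1], ",")

# Same matcher as above; it reads the global `lookup`
process <- function(s) {
  lookup_try <- lookup[names(s)]
  found <- which(!is.na(lookup_try))
  pos <- lookup_try[names(s)[found]]
  paste(s[found], pos, sep = "-")
}

res <- lapply(colnames(df1), function(x) process(normalize(df1[, x], ";")))
names(res) <- colnames(df1)

res$s1  # "1-2" "2-1" "3-3"
res$s2  # "2-1" "2-3"
```

So "1-2" in s1 means that a string in row 1 of df1 (P3) was found in row 2 of df2.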
In the output I want:
The first column, IDs, is the row number in df2 that matched strings in df1.
The second column, No, is the number of times it matched.
The third column, ID-col-n, is the row number of the matching string in df1 plus its column name.
The fourth column is the string from the first column of df1 that matched.
The fifth column is the string from the second column of df1 that matched.
And so on for the remaining columns.
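A rough sketch of one possible starting point for that layout, not a finished solution: collect every match into a long table with one row per matched string, then count matches per df2 row. The toy data, the helper split_long, and the column names df1_row, df2_row, col and id_col are all hypothetical. Note that merge() keeps every matching df2 row when a string appears more than once in df2, whereas the lookup above keeps only the first.

```r
# Hypothetical toy data, NOT the real df1/df2
df2 <- data.frame(v = c("P1,P2", "P3", "P4,P5"), stringsAsFactors = FALSE)
df1 <- data.frame(s1 = c("P3;P9", "P1", "(P5)"),
                  s2 = c("P8", "P2;P4", "P7"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: one output row per individual string, with its source row
split_long <- function(x, delim) {
  x <- gsub("[()]", "", as.character(x))          # strip parentheses
  parts <- strsplit(x, delim, fixed = TRUE)
  data.frame(row = rep(seq_along(parts), lengths(parts)),
             string = unlist(parts),
             stringsAsFactors = FALSE)
}

# Reference table built from df2: string -> df2 row
ref <- split_long(df2[, 1], ",")
names(ref)[1] <- "df2_row"

# For every df1 column, keep only the strings that also occur in df2
matches <- do.call(rbind, lapply(colnames(df1), function(col) {
  long <- split_long(df1[, col], ";")
  names(long)[1] <- "df1_row"
  long$col <- rep(col, nrow(long))
  merge(long, ref, by = "string")
}))

# "ID-col-n" style label: df1 row number plus column name
matches$id_col <- paste(matches$df1_row, matches$col, sep = "-")

# Number of matches per df2 row (the "No" column)
counts <- table(matches$df2_row)
```

From here, matches could be reshaped to wide form (one row per df2_row, one column per df1 column) to approach the table described above; that step is left out because the exact layout depends on how multiple matches in one column should be shown.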