Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
445 views
in Technique[技术] by (71.8m points)

r - Count common words in two strings

I have two strings:

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"

I am looking to get a count of common words between these strings.

The answer should be 3.

  • "Roy"

  • "travels"

  • "Africa"

being the common words

This is what I tried:

stra <- as.data.frame(t(read.table(textConnection(a), sep = " ")))
strb <- as.data.frame(t(read.table(textConnection(b), sep = " ")))

Taking unique to avoid repeat counting

stra_unique <-as.data.frame(unique(stra$V1))
strb_unique <- as.data.frame(unique(strb$V1))
colnames(stra_unique) <- c("V1")
colnames(strb_unique) <- c("V1")

common_words <-length(merge(stra_unique,strb_unique, by = "V1")$V1)

I need to this for a data set with over 2000 and 1200 strings. Total times I have to evaluate the string is 2000 X 1200. Any quick way, without using loops.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use strsplit and intersect from the base library:

> a <- "Roy lives in Japan and travels to Africa"
> b <- "Roy travels Africa with this wife"
> a_split <- unlist(strsplit(a, sep=" "))
> b_split <- unlist(strsplit(b, sep=" "))
> length(intersect(a_split, b_split))
[1] 3

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...