Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

r - Making gsub only replace entire words?

(I'm using R.) I have a list of words called "goodwords.corpus". I am looping through the documents in a corpus and replacing each word that appears on the list with the word plus a number.

So for example if the word "good" is on the list, and "goodnight" is NOT on the list, then this document:

I am having a good time goodnight

would turn into:

I am having a good 1234 time goodnight

I'm using this code (edit: made this reproducible):

goodwords.corpus <- c("good")
test <- "I am having a good time goodnight"
for (i in seq_along(goodwords.corpus)) {
  test <- gsub(goodwords.corpus[[i]], paste(goodwords.corpus[[i]], "1234"), test)
}

However, the problem is I want gsub to only replace ENTIRE words. The issue that arises is that: "good" is on the "goodwords.corpus" list, but then "goodnight", which is NOT on the list, is also affected. So I get this:

I am having a good 1234 time good 1234night

Is there any way I can tell gsub to replace only ENTIRE words, and not words that are part of other words?

I want to use this:

test <- gsub("\\<goodwords.corpus[[i]]\\>", paste(goodwords.corpus[[i]], "1234"), test)

I've read that \\< and \\> tell gsub to match only whole words. But that doesn't work here, because goodwords.corpus[[i]] is treated as literal pattern text, not as a variable, when it's inside the quotes.

Any suggestions?

1 Reply

Use \b to indicate a word boundary. In an R string literal the backslash must itself be escaped, so it is written "\\b":

> text <- "good night goodnight"
> gsub("\\bgood\\b", paste("good", 1234), text)
[1] "good 1234 night goodnight"

In your loop, something like this:

for (word in goodwords.corpus) {
  # Wrap the word in \b anchors so only whole-word matches are replaced
  patt <- paste0("\\b", word, "\\b")
  repl <- paste(word, "1234")

  test <- gsub(patt, repl, test)
}
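
If the list is long, the loop could also be collapsed into a single gsub call. The following is only a sketch, not part of the original answer: it joins the words into one alternation between \\b anchors and uses the backreference \\1 to put the matched word back. The second word "time" and the variable name result are made up for illustration, and it assumes the words contain only letters (words with regex metacharacters such as "." or "+" would need escaping first).

```r
# Hypothetical word list, extended from the question's single-word example
goodwords.corpus <- c("good", "time")
test <- "I am having a good time goodnight"

# One pattern for all words: \b(good|time)\b
patt <- paste0("\\b(", paste(goodwords.corpus, collapse = "|"), ")\\b")

# \\1 is the captured word, so each whole-word match becomes "<word> 1234"
result <- gsub(patt, "\\1 1234", test)
print(result)
```

Here "goodnight" is left alone because there is no word boundary between "good" and "night".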
