Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

nlp - R tm package stemCompletion returns stems and NA instead of completed stems

I'm having an issue with stemCompletion. Here is a reproducible example.

library("tm")
library("SnowballC")
text <- "communicate. communicate Communicates communicating Communication 1"
corpus <- Corpus(VectorSource(text))
toSpace <- content_transformer(function (x, pattern) gsub(pattern, " ", x))
corpus <- tm_map(corpus, toSpace, "/")
corpus <- tm_map(corpus, toSpace, "@")
corpus <- tm_map(corpus, toSpace, "\\|") # the backslash must be doubled inside an R string
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, content_transformer(removeNumbers))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, content_transformer(removePunctuation))
corpus <- tm_map(corpus, content_transformer(stripWhitespace))
dictionary <- corpus # save this to use as a dictionary for stemCompletion
stemmed_corpus <- tm_map(corpus, content_transformer(stemDocument), language="english")
stemmed_corpus[[1]][1] # confirm words are stemmed properly
dictionary[[1]][1] # confirm the dictionary has complete words

# this is the part that's not working as expected:
completed_corpus <- tm_map(stemmed_corpus, content_transformer(stemCompletion), dictionary=dictionary, type=c("prevalent"))
inspect(completed_corpus)

inspect(completed_corpus) returns the stem ("communic") five times and one NA value.

What I'm aiming to get is the completed stems ("communicate" five times).

Thanks in advance for any suggestions.

Question from: https://stackoverflow.com/questions/65924820/r-tm-package-stemcompletion-returns-stems-and-na-instead-of-completed-stems

1 Reply

Waiting for answers
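One possible direction, offered as a sketch rather than a confirmed fix: stemCompletion() expects a character vector with one stem per element, while content_transformer() hands it the document text as a single string, so the whole string is treated as one "stem" and nothing in the dictionary matches it. Splitting the text into words first, completing each word, and pasting the result back together may give the behaviour asked for. The helper name complete_stems and the literal dict_words vector below are made up for illustration and are not part of tm.

```r
library(tm)

# Hypothetical helper (not part of tm): complete the stems of each document string.
# Works on a character vector of documents, one string per document.
complete_stems <- function(x, dictionary, type = "prevalent") {
  vapply(as.character(x), function(doc) {
    # stemCompletion() wants one stem per element, so split on whitespace first
    stems <- unlist(strsplit(doc, "[[:space:]]+"))
    stems <- stems[nzchar(stems)]
    if (!length(stems)) return("")
    completed <- stemCompletion(stems, dictionary = dictionary, type = type)
    # keep the original stem wherever no completion was found
    missing <- is.na(completed) | !nzchar(completed)
    completed[missing] <- stems[missing]
    paste(unname(completed), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# Assumed dictionary: the cleaned, unstemmed words from the example text
dict_words <- c("communicate", "communicate", "communicates",
                "communicating", "communication")

result <- complete_stems("communic communic communic communic communic",
                         dictionary = dict_words)
print(result)
```

If this works for you, it could presumably be plugged into the pipeline from the question with something like tm_map(stemmed_corpus, content_transformer(complete_stems), dictionary = dict_words), where dict_words is built from the unstemmed corpus text. Passing a plain character vector as the dictionary, instead of the corpus itself, keeps the word frequencies that type = "prevalent" relies on.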
