I have been using findAssocs() from the text mining package (tm), but I realized that something doesn't seem right with my dataset.
My dataset is 1500 open-ended answers saved in one column of a csv file.
So I loaded the dataset like this and used the typical tm_map steps to turn it into a corpus.
library(tm)
Q29 <- read.csv("favoritegame2.csv")
corpus <- Corpus(VectorSource(Q29$Q29))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
dtm <- DocumentTermMatrix(corpus)
findAssocs(dtm, "like", .2)
> cousin fill ....
0.28 0.20
Q1. When I find terms associated with like, I don't see like = 1 as part of the output. However,
dtm.df <-as.data.frame(inspect(dtm))
this data frame consists of 1500 obs. of 1689 variables. (Or is it because the data is saved in a row of the csv file?)
Q2. Even though cousin and fill each showed up once when the target term like showed up once, their scores differ (0.28 vs. 0.20). Shouldn't they be the same?
I'm trying to find the math behind findAssocs() but have had no success yet. Any advice is highly appreciated!
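To make both questions concrete, here is a minimal base R sketch. It assumes that findAssocs() reports the Pearson correlation between the target term's column and every other term column of the document-term matrix, rounded to two decimals, with the target term itself left out of the result. The toy matrix is made up for illustration and is not taken from the survey data.

```r
# Toy document-term counts: rows are documents, columns are terms.
# These numbers are invented for illustration only.
m <- matrix(
  c(1, 1, 1,
    1, 0, 0,
    1, 0, 1,
    0, 0, 1,
    0, 0, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5), c("like", "cousin", "fill"))
)

# Pearson correlation of each other term column with the "like" column.
# Assumption: this mirrors what findAssocs(dtm, "like", ...) computes.
assoc <- cor(m[, c("cousin", "fill")], m[, "like"])[, 1]
print(round(assoc, 2))

# The target term correlates perfectly with itself, which is why
# reporting it would carry no information.
self <- cor(m[, "like"], m[, "like"])
```

In this toy matrix, cousin and fill both appear with like in doc1, yet their correlations differ, because the correlation uses each term's whole column (including the documents where like is absent), not just the rows where the terms co-occur.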