I've used various versions of TF-IDF in scikit-learn to model some text data.
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(min_df=1, stop_words='english')
The resulting data X is in this format:
<rowsxcolumns sparse matrix of type '<type 'numpy.float64'>'
with xyz stored elements in Compressed Sparse Row format>
I wanted to experiment with LDA as a way to reduce the dimensionality of my sparse matrix.
Is there a simple way to feed the SciPy sparse matrix X into a gensim LDA model?
from gensim import models
lda = models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=100)
I can ignore scikit and go the way the gensim tutorial outlines, but I like the simplicity of the scikit vectorizers and all of its parameters.
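To make concrete what the conversion involves: gensim expects the corpus as an iterable of documents, each a list of (term_id, weight) pairs, plus an id-to-term mapping. The sketch below is only an illustration of that reshaping, written in plain Python against the three arrays every CSR matrix exposes (data, indices, indptr). The helper names csr_rows_to_bow and invert_vocabulary are hypothetical, and the toy arrays stand in for a real X. gensim also ships gensim.matutils.Sparse2Corpus, which should do the same job when called with documents_columns=False for a documents-by-terms matrix like X.

```python
def csr_rows_to_bow(data, indices, indptr):
    """Yield one gensim-style bag-of-words document per CSR row.

    Each document is a list of (term_id, weight) pairs, which is the
    corpus format gensim models consume.
    """
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        yield [(int(indices[k]), float(data[k])) for k in range(start, end)]


def invert_vocabulary(vocabulary):
    """Turn a term -> column index mapping into column index -> term."""
    return {index: term for term, index in vocabulary.items()}


# Toy stand-ins (assumed values) for X.data, X.indices, X.indptr and
# vectorizer.vocabulary_: three documents over a three-term vocabulary.
data = [0.5, 0.5, 1.0, 0.7, 0.3]
indices = [0, 2, 1, 0, 1]
indptr = [0, 2, 3, 5]
vocabulary = {"apple": 0, "banana": 1, "cherry": 2}

corpus = list(csr_rows_to_bow(data, indices, indptr))
id2word = invert_vocabulary(vocabulary)

# With the real objects this would presumably become:
#   corpus = list(csr_rows_to_bow(X.data, X.indices, X.indptr))
#   id2word = invert_vocabulary(vectorizer.vocabulary_)
#   lda = models.ldamodel.LdaModel(corpus=corpus, id2word=id2word, num_topics=100)
```

A plain dict is used for id2word here on the assumption that gensim accepts any id-to-term mapping in that argument, so the scikit-learn vectorizer and all of its parameters can stay in place.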