I would like to plot a 2d graph with the x-axis as term and y-axis as TFIDF score (or document id) for my list of sentences. I used scikit learn's fit_transform() to get the scipy matrix but i do not know how to use that matrix to plot the graph. I am trying to get a plot to see how well my sentences can be classified using kmeans.
Here is the output of fit_transform(sentence_list)
:
(document id, term number) tfidf score
(0, 1023) 0.209291711271
(0, 924) 0.174405532933
(0, 914) 0.174405532933
(0, 821) 0.15579574484
(0, 770) 0.174405532933
(0, 763) 0.159719994016
(0, 689) 0.135518787598
Here is my code:
sentence_list=["Hi how are you", "Good morning" ...]
vectorizer=TfidfVectorizer(min_df=1, stop_words='english', decode_error='ignore')
vectorized=vectorizer.fit_transform(sentence_list)
num_samples, num_features=vectorized.shape
print "num_samples: %d, num_features: %d" %(num_samples,num_features)
num_clusters=10
km=KMeans(n_clusters=num_clusters, init='k-means++',n_init=10, verbose=1)
km.fit(vectorized)
PRINT km.labels_ # Returns a list of clusters ranging 0 to 10
Thanks,
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…