You have to do a little bit of a song and dance to get the matrices as numpy arrays instead, but this should do what you're looking for:
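```python
import numpy as np

# get_feature_names_out() exists in scikit-learn >= 1.0; older releases only have get_feature_names()
feature_array = np.array(tfidf.get_feature_names_out())
# Indices of the tf-idf scores, highest first (works for a single document only)
tfidf_sorting = np.argsort(response.toarray()).flatten()[::-1]
n = 3
top_n = feature_array[tfidf_sorting][:n]
```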
This gives me:

```
array([u'fruit', u'travellers', u'jupiter'], dtype='<U13')
```
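To see the indexing step in isolation, here is a minimal sketch that swaps the fitted vectorizer for a hand-made vocabulary and a single row of made-up tf-idf scores (both hypothetical, chosen only for illustration). `top_n_terms` is a hypothetical helper name, not part of scikit-learn or NumPy.

```python
import numpy as np


def top_n_terms(feature_array, scores, n):
    """Return the n terms with the highest scores for a single document.

    feature_array: 1-D array of term strings (like the vectorizer's feature names).
    scores: 1-D array of tf-idf scores aligned with feature_array.
    """
    # argsort sorts ascending, so reverse it to put the largest scores first
    order = np.argsort(scores)[::-1]
    # Index only the first n positions instead of reordering the whole vocabulary
    return feature_array[order[:n]]


# Hypothetical vocabulary and scores standing in for one row of response.toarray()
feature_array = np.array(["fruit", "jupiter", "shade", "travellers", "tree"])
scores = np.array([0.61, 0.35, 0.10, 0.48, 0.0])

print(top_n_terms(feature_array, scores, 3))  # ['fruit' 'travellers' 'jupiter']
```

Slicing the indices before indexing (`order[:n]`) gives the same result as `feature_array[order][:n]`, but only looks up the n terms you keep.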
The `argsort` call is the really useful one here (see the NumPy documentation for `np.argsort`). We need `[::-1]` because `argsort` only sorts from smallest to largest. We call `flatten` to reduce the result to 1-D so that the sorted indices can be used to index the 1-D feature array. Note that the call to `flatten` only works if you're testing one document at a time.
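If `response` holds several documents, one way around the `flatten` restriction is to sort each row separately with `axis=1`. The sketch below assumes a small dense score matrix with made-up values (one row per document) in place of `response.toarray()`; `top_n_per_document` is a hypothetical helper.

```python
import numpy as np


def top_n_per_document(feature_array, score_matrix, n):
    """Return an (n_documents, n) array with the top-n terms of each row."""
    # Sort every row independently; argsort is ascending, so flip the columns
    order = np.argsort(score_matrix, axis=1)[:, ::-1]
    # Keep the first n column indices per row and look the terms up
    return feature_array[order[:, :n]]


feature_array = np.array(["fruit", "jupiter", "shade", "travellers", "tree"])
# Hypothetical tf-idf scores: two documents (rows) by five terms (columns)
score_matrix = np.array([
    [0.61, 0.35, 0.10, 0.48, 0.00],
    [0.00, 0.20, 0.70, 0.05, 0.55],
])

print(top_n_per_document(feature_array, score_matrix, 2))
```

Keep in mind that `toarray()` densifies the whole sparse matrix, so for a large corpus it may be cheaper to convert and sort one row at a time.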
Also, on another note, did you mean something like `tfs = tfidf.fit_transform(t.split("\n"))`? Otherwise, each term in the multiline string is being treated as a separate "document". Splitting on newlines instead means that we are actually looking at 4 documents (one for each line), which makes more sense when you think about tf-idf.
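To make the difference concrete, here is a small sketch with a hypothetical three-line string standing in for `t`. It only counts how many "documents" each split produces; dropping blank lines is an extra step I've added as an assumption, not something the vectorizer requires.

```python
# Hypothetical stand-in for the multiline string t
t = "the fruit of the tree\ntravellers rest in the shade\njupiter sends rain"

# Splitting on spaces yields roughly one "document" per word
by_space = t.split(" ")

# Splitting on newlines yields one document per line (blank lines dropped)
by_line = [line for line in t.split("\n") if line.strip()]

print(len(by_space))  # 11
print(len(by_line))   # 3
```

The space split gives 11 pieces rather than 12 because words on either side of a newline (e.g. `"tree\ntravellers"`) stay glued together, which is another reason it makes a poor document boundary.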