Since version 0.15, the IDF (inverse document frequency) weight of each feature can be retrieved via the idf_ attribute of the TfidfVectorizer object:
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["This is very strange",
          "This is very nice"]
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)
idf = vectorizer.idf_  # one IDF weight per vocabulary term
# On scikit-learn 1.0 and later, use get_feature_names_out() instead.
print(dict(zip(vectorizer.get_feature_names(), idf)))
Output:
{u'is': 1.0,
u'nice': 1.4054651081081644,
u'strange': 1.4054651081081644,
u'this': 1.0,
u'very': 1.0}
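To see where these numbers come from, the weights can be recomputed by hand. The sketch below assumes the default smooth_idf=True setting, for which the scikit-learn documentation gives idf(t) = ln((1 + n) / (1 + df(t))) + 1, and it only approximates the default tokenizer with a regular expression. It does not use scikit-learn itself.

```python
import math
import re

corpus = ["This is very strange",
          "This is very nice"]

# Assumption: mimic the default analyzer (lowercase, tokens of 2+ word characters).
docs = [set(re.findall(r"\b\w\w+\b", doc.lower())) for doc in corpus]
n_docs = len(docs)

# Document frequency: in how many documents each term appears.
df = {}
for terms in docs:
    for term in terms:
        df[term] = df.get(term, 0) + 1

# Smoothed IDF: ln((1 + n) / (1 + df)) + 1
idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
       for term, count in sorted(df.items())}

for term, weight in idf.items():
    print(term, weight)
```

Terms that occur in both documents get ln(3/3) + 1 = 1.0, while "nice" and "strange" occur in one document each and get ln(3/2) + 1, which should agree with the output above.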
As discussed in the comments, prior to version 0.15, a workaround is to access the attribute idf_ via the supposedly hidden _tfidf (an instance of TfidfTransformer) of the vectorizer:
idf = vectorizer._tfidf.idf_
print(dict(zip(vectorizer.get_feature_names(), idf)))
which should give the same output as above.
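If the same code has to run against both old and new versions, the two lookups can be folded into one small helper. This is only a sketch: idf_by_term is a hypothetical name, not part of scikit-learn, and it assumes the vectorizer has already been fitted and exposes vocabulary_, the mapping from each term to its column index.

```python
def idf_by_term(vectorizer):
    """Return {term: idf weight} for a fitted TfidfVectorizer (hypothetical helper)."""
    # Prefer the public attribute (0.15+); otherwise fall back to the
    # private TfidfTransformer held in _tfidf.
    idf = getattr(vectorizer, "idf_", None)
    if idf is None:
        idf = vectorizer._tfidf.idf_
    # vocabulary_ maps each term to its column index in the IDF vector.
    return {term: float(idf[col]) for term, col in vectorizer.vocabulary_.items()}
```

With the vectorizer fitted as above, idf_by_term(vectorizer) would return a plain dict of Python floats keyed by term, without relying on get_feature_names().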