According to the source code for sklearn.feature_extraction.text, the full list of ENGLISH_STOP_WORDS (actually a frozenset, from stop_words) is exposed through __all__. Therefore, if you want to use that list plus some more items, you could do something like:
from sklearn.feature_extraction import text
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)
(where my_additional_stop_words is any iterable of strings, such as a list or a set) and use the result as the stop_words argument. This input to CountVectorizer.__init__ is parsed by _check_stop_list, which will pass the new frozenset straight through.
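For completeness, here is a minimal end-to-end sketch of the approach described above. The extra stop words ("foo", "bar") and the two-document corpus are made up for illustration. The combined set is converted to a list before being passed in; that conversion is my own precaution, on the assumption that newer scikit-learn releases validate the type of stop_words more strictly than the version this answer was written against.

```python
from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical extra stop words, chosen only for illustration.
my_additional_stop_words = ["foo", "bar"]

# frozenset.union accepts any iterable and returns a new frozenset,
# leaving ENGLISH_STOP_WORDS itself unchanged.
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)

# Toy corpus, also made up for this example.
docs = ["the foo jumped over the bar", "a quick brown fox"]

# Assumption: passing a list is accepted across scikit-learn versions,
# whereas a frozenset may be rejected by stricter parameter validation.
vectorizer = CountVectorizer(stop_words=list(stop_words))
vectorizer.fit(docs)

# Terms that survived stop-word filtering.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
```

Both the built-in English stop words (such as "the") and the custom ones ("foo", "bar") should be absent from the resulting vocabulary.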