Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


stop words - Stopwords and stemming in Lucene demo

I have two questions about the Lucene demo. Does the demo remove stop words out of the box, without any modification? Does it apply stemming? If so, which stemmer does it use?

Question from: https://stackoverflow.com/questions/65946551/stopwords-and-stemming-in-lucene-demo


1 Reply


Which demo are you referring to?

If it's this one, then the answers are:

(a) Stop words: no, it does not. The demo uses StandardAnalyzer(), which applies no stop words when created with no arguments (although it can, if you choose to provide a list).

(b) Stemming: no, it does not. There are no stemming classes in the demo code, and the StandardAnalyzer itself performs no stemming.

Take a look at the javadoc for the StandardAnalyzer. You will see the following:

Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
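The chain described in that javadoc sentence (tokenize, lowercase, then filter against a configurable stop word list) can be modelled roughly as follows. This is a toy sketch in plain Java, not Lucene code: the class name AnalyzerSketch and its analyze method are hypothetical, and the simple regex split only approximates what the real StandardTokenizer does. It is meant to show that an empty stop word list removes nothing and that no step in the chain changes word endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of a tokenize -> lowercase -> stop-filter chain.
// NOT Lucene code: the real StandardTokenizer follows Unicode text
// segmentation rules, which this simple split only approximates.
class AnalyzerSketch {

    // Hypothetical helper: returns the tokens that would reach the index.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Step 1: split on runs of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter produces one empty string
            }
            // Step 2: lowercase every token.
            String token = raw.toLowerCase(Locale.ROOT);
            // Step 3: drop tokens found in the stop word list.
            // With an empty list, every token is kept.
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Foxes are running";
        // Empty stop list, as with a no-argument analyzer: nothing is removed.
        System.out.println(analyze(text, Set.of()));
        // A caller-supplied stop list removes only the listed words.
        System.out.println(analyze(text, Set.of("the", "are")));
    }
}
```

In both calls "foxes" and "running" come out unchanged apart from case, because nothing in this chain reduces words to a stem. With the real library, the equivalent of the second call would be to pass a stop word set to the StandardAnalyzer constructor instead of creating it with no arguments.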

So, this tells you how your input documents are analyzed:



...