python - ElasticSearch: EdgeNgrams and Numbers

Question


posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

Any ideas on how EdgeNgram treats numbers?

I'm running haystack with an ElasticSearch backend. I created an indexed field of type EdgeNgram. This field will contain a string that may contain words as well as numbers.

When I search this field with a partial word, it works as expected. But when I search with a partial number, I don't get the result I want.

Example:

Say the indexed field contains "EdgeNgram 12323". If I search for "edgen", the indexed entry is returned. If I search for that same entry by typing "123", I get nothing.

Thoughts?

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:42:51+0000

I found my way here trying to solve this same problem in Haystack + Elasticsearch. Following the hints from uboness and ComoWhat, I wrote an alternate Haystack engine that (I believe) makes EdgeNGram fields treat numeric strings like words. Others may benefit, so I thought I'd share it.

from haystack.backends.elasticsearch_backend import ElasticsearchSearchEngine, ElasticsearchSearchBackend

class CustomElasticsearchBackend(ElasticsearchSearchBackend):
    """
    The default ElasticsearchSearchBackend settings use the lowercase tokenizer for EdgeNgram fields. That tokenizer
    splits on every non-letter character, so strings of digits (emplids, in my case) never reach the index.
    Switching to the standard tokenizer and handling case-insensitivity with a lowercase filter seems to do the job.
    """
    def __init__(self, connection_alias, **connection_options):
        # see http://stackoverflow.com/questions/13636419/elasticsearch-edgengrams-and-numbers
        analyzer = self.DEFAULT_SETTINGS['settings']['analysis']['analyzer']['edgengram_analyzer']
        analyzer['tokenizer'] = 'standard'
        # DEFAULT_SETTINGS is shared at class level, so only add the filter once.
        if 'lowercase' not in analyzer['filter']:
            analyzer['filter'].append('lowercase')
        super(CustomElasticsearchBackend, self).__init__(connection_alias, **connection_options)

class CustomElasticsearchSearchEngine(ElasticsearchSearchEngine):
    backend = CustomElasticsearchBackend
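
To see why the tokenizer matters, here is a rough pure-Python sketch of the two analysis chains. It does not talk to Elasticsearch; the regular expressions only approximate the `lowercase` and `standard` tokenizers, and the function names and the `min_gram`/`max_gram` values of 2 and 15 are assumptions made for illustration.

```python
import re


def lowercase_tokenizer(text):
    # Approximation of the `lowercase` tokenizer: split on anything that is
    # not a letter, then lowercase. Digits act as separators and are dropped.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def standard_tokenizer_lowercased(text):
    # Rough stand-in for the `standard` tokenizer followed by a `lowercase`
    # filter: runs of letters and digits are kept as tokens.
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]


def edge_ngrams(token, min_gram=2, max_gram=15):
    # Prefixes of the token, from min_gram up to max_gram characters.
    # The 2/15 bounds are assumed here, not read from a live index.
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text, tokenizer):
    # All terms that would end up searchable for this text.
    terms = set()
    for token in tokenizer(text):
        terms.update(edge_ngrams(token))
    return terms


if __name__ == "__main__":
    field = "EdgeNgram 12323"
    old_terms = index_terms(field, lowercase_tokenizer)
    new_terms = index_terms(field, standard_tokenizer_lowercased)
    print("123" in old_terms, "123" in new_terms)
    print("edgen" in old_terms, "edgen" in new_terms)
```

With the letter-only tokenizer the digits never become a token, so no prefix of "12323" is indexed. To use the custom engine, the `ENGINE` entry in `HAYSTACK_CONNECTIONS` has to point at `CustomElasticsearchSearchEngine` (the module path depends on your project), and the index has to be rebuilt, for example with `rebuild_index`, because analyzer settings are applied when the index is created.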
