Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Elasticsearch Hybrid Query - Always returning a score of 0

I’m currently trying to run a hybrid search over two indexed fields: a full-text field and a knn_vector (word embeddings) field. Over 10,000 Wikipedia documents are indexed on an ES stack, each with both fields (see the mapping below: “content”, “embeddings”). The queries are known n-grams (n = 1, 2, 3) taken from the indexed Wikipedia pages, so they should yield results.

It is also important to note that the knn_vector field is defined inside a nested object.

This is the current mapping of the items indexed:

# VECTOR_DIM (the embedding dimensionality) is assumed to be defined elsewhere.
mapping = {
    "settings": {
        "index": {
            "knn": True,
            "knn.space_type": "cosinesimil"
        }
    },
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "elasticId": {"type": "text"},
            "owners": {"type": "text"},
            "type": {"type": "keyword"},
            "accessLink": {"type": "keyword"},
            "content": {"type": "text"},
            "embeddings": {
                "type": "nested",
                "properties": {
                    "vector": {
                        "type": "knn_vector",
                        "dimension": VECTOR_DIM
                    }
                }
            }
        }
    }
}

My goal is to compare the query scores from both fields to understand whether one is more effective than the other (full text vs. knn_vectors), and how Elasticsearch decides which documents to return based on the score from each field.

I understand I could simply split the queries (two separate queries), but ideally, we might want to use a hybrid search of this type in production.
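The two-query variant mentioned above could look roughly like this. It is a minimal sketch, assuming the field names from the mapping (`content`, `embeddings.vector`) and the `knn_score` script of the Open Distro / OpenSearch k-NN plugin with `"lang": "knn"`. The parameter name for the query vector (`query_value` here) differs between plugin versions (older releases used `vector`), and whether a vector inside a `nested` field can be scored this way also depends on the plugin version, so both are assumptions to verify. The helper names are hypothetical.

```python
def build_fulltext_query(query_text, size=10):
    """Request body that scores documents by full-text relevance on `content` only."""
    return {
        "size": size,
        "query": {"match": {"content": {"query": query_text}}},
    }


def build_vector_query(query_vector, size=10):
    """Request body that scores documents by vector similarity only.

    Assumption: the k-NN plugin's `knn_score` script is available and can be
    wrapped in a `nested` query to reach `embeddings.vector`.
    """
    return {
        "size": size,
        "query": {
            "nested": {
                "path": "embeddings",
                "score_mode": "max",  # keep the best-scoring nested object per document
                "query": {
                    "script_score": {
                        "query": {"match_all": {}},
                        "script": {
                            "source": "knn_score",
                            "lang": "knn",
                            "params": {
                                "field": "embeddings.vector",
                                # assumed parameter name; older versions used "vector"
                                "query_value": list(query_vector),
                                # same space type as the index setting
                                "space_type": "cosinesimil",
                            },
                        },
                    }
                },
            }
        },
    }
```

Each body could then be sent separately, for example `elastic.search(body=build_fulltext_query(query), index='files_en')`, and the two ranked lists compared side by side. Note that the raw scores of the two queries are on different scales.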

This is the current query that searches on both full text and the knn_vectors:

def MakeHybridSearch(query):
    query_vector = convert_to_embeddings(query)
    result = elastic.search({
        "explain": True,
        "profile": True,
        "size": 2,
        "query": {
            "function_score": {
                "functions": [
                    {
                        "filter": {
                            "match": {
                                "text": {
                                    "query": query,
                                    "boost": "5"
                                }
                            }
                        },
                        "weight": 2
                    },
                    {
                        "filter": {
                            "script": {
                                "source": "knn_score",
                                "params": {
                                    "field": "doc_vector",
                                    "vector": query_vector,
                                    "space_type": "l2"
                                }
                            }
                        },
                        "weight": 4
                    }
                ],
                "max_boost": 5,
                "score_mode": "replace",
                "boost_mode": "multiply",
                "min_score": 5
            }
        }
    }, index='files_en', size=1000)

 

The current problem is that no query returns any hits. Result:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

Even when a query does return hits, every hit has a score of 0.
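For reference, the way `function_score` combines its parts can be modelled in a few lines. This is a simplified sketch of the documented behaviour, not Elasticsearch's implementation: each function whose filter matches contributes its `weight`, the contributions are combined by `score_mode`, the result is capped at `max_boost`, combined with the query score by `boost_mode`, and documents scoring below `min_score` are dropped. A filter only decides whether a function applies; it does not contribute a relevance score of its own. The function name and the sample numbers are hypothetical, the `avg` modes are left out for brevity, and treating "no function matched" as a neutral factor of 1 is an assumption of this model.

```python
import math


def model_function_score(query_score, matched_weights, score_mode="multiply",
                         boost_mode="multiply", max_boost=math.inf, min_score=None):
    """Return the modelled final score, or None if min_score filters the hit out."""
    # 1. Combine the weights of the functions whose filter matched the document.
    if not matched_weights:
        func = 1.0  # assumption: no matching function leaves the score unchanged
    elif score_mode == "multiply":
        func = math.prod(matched_weights)
    elif score_mode == "sum":
        func = sum(matched_weights)
    elif score_mode == "max":
        func = max(matched_weights)
    elif score_mode == "min":
        func = min(matched_weights)
    elif score_mode == "first":
        func = matched_weights[0]
    else:
        raise ValueError(f"unsupported score_mode in this model: {score_mode!r}")

    # 2. Cap the combined function score.
    func = min(func, max_boost)

    # 3. Combine with the score of the wrapped query.
    if boost_mode == "multiply":
        final = query_score * func
    elif boost_mode == "sum":
        final = query_score + func
    elif boost_mode == "replace":
        final = func
    elif boost_mode == "max":
        final = max(query_score, func)
    elif boost_mode == "min":
        final = min(query_score, func)
    else:
        raise ValueError(f"unsupported boost_mode in this model: {boost_mode!r}")

    # 4. Drop hits below the threshold.
    if min_score is not None and final < min_score:
        return None
    return float(final)
```

With a query score of 1.0 (what a `match_all` would give), weights 2 and 4, `max_boost` 5 and `min_score` 5, the model keeps a document only when both filters match: `model_function_score(1.0, [2, 4], max_boost=5, min_score=5)` gives 5.0, while `model_function_score(1.0, [2], max_boost=5, min_score=5)` gives `None`. `score_mode` accepts `multiply`, `sum`, `avg`, `first`, `max` and `min`; `replace` is a `boost_mode` value.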

Is there an error in the query structure? Could the problem be on the mapping side? If not, is there a better way of doing this?

Thank you for your help!



1 Reply

Waiting for an expert to answer.
