When querying a search index with the Python version of the GAE Search API, what is the best practice for returning documents whose title matches the search words first, followed by documents whose body matches them?
For example, given:
body = """This is the body of the document,
with a set of words"""
my_document = search.Document(
    fields=[
        search.TextField(name='title', value='A Set Of Words'),
        search.TextField(name='body', value=body),
    ])
If it is possible, how might one perform a search on an index of Documents of the above form with results returned in this priority, where the phrase being searched for is in the variable qs:
- Documents whose title matches the qs words; then
- Documents whose body matches the qs words.
It seems like the correct solution is to use a MatchScorer, but I may be off the mark on this as I have not used this search functionality before. It is not clear from the documentation how to use the MatchScorer, but I presume one subclasses it and overrides some method. As this is not documented, and I have not delved into the code, I cannot say for sure.
Is there something here that I am missing, or is this the correct strategy? Did I miss where this sort of thing is documented?
Just for clarity, here is a more elaborate example of the desired outcome:
documents = [
    dict(title="Alpha", body="A"),  # "Alpha"
    dict(title="Beta", body="B Two"),  # "Beta"
    dict(title="Alpha Two", body="A"),  # "Alpha2"
]
for doc in documents:
    # Build a search.Document from each dict and add it to the index.
    document = search.Document(
        fields=[
            search.TextField(name="title", value=doc["title"]),
            search.TextField(name="body", value=doc["body"]),
        ]
    )
    index.put(document)  # for some search.Index
# Then when we search, we search the Title and Body.
index.search("Alpha")
# returns [Alpha, Alpha2]
# Results where the search is found in the Title are given higher weight.
index.search("Two")
# returns [Alpha2, Beta] -- note Alpha2 has 'Two' in the title.
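
To pin down the ordering I am after, here is a minimal sketch in plain Python that does not touch the Search API at all. It assumes the ranking could be obtained by running one title-restricted query and one body-restricted query and merging the two result lists. The names field_hits, merge_title_first and title_first_search are hypothetical helpers, and field_hits is only an in-memory stand-in for a real index query, using naive whole-word matching.

```python
# In-memory stand-in for the indexed documents: (doc_id, title, body).
DOCUMENTS = [
    ("Alpha", "Alpha", "A"),
    ("Beta", "Beta", "B Two"),
    ("Alpha2", "Alpha Two", "A"),
]

# Position of each field inside a DOCUMENTS tuple.
FIELD_INDEX = {"title": 1, "body": 2}


def field_hits(field, qs):
    """Hypothetical stand-in for a field-restricted index query.

    Returns the ids of documents whose given field contains every
    word of qs (case-insensitive, whole-word matching only).
    """
    words = qs.lower().split()
    position = FIELD_INDEX[field]
    hits = []
    for record in DOCUMENTS:
        field_words = record[position].lower().split()
        if all(word in field_words for word in words):
            hits.append(record[0])
    return hits


def merge_title_first(title_hits, body_hits):
    """Return title matches first, then body matches, without duplicates."""
    seen = set()
    merged = []
    for doc_id in list(title_hits) + list(body_hits):
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


def title_first_search(qs):
    """Desired behaviour: title matches outrank body-only matches."""
    return merge_title_first(field_hits("title", qs), field_hits("body", qs))


print(title_first_search("Alpha"))  # ['Alpha', 'Alpha2']
print(title_first_search("Two"))    # ['Alpha2', 'Beta']
```

Whether the Search API can produce this ordering in a single query, rather than by merging two, is exactly what I am asking.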