In Elasticsearch 2.3 (and in the latest releases) there is an index.max_result_window setting which restricts a search query to a from + size value of at most 10,000 entries, e.g.
from: 0 size: 10,000 is ok
from: 0 size: 10,001 is not ok
from: 9,000 size: 1,001 is not ok
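The three cases above all come down to one inequality. A minimal sketch, assuming the default limit of 10,000 (the function name `within_result_window` is made up for illustration and is not part of any Elasticsearch client):

```python
# Default value of index.max_result_window (assumed unchanged on the index).
MAX_RESULT_WINDOW = 10_000


def within_result_window(from_, size, limit=MAX_RESULT_WINDOW):
    """Return True when from + size does not exceed the result window."""
    return from_ + size <= limit


if __name__ == "__main__":
    for from_, size in [(0, 10_000), (0, 10_001), (9_000, 1_001)]:
        verdict = "ok" if within_result_window(from_, size) else "not ok"
        print(f"from: {from_} size: {size} is {verdict}")
```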
In the latest release, 7.10, the documentation says this can be worked around by using search_after. However, due to legacy data, I need something similar in ES 2.3. Are there any good options?
Why do I need this? Our data has a child/parent hierarchy. One query we run against this data determines all the unique parents over a certain date range. Currently we retrieve this information using an aggregation query, i.e.
{
  "query": { "match_all_in_date_range": {} },
  "aggs": {
    "parents": {
      "terms": {
        "field": "parentId"
      }
    }
  }
}
Interestingly, this returns all the parents even if there are more than 10,000, i.e. it does not appear to be affected by the index.max_result_window limit.
But this aggregation is expensive and time-consuming. As a result, I'm evaluating whether it's possible to remove it and "aggregate" the data in our own code, i.e. retrieve all the objects, read their parentId field, and record the unique ids.
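To make the client-side "aggregation" concrete, here is a rough sketch. It assumes each retrieved object looks like a standard search hit with the document under `_source`. The helper name `unique_parent_ids` and the sample hits are invented for illustration.

```python
def unique_parent_ids(hits):
    """Collect the distinct parentId values from an iterable of search hits."""
    seen = set()
    for hit in hits:
        parent_id = hit.get("_source", {}).get("parentId")
        # Skip documents that carry no parentId at all.
        if parent_id is not None:
            seen.add(parent_id)
    return seen


if __name__ == "__main__":
    # Made-up hits, only to show the expected input shape.
    sample_hits = [
        {"_id": "c1", "_source": {"parentId": "p1"}},
        {"_id": "c2", "_source": {"parentId": "p2"}},
        {"_id": "c3", "_source": {"parentId": "p1"}},
    ]
    print(sorted(unique_parent_ids(sample_hits)))
```

Memory use grows with the number of distinct parents rather than the number of children, since only the ids are kept.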
But it looks like the index.max_result_window limit may break that idea, unless I'm mistaken. Two ideas I had to work around this are:
- Rather than paging, modify the query to exclude the parentIds I've already retrieved (the downside being that it could take longer to run and the query will keep growing until the end)
- Move over to the more heavy-duty scroll API (which may be more suitable for other usages)
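For the scroll option, the loop might look roughly like the sketch below. It is not tied to a particular client: `start_search` and `continue_scroll` are hypothetical callables that send the request and return the decoded JSON response. Only the `_scroll_id` and `hits.hits` response fields and the `scroll` / `scroll_id` request fields come from the scroll API itself. The page size and keep-alive values are arbitrary.

```python
def scroll_unique_parent_ids(start_search, continue_scroll, query,
                             page_size=1000, keep_alive="1m"):
    """Walk all matching documents via scroll and collect distinct parentIds.

    start_search(body, keep_alive): hypothetical; issues the initial search
        with the scroll keep-alive and returns the response as a dict.
    continue_scroll(body): hypothetical; posts the body to the scroll
        endpoint and returns the next page as a dict.
    """
    body = {"query": query, "size": page_size, "_source": ["parentId"]}
    response = start_search(body, keep_alive)
    parent_ids = set()
    # An empty page of hits signals that the scroll is exhausted.
    while response["hits"]["hits"]:
        for hit in response["hits"]["hits"]:
            parent_id = hit.get("_source", {}).get("parentId")
            if parent_id is not None:
                parent_ids.add(parent_id)
        # Always pass along the most recent scroll id.
        response = continue_scroll(
            {"scroll": keep_alive, "scroll_id": response["_scroll_id"]}
        )
    return parent_ids
```

Because each page is read once and discarded, from + size never enters the picture, so index.max_result_window should not apply here.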
But I'd be curious to hear if there are other options available to me?
question from:
https://stackoverflow.com/questions/65908713/elasticsearch-2-3-search-for-more-than-10-000-paged-items