I have an Elasticsearch (7.10) cluster whose primary job is powering search on text documents. The index I'm working with does not need to be updated often, and indexing speed is not a priority; what really matters is search-time performance. The number of documents will likely always be in the range of 50-70 million, and the store size is ~300GB once it's all built.
The mapping for the index and field I'm concerned with looks something like this:
"mappings": {
  "properties": {
    "document_text": {
      "type": "text"
    }
  }
}
The document_text field is a string of text anywhere in the region of 50-500 words. The typical queries sent to this index are match queries chained together inside a boolean should query. Usually, the number of clauses is in the range of 5-15.
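For illustration, here is a minimal sketch of the kind of request body described above, built as a plain Python dict. The helper name build_should_query and the search phrases are hypothetical; only the document_text field name and the match-inside-bool/should shape come from the description.

```python
import json


def build_should_query(phrases, field="document_text"):
    """Build a bool query whose should clauses are match queries on one field."""
    return {
        "query": {
            "bool": {
                # One match clause per phrase, all chained under "should".
                "should": [{"match": {field: phrase}} for phrase in phrases]
            }
        }
    }


# Hypothetical phrases; the real requests carry roughly 5-15 clauses.
phrases = [
    "solar panel efficiency",
    "battery storage",
    "grid stability",
    "inverter failure",
    "net metering",
]
body = build_should_query(phrases)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a search request against the index.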
The issue I've been running into is that the initial latency for search queries against the index is very high, usually in the range of 4-6s, but after the first search the document is cached, so the latency becomes much lower (<1s). The cluster has 3 data nodes, 3 master nodes and 2 ingest/client nodes, and is backed by fast SSDs. I noticed that neither the heap nor the RAM on the data nodes is ever under much pressure, which led me to realize that the documents weren't being cached in advance the way I wanted them to be.
From what I've researched, I've landed on two options. The first is enabling fielddata=true to get the field data object in memory at index time rather than constructing it at search time. I understand this will increase pressure on the JVM heap, so I may do some frequency filtering to only place certain documents in memory. The other option I've come across is setting eager_global_ordinals=true, which in some ways seems similar to enabling fielddata, as it also builds the mappings in memory at index time. I'm a bit new to ES, and the terminology between the two is somewhat confusing to me. What I'd love to know is: what is the difference between the two, and does enabling one or both of them seem reasonable for solving the latency issues I'm having, or have I completely misunderstood the docs? Thanks!
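To make the two options concrete, here is a minimal sketch of where each setting would sit in the mapping for document_text, again as plain Python dicts. The helper text_field_mapping is hypothetical, and the frequency filter numbers are placeholders rather than tuned values. Note that fielddata_frequency_filter is expressed in terms of term frequency. The sketch only shows the shape of each mapping; it makes no claim about which one, if either, addresses the latency.

```python
import json


def text_field_mapping(**options):
    """Return a mapping body for document_text with extra field parameters."""
    field = {"type": "text"}
    field.update(options)
    return {"properties": {"document_text": field}}


# Option 1: fielddata on the text field, with a frequency filter.
# The min/max/min_segment_size values are placeholders, not recommendations.
fielddata_mapping = text_field_mapping(
    fielddata=True,
    fielddata_frequency_filter={"min": 0.001, "max": 0.1, "min_segment_size": 500},
)

# Option 2: the eager_global_ordinals mapping parameter on the same field.
eager_mapping = text_field_mapping(eager_global_ordinals=True)

print(json.dumps(fielddata_mapping, indent=2))
print(json.dumps(eager_mapping, indent=2))
```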
question from:
https://stackoverflow.com/questions/65875695/whats-the-difference-bettween-fielddata-enabled-vs-eager-global-ordinals-for-op