Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Elasticsearch aggregation by max date giving wrong results

I want to group by sysCode, keep only the latest documents (those with the max date) for each sysCode, and then aggregate on the employeeId and type fields. The query below does not return the desired results: for GER it returns employeeId=1 and for IND it returns employeeId=3, which I do not want.

Sample JSON documents

{
  "sysCode": "GER",
  "employeeId": 1,
  "date": "2014-06-14",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "LATE"
      }
    ]
  }
}
{
  "sysCode": "GER",
  "employeeId": 2,
  "date": "2014-06-15",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "LATE"
      }
    ]
  }
}
{
  "sysCode": "IND",
  "employeeId": 3,
  "date": "2014-06-16",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "LATE"
      }
    ]
  }
}
{
  "sysCode": "IND",
  "employeeId": 3,
  "date": "2014-06-16",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "MISSED"
      }
    ]
  }
}
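To make the requirement concrete, here is a minimal plain-Python sketch of one reading of it, run over the four sample documents above with no Elasticsearch involved. The function name `latest_counts` and the choice to count (employeeId, type) pairs are assumptions made for illustration, not part of the original question.

```python
from collections import Counter, defaultdict

# The four sample documents from the question, copied as Python dicts.
docs = [
    {"sysCode": "GER", "employeeId": 1, "date": "2014-06-14",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "LATE"}]}},
    {"sysCode": "GER", "employeeId": 2, "date": "2014-06-15",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "LATE"}]}},
    {"sysCode": "IND", "employeeId": 3, "date": "2014-06-16",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "LATE"}]}},
    {"sysCode": "IND", "employeeId": 3, "date": "2014-06-16",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "MISSED"}]}},
]


def latest_counts(documents):
    """Hypothetical helper: per sysCode, keep only documents at the max
    date, then count (employeeId, type) pairs among them."""
    # ISO-8601 dates (YYYY-MM-DD) compare correctly as plain strings.
    max_date = {}
    for doc in documents:
        code = doc["sysCode"]
        if code not in max_date or doc["date"] > max_date[code]:
            max_date[code] = doc["date"]

    counts = defaultdict(Counter)
    for doc in documents:
        if doc["date"] != max_date[doc["sysCode"]]:
            continue  # older than the latest date for this sysCode
        for entry in doc["categories"]["pb"]:
            counts[doc["sysCode"]][(doc["employeeId"], entry["type"])] += 1
    return {code: dict(counter) for code, counter in counts.items()}


result = latest_counts(docs)
for code in sorted(result):
    print(code, result[code])
```

Note that every document sharing the latest date is kept, so a sysCode can contribute more than one document.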

Aggregation query

{
  "aggs": {
    "result_by_sys_code": {
      "terms": {
        "field": "sysCode"
      },
      "aggs": {
        "max_as_of_date": {
          "max": {
            "field": "date"
          }
        },
        "employees": {
          "terms": {
            "field": "employeeId"
          },
          "aggs": {
            "nested": {
              "nested": {
                "path": "categories.pb"
              },
              "aggs": {
                "metrics": {
                  "terms": {
                    "field": "categories.pb.type.keyword"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Mappings

{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      },
      "categories": {
        "properties": {
          "pb": {
            "type": "nested",
            "properties": {
              "metric": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "type": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      },
      "controlCode": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Question from: https://stackoverflow.com/questions/66059413/elasticsearch-aggregation-by-max-date-giving-wrong-results


1 Reply


I think you can meet your requirement with the top_hits aggregation. From the official documentation:

This aggregator is intended to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "type",
        "size": 3
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "date", "price" ]
            },
            "size": 1
          }
        }
      }
    }
  }
}
'

Here the sales are grouped by type, and for each type the most recent sale is shown. Only the date and price fields of each sale are included in the source.
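Adapted to the question's fields, the same pattern might look like the sketch below, which only builds the request body as a Python dict and prints it as JSON. The function name `latest_per_sys_code_request` and the aggregation name `latest` are made up for illustration, and the sketch assumes sysCode is aggregatable (for example mapped as keyword), which the posted mappings do not show.

```python
import json


def latest_per_sys_code_request(hits_per_bucket=1):
    """Hypothetical helper: bucket by sysCode and return the newest
    document(s) of each bucket via top_hits sorted by date descending."""
    return {
        "size": 0,  # only the aggregation result is needed, not search hits
        "aggs": {
            "result_by_sys_code": {
                # Assumption: sysCode can be used in a terms aggregation.
                "terms": {"field": "sysCode"},
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "sort": [{"date": {"order": "desc"}}],
                            "_source": {
                                "includes": [
                                    "employeeId",
                                    "date",
                                    "categories.pb.type",
                                ]
                            },
                            "size": hits_per_bucket,
                        }
                    }
                },
            }
        },
    }


body = latest_per_sys_code_request()
print(json.dumps(body, indent=2))
```

Two caveats apply. With a size of 1, top_hits returns a single document per sysCode even when several share the latest date (as the two IND samples do), so `hits_per_bucket` may need to be larger. And top_hits does not accept sub-aggregations, so the final grouping by employeeId and type would have to happen on the returned hits, client-side.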

The sampler aggregation does something similar, but in a different way. From its documentation:

A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.

See the Stack Overflow question "Limit ElasticSearch aggregation to top n query results" for a TL;DR on the sampler aggregation.
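As a rough illustration, a sampler wrapped around the question's terms aggregations could be built like this. The function name `sampler_request` and the aggregation name `sample` are made up for the example; `shard_size` is the sampler's per-shard limit on how many top-scoring documents are kept.

```python
import json


def sampler_request(shard_size=100):
    """Hypothetical helper: restrict the sub-aggregations to a sample of
    the top-scoring documents on each shard."""
    return {
        "size": 0,
        "aggs": {
            "sample": {
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "result_by_sys_code": {
                        # Assumption: sysCode can be used in a terms aggregation.
                        "terms": {"field": "sysCode"},
                        "aggs": {
                            "employees": {"terms": {"field": "employeeId"}}
                        },
                    }
                },
            }
        },
    }


sampler_body = sampler_request(shard_size=50)
print(json.dumps(sampler_body, indent=2))
```

The sample is chosen by relevance score, not by date, so this only approximates "latest documents" if the accompanying query scores newer documents higher. It also limits documents per shard overall rather than per sysCode.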

