Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
logstash - Elasticsearch merge multiple indexes based on common field

I'm using the ELK stack to generate views from data in two different databases: one is MySQL, the other is PostgreSQL. There is no way to write a join query across those two DB instances, but they share a common field called "nic". Here are sample documents from each index.

MySQL

Index: user_detail

"_id": "871123365V",
"_source": {
    "type": "db-poc-user",
    "fname": "Iraj",
    "@version": "1",
    "field_lname": "Sanjeewa",
    "nic": "871456365V",
    "@timestamp": "2020-07-22T04:12:00.376Z",
    "id": 2,
    "lname": "Santhosh"
  }

PostgreSQL

Index: track_details

"_id": "871456365V",
"_source": {
   "@version": "1",
   "nic": "871456365V",
   "@timestamp": "2020-07-22T04:12:00.213Z",
   "track": "ELK",
   "type": "db-poc-ceg"
},

I want to merge both indexes into a single new index using the common field "nic", so that I can build visualizations in Kibana. How can this be achieved?

Please note that each document in the new index should have "nic", "fname", "lname", and "track" as fields, not an aggregation.

1 Reply


I would leverage the enrich processor to achieve this.

First, you need to create an enrich policy (use the smallest index; let's say it's user_detail):

PUT /_enrich/policy/user-policy
{
  "match": {
    "indices": "user_detail",
    "match_field": "nic",
    "enrich_fields": ["fname", "lname"]
  }
}

Then you can execute that policy in order to create the enrich index:

POST /_enrich/policy/user-policy/_execute

The next step requires you to create an ingest pipeline that uses the above enrich policy/index:

PUT /_ingest/pipeline/user_lookup
{
  "description" : "Enriching user details with tracks",
  "processors" : [
    {
      "enrich" : {
        "policy_name": "user-policy",
        "field" : "nic",
        "target_field": "tmp",
        "max_matches": "1"
      }
    },
    {
      "script": {
        "if": "ctx.tmp != null",
        "source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
      }
    },
    {
      "remove": {
        "field": ["@version", "@timestamp", "type"]
      }
    }
  ]
}
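
Before reindexing everything, you can dry-run the pipeline with the _simulate API to check that the enrichment behaves as expected. The sample document below is illustrative; any document shaped like those in track_details will do:

POST /_ingest/pipeline/user_lookup/_simulate
{
  "docs": [
    {
      "_source": {
        "@version": "1",
        "nic": "871456365V",
        "@timestamp": "2020-07-22T04:12:00.213Z",
        "track": "ELK",
        "type": "db-poc-ceg"
      }
    }
  ]
}

If the policy matched, the simulated result should contain fname and lname at the top level and no tmp field.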

Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:

POST _reindex
{
  "source": {
    "index": "track_details"
  },
  "dest": {
    "index": "user_tracks",
    "pipeline": "user_lookup"
  }
}

After running this, the user_tracks index will contain exactly what you need, for instance:

  {
    "_index" : "user_tracks",
    "_type" : "_doc",
    "_id" : "0uA8dXMBU9tMsBeoajlw",
    "_score" : 1.0,
    "_source" : {
      "fname" : "Iraj",
      "nic" : "871456365V",
      "lname" : "Santhosh",
      "track" : "ELK"
    }
  }
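
To sanity-check the result, you can query the new index by nic. A match query is used here rather than a term query, since nic is most likely dynamically mapped as a text field:

GET /user_tracks/_search
{
  "query": {
    "match": {
      "nic": "871456365V"
    }
  }
}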

If your source indexes ever change (new users, changed names, etc.), you'll need to re-run the above steps. Before doing so, you need to delete the ingest pipeline and the enrich policy (in that order):

DELETE /_ingest/pipeline/user_lookup
DELETE /_enrich/policy/user-policy

After that you can freely re-run the above steps.

PS: Just note that I cheated a bit since the record in user_detail doesn't have the same nic in your example, but I guess it was a copy/paste issue.
