I think @karmi has it right. However, let me explain it a bit more simply. I needed to occasionally upgrade the production schema with new properties or analysis settings. I recently started using the scenario described below to do live, constant-load, zero-downtime index migrations. You can do it remotely.
Here are the steps:
Assumptions:
- you have an index real1 and aliases real_write, real_read pointing to it,
- the client writes only to real_write and reads only from real_read,
- the _source property of each document is available.
1. New index
Create the real2 index with the new mapping and settings of your choice.
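For example, the new index might be created like this (a sketch only: the analyzer, type, and field names below are placeholders, substitute your real schema):

```shell
# Create real2 with the new settings and mapping.
# "item", "title" and the analyzer are illustrative placeholders.
curl -XPUT 'http://esserver:9200/real2' -d '
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_lowercase" : {
                    "type" : "custom",
                    "tokenizer" : "standard",
                    "filter" : ["lowercase"]
                }
            }
        }
    },
    "mappings" : {
        "item" : {
            "properties" : {
                "title" : { "type" : "string", "analyzer" : "my_lowercase" }
            }
        }
    }
}'
```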
2. Writer alias switch
Using the following query, switch the write alias:
curl -XPOST 'http://esserver:9200/_aliases' -d '
{
    "actions" : [
        { "remove" : { "index" : "real1", "alias" : "real_write" } },
        { "add" : { "index" : "real2", "alias" : "real_write" } }
    ]
}'
This is an atomic operation. From this moment real2 is populated with new client data on all nodes. Readers still use the old real1 via real_read. This is eventual consistency.
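You can verify where each alias points at any time (the exact output depends on your cluster):

```shell
# After the switch, real_write should resolve to real2
# while real_read still resolves to real1.
curl -XGET 'http://esserver:9200/_alias/real_write?pretty'
curl -XGET 'http://esserver:9200/_alias/real_read?pretty'
```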
3. Old data migration
Data must be migrated from real1 to real2; however, new documents in real2 must not be overwritten by old entries. The migrating script should use the bulk API with the create operation (not index or update). I use a simple Ruby script, es-reindex, which has a nice E.T.A. status:
$ ruby es-reindex.rb http://esserver:9200/real1 http://esserver:9200/real2
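If you roll your own migration script instead, the important part is the create action, a sketch of one bulk request (type, IDs, and document bodies are placeholders):

```shell
# "create" fails with a conflict for IDs that already exist in real2,
# so freshly written documents are never overwritten by old data.
curl -XPOST 'http://esserver:9200/real2/_bulk' -d '
{ "create" : { "_type" : "item", "_id" : "1" } }
{ "title" : "old document 1" }
{ "create" : { "_type" : "item", "_id" : "2" } }
{ "title" : "old document 2" }
'
```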
UPDATE 2017: You may consider the new Reindex API instead of using the script. It has a lot of interesting features, like conflict reporting.
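With the Reindex API (available since ES 2.3), the same conflict-safe copy looks roughly like this:

```shell
# op_type "create" skips documents that already exist in real2;
# "conflicts": "proceed" counts those conflicts instead of aborting.
curl -XPOST 'http://esserver:9200/_reindex' -d '
{
    "conflicts" : "proceed",
    "source" : { "index" : "real1" },
    "dest" : { "index" : "real2", "op_type" : "create" }
}'
```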
4. Reader alias switch
Now real2 is up to date and clients are writing to it; however, they are still reading from real1. Let's update the reader alias:
curl -XPOST 'http://esserver:9200/_aliases' -d '
{
"actions" : [
{ "remove" : { "index" : "real1", "alias" : "real_read" } },
{ "add" : { "index" : "real2", "alias" : "real_read" } }
]
}'
5. Backup and delete old index
Writes and reads now go to real2. You can back up and then delete the real1 index from the ES cluster.
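The backup can be done with the snapshot API; a sketch, assuming a snapshot repository named my_backup has already been registered on the cluster:

```shell
# Snapshot only real1 into the (assumed) "my_backup" repository,
# waiting until the snapshot completes.
curl -XPUT 'http://esserver:9200/_snapshot/my_backup/real1_final?wait_for_completion=true' -d '
{ "indices" : "real1" }
'

# Once the snapshot succeeds, drop the old index.
curl -XDELETE 'http://esserver:9200/real1'
```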
Done!