This page lists commands that help you reindex indices for newer versions of Elasticsearch.

Warning: The samples in this tutorial are not a "copy and paste" list of commands; they are examples meant to instruct. Read them and adjust them to your local setup. The commands use the tools httpie and jq.

Prerequisites

Check ES Versions of All Nodes

The ES version needs to be the same on all ES nodes in the cluster before we can start the reindex process:

http ":9200/_cat/nodes?v&h=name,ip,version"
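
For scripted runs, the version check can be reduced to a pass/fail test. The following is a minimal sketch and not part of the original procedure: the function name single_version is made up for illustration, and it assumes one version string per input line, as `_cat/nodes?h=version` prints it.

```shell
# Succeeds (exit 0) only if all non-empty input lines are identical,
# i.e. every node reports the same Elasticsearch version.
single_version() {
  [ "$(sort -u | grep -c .)" -eq 1 ]
}

# Possible usage against a live cluster (requires httpie):
#   http ":9200/_cat/nodes?h=version" | single_version || echo "nodes run mixed ES versions"
```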

Check That All Shards Are Initialized (“green”)

All shards need to be initialized before we can start the reindex process:

http ":9200/_cat/indices?h=health,status,index" | grep -v '^green'
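
When this check gates a script, an exit status is easier to act on than grep output. A small sketch under that assumption: all_green is a hypothetical helper name, and it expects the health,status,index column layout requested above.

```shell
# Reads `_cat/indices?h=health,status,index` output on stdin.
# Succeeds (exit 0) only if no line starts with anything other than "green".
all_green() {
  ! grep -qv '^green'
}

# Possible usage (requires httpie):
#   http ":9200/_cat/indices?h=health,status,index" | all_green || echo "some indices are not green"
```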

Update Graylog Index Templates in Elasticsearch

The index templates that Graylog writes to Elasticsearch need to be updated before we can start the reindex process:

http post :9000/api/system/indexer/indices/templates/update x-requested-by:httpie

Collect Indices That Need a Reindex to Work with ES 6

All indices which have not been created with ES 5 need to be reindexed to work with ES 6 (or deleted if they are no longer needed):

http :9200/_settings | jq '[ path(.[] | select(.settings.index.version.created < "5000000"))[] ]'
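
The jq filter above compares the version.created values as strings. If you would rather check a single index numerically, a sketch could look like this; needs_reindex is a hypothetical helper, and the sample values (2040699 for a 2.x index, 5060899 for a 5.x index) are illustrative.

```shell
# Succeeds (exit 0) when the given index.version.created value is
# numerically below 5000000, i.e. the index was created before ES 5.
needs_reindex() {
  [ "$1" -lt 5000000 ]
}

# Possible usage against a live cluster (requires httpie and jq):
#   created=$(http :9200/graylog_0/_settings | jq -r '.graylog_0.settings.index.version.created')
#   needs_reindex "$created" && echo "graylog_0 must be reindexed or deleted"
```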

Reindexing Procedure

These steps describe how to reindex an index on Elasticsearch. The detailed reference commands which can be used (after adjusting them to your specific details) are linked after each step.

  1. Check that the index is not the active write index (Cmd).
  2. Create a reindex target index: <index-name>_reindex (e.g. graylog_0_reindex) (with correct settings for shards and replicas) (Cmd).
  3. Check that mapping and settings of the new index are correct (Cmd).
  4. Start reindex task in ES (using requests_per_second URL parameter and size parameter in the payload to avoid overloading the ES cluster) (Cmd).
  5. Check progress of reindex task and wait until it is done (Cmd).
  6. Check that the document counts in the old and the new index match (Cmd).
  7. Delete old index (Cmd).
  8. Recreate the old index: <index-name> (e.g. graylog_0) (with correct settings for shards and replicas) (Cmd).
  9. Check that mapping and settings of the recreated index are correct (Cmd).
  10. Start reindex task in ES to reindex documents back into the old index (using requests_per_second URL parameter and size parameter in the payload to avoid overloading the ES cluster) (Cmd).
  11. Check that the document counts in the old and the new index match (Cmd).
  12. Recreate Graylog index ranges for the old index (Cmd).
  13. Delete temporary reindex target index (e.g. graylog_0_reindex) (Cmd).

Reindex Commands for Every Index

The following commands need to be executed for every index that needs to be reindexed. Replace the graylog_0 index name in the examples below with the index name you are currently working on.

Check if Index is an Active Write Index

You should never reindex the active write target because that index is actively written to. If the active write index is still a 2.x ES index, a manual index rotation needs to be triggered first. List the current active write indices with:

http ':9200/*_deflector/_alias' | jq 'keys'
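
To turn that listing into a guard for one specific index, something like the following could be used. It is a sketch only: is_active_write_index is a made-up name, and it assumes the index names arrive one per line (for example from `jq -r 'keys[]'` instead of `jq 'keys'`).

```shell
# Reads index names (one per line) on stdin and succeeds (exit 0)
# if "$1" matches one of them exactly.
is_active_write_index() {
  grep -qxF -- "$1"
}

# Possible usage (requires httpie and jq):
#   http ':9200/*_deflector/_alias' | jq -r 'keys[]' | is_active_write_index graylog_0 \
#     && echo "graylog_0 is an active write index; rotate it before reindexing"
```

The -x and -F options make grep match whole lines literally, so graylog_1 does not match graylog_12.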

Create New Index

The new index needs to be created before it can be used as a reindex target. The request needs to include the correct settings for the number of shards and replicas. These settings can differ between index sets; the actual values are shown in the Graylog UI on the “System / Indices” page for each index set:

http put :9200/graylog_0_reindex settings:='{"number_of_shards":4,"number_of_replicas":0}'
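
Because shard and replica counts vary per index set, it can help to build the settings payload from two numbers rather than editing JSON by hand. A minimal sketch, with index_settings_json as a hypothetical helper name:

```shell
# Prints the settings JSON for the given shard count ($1) and replica count ($2).
index_settings_json() {
  printf '{"number_of_shards":%d,"number_of_replicas":%d}' "$1" "$2"
}

# Possible usage (requires httpie):
#   http put :9200/graylog_0_reindex settings:="$(index_settings_json 4 0)"
```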

Check Mapping and Index Settings

Use these commands to check if the settings and index mapping for the new index are correct:

Copy
http :9200/graylog_0_reindex/_mapping

http :9200/graylog_0_reindex/_settings

Start Reindex Process

This command starts the actual reindex process. It will return a task ID that can be used to check the progress of the reindex task in Elasticsearch.

The size value in the payload is the batch size that will be used for the reindex process. It defaults to 1000 and can be adjusted to tune the reindexing process:

http post :9200/_reindex wait_for_completion==false source:='{"index":"graylog_0","size": 1000}' dest:='{"index":"graylog_0_reindex"}'

The reindex API supports the requests_per_second URL parameter to throttle the reindex process. This can be useful to make sure that the reindex process doesn’t take too many resources. We recommend this guide for an explanation on how the parameter works:

http post :9200/_reindex wait_for_completion==false requests_per_second==500 source:='{"index":"graylog_0","size": 1000}' dest:='{"index":"graylog_0_reindex"}'
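
As a rough illustration of how the two values interact: Elasticsearch throttles by waiting between batches, and the target time per batch is the batch size divided by requests_per_second. The sketch below only does that arithmetic. The function names and the document count of 20000000 are made up for the example, and the estimate ignores any time by which writes exceed the target.

```shell
# Target time in seconds for one batch: batch size / requests_per_second.
# Elasticsearch waits for whatever part of this time the write did not use.
batch_target_seconds() {
  awk -v size="$1" -v rps="$2" 'BEGIN { printf "%.2f\n", size / rps }'
}

# Rough lower bound in seconds for reindexing $1 documents at $2 requests_per_second.
min_reindex_seconds() {
  awk -v docs="$1" -v rps="$2" 'BEGIN { printf "%.0f\n", docs / rps }'
}

batch_target_seconds 1000 500      # prints 2.00
min_reindex_seconds 20000000 500   # prints 40000
```

With the values used in the command above (size 1000, requests_per_second 500), each batch is allotted about two seconds.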

Wait for the Reindex to Complete and Check Reindex Progress

The reindex progress can be checked with the following command using the task ID that has been returned by the reindex request:

http :9200/_tasks/PUT_YOUR_TASK_ID_HERE
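
If you prefer not to repeat the check by hand, a polling loop along these lines may help. It is a sketch under assumptions: task_completed and wait_for_task are hypothetical names, the task response is assumed to carry a top-level completed field, and there is no timeout or error handling, so a failing request makes the loop run until interrupted.

```shell
# Prints "true" or "false" for the given task ID (requires httpie and jq).
# --ignore-stdin keeps httpie from reading the script's stdin as a request body.
task_completed() {
  http --ignore-stdin --check-status ":9200/_tasks/$1" | jq -r '.completed'
}

# Polls until the task reports completion.
# $1 = task ID, $2 = seconds between polls (default 10).
wait_for_task() {
  while [ "$(task_completed "$1")" != "true" ]; do
    sleep "${2:-10}"
  done
}

# Possible usage:
#   wait_for_task PUT_YOUR_TASK_ID_HERE 30 && echo "reindex finished"
```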

Compare Documents in the Old and New Index

Before we continue, we should check that all documents have been reindexed into the new index by comparing the document counts:

http :9200/graylog_0/_count

http :9200/graylog_0_reindex/_count
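
Since the next step deletes the old index, it may be worth making this comparison a hard gate in any script. A minimal sketch: doc_count and counts_match are hypothetical helper names, and the `_count` response is assumed to carry the number in its count field.

```shell
# Prints the document count of index $1 (requires httpie and jq).
doc_count() {
  http --ignore-stdin --check-status ":9200/$1/_count" | jq -r '.count'
}

# Succeeds (exit 0) only if both arguments are non-empty and numerically equal.
counts_match() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" -eq "$2" ]
}

# Possible usage:
#   counts_match "$(doc_count graylog_0)" "$(doc_count graylog_0_reindex)" \
#     || echo "document counts differ; do not delete the old index"
```

The empty-string checks make the comparison fail when one of the requests returned nothing, instead of treating a missing value as a match.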

Delete Old Index

Now delete the old index so we can recreate it for reindexing:

http delete :9200/graylog_0

Recreate Old Index

Recreate the old index so we can use it as a reindex target. The request needs to include the correct settings for the number of shards and replicas. These settings can differ between index sets; the actual values are shown in the Graylog UI on the “System / Indices” page for each index set:

http put :9200/graylog_0 settings:='{"number_of_shards":4,"number_of_replicas":0}'

Check Mapping and Index Settings

Use these commands to check if the settings and index mapping for the recreated index are correct:

http :9200/graylog_0/_mapping

http :9200/graylog_0/_settings

Start Reindex Process for Old Index

This command starts the reindex process to move back the documents into the old index. It will return a task ID that can be used to check the progress of the reindex task in Elasticsearch.

The size value in the payload is the batch size that will be used for the reindex process. It defaults to 1000 and can be adjusted to tune the reindexing process:

http post :9200/_reindex wait_for_completion==false source:='{"index":"graylog_0_reindex","size": 1000}' dest:='{"index":"graylog_0"}'

The reindex API supports the requests_per_second URL parameter to throttle the reindex process. This can be useful to make sure that the reindex process doesn’t take too many resources. We recommend this guide for an explanation on how the parameter works:

http post :9200/_reindex wait_for_completion==false requests_per_second==500 source:='{"index":"graylog_0_reindex","size": 1000}' dest:='{"index":"graylog_0"}'

Compare Documents in the Old and New Index

Before we continue, we should check that all documents have been reindexed into the re-created old index by comparing the document counts with the temporary index:

http :9200/graylog_0/_count

http :9200/graylog_0_reindex/_count

Create Index Range for the Recreated Index

Graylog needs to know about the recreated index. Create an index range for it:

http post :9000/api/system/indices/ranges/graylog_0/rebuild x-requested-by:httpie

Delete Temporary Reindex Target Index

The temporary reindex target index can now be deleted:

http delete :9200/graylog_0_reindex

Cleanup

The reindex process leaves some tasks in Elasticsearch that need to be cleaned up manually.

Find Completed Reindex Tasks for Deletion

Execute the following command to get all the tasks we should remove:

http :9200/.tasks/_search | jq '[.hits.hits[] | select(._source.task.action == "indices:data/write/reindex" and ._source.completed == true) | {"task_id": ._id, "description": ._source.task.description}]'

Remove Completed Reindex Tasks

Execute the following command for every completed task ID:

http delete :9200/.tasks/task/PUT_YOUR_TASK_ID_HERE
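
With many completed tasks, the delete can be looped over a list of IDs. The following is a sketch, not an official cleanup tool: delete_completed_tasks is a hypothetical name, and it expects one task ID per line on stdin. Note that a search request returns only the first 10 hits unless a size is given, so the find-and-delete cycle may need to be repeated until no completed reindex tasks remain.

```shell
# Reads task IDs (one per line) on stdin and deletes the matching
# documents from the .tasks index. Empty lines are skipped.
# --ignore-stdin is needed so httpie does not consume the ID list itself.
delete_completed_tasks() {
  while IFS= read -r task_id; do
    [ -n "$task_id" ] || continue
    http --ignore-stdin delete ":9200/.tasks/task/$task_id"
  done
}

# Possible usage (requires httpie and jq):
#   http :9200/.tasks/_search \
#     | jq -r '.hits.hits[] | select(._source.task.action == "indices:data/write/reindex" and ._source.completed == true) | ._id' \
#     | delete_completed_tasks
```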