There are three ways to create archives from the Graylog Elasticsearch indices.
- Web Interface
- Index Retention
- REST API
Web Interface
Manually create an archive on the “Enterprise/Archives” page from the web interface.
Navigate to the “Create Archive for Index” section of the page. There is a form there where you can select an index and archive it by clicking Archive Index. Archiving an index does not close or delete the index, so it is a great way to test the archiving feature without changing your index retention configuration.
Index Retention
Graylog Archive ships with an index retention strategy that can automatically create archives before closing or deleting Elasticsearch indices, which is the easiest way to automatically create archives without custom scripting.
To configure index retention, please see the Index Retention Configuration.
REST API
Graylog Archive also offers a REST API to automate archive creation if you have special requirements and need a more flexible approach.
Archive an index with a simple curl command:
$ curl -s -u admin -H 'X-Requested-By: cli' -X POST http://127.0.0.1:9000/api/plugins/org.graylog.plugins.archive/archives/graylog_386
Enter host password for user 'admin': ***************
{
"archive_job_config" : {
"archive_path" : "/tmp/graylog-archive",
"max_segment_size" : 524288000,
"segment_filename_prefix" : "archive-segment",
"metadata_filename" : "archive-metadata.json",
"source_histogram_bucket_size" : 86400000,
"restore_index_batch_size" : 1001,
"segment_compression_type": "SNAPPY"
},
"system_job" : {
"id" : "cd7ebfa0-079b-11e6-9e1b-fa163e6e9b8a",
"description" : "Archives indices and deletes them",
"name" : "org.graylog.plugins.archive.job.ArchiveCreateSystemJob",
"info" : "Archiving documents in index: graylog_386",
"node_id" : "c5df7bff-cafd-4546-ac0a-5ccd2ba4c847",
"started_at" : "2016-04-21T08:34:03.034Z",
"percent_complete" : 0,
"provides_progress" : true,
"is_cancelable" : true
}
}
That command started a system job in the Graylog server to create an archive for index graylog_386. Use the system_job.id to check the progress of the job.
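Checking on the job can be scripted too. The following is a minimal sketch, assuming the job can be fetched from a system/jobs/<id> resource under the same API prefix as the archive call. That path, the GRAYLOG_API variable, and the function names job_url and job_status are assumptions made for illustration, so confirm the endpoint in the REST API browser on your Graylog server before relying on it.

```shell
#!/bin/sh
# Base URL of the Graylog REST API (assumed; adjust for your environment).
GRAYLOG_API="${GRAYLOG_API:-http://127.0.0.1:9000/api}"

# Print the URL of a system job.
# Assumption: jobs are exposed under <api>/system/jobs/<id>.
job_url() {
  printf '%s/system/jobs/%s\n' "$GRAYLOG_API" "$1"
}

# Fetch the current state of a job once.
# curl prompts for the admin password, as in the archive command.
job_status() {
  curl -s -u admin -H 'Accept: application/json' "$(job_url "$1")"
}

# Usage, with the system_job.id returned by the archive request:
#   job_status cd7ebfa0-079b-11e6-9e1b-fa163e6e9b8a
```

If the endpoint behaves as assumed, the response should carry the same job fields shown in the payload, including percent_complete, so the call can be repeated until the job finishes.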
The REST API can automate other archive-related tasks, like the restoration and deletion of archives, and the process of updating the archive configuration. See the REST API browser on your Graylog server for details.
Restoring Archives
Graylog Archive offers two ways to restore archived indices.
- Web Interface
- REST API
Graylog Archive restores all indices in the “Restored Archives” index set to avoid conflicts with the original indices if they still exist.
Restored indices are also marked as reopened, so they are ignored by index-retention jobs and are not closed or deleted. Therefore, you must manually delete restored indices when you no longer need them.
Web Interface
To restore an archive on the “Enterprise/Archives” page through the web interface, select an archive from the list, open the archive details, and click the Restore Index button.
REST API
As with archive creation, you can also use the REST API to restore an archived index into the Elasticsearch cluster:
$ curl -s -u admin -H 'X-Requested-By: cli' -X POST http://127.0.0.1:9000/api/plugins/org.graylog.plugins.archive/archives/graylog_307/restore
Enter host password for user 'admin': ***************
{
"archive_metadata": {
"archive_id": "graylog_307",
"index_name": "graylog_307",
"document_count": 491906,
"created_at": "2016-04-14T14:31:50.787Z",
"creation_duration": 142663,
"timestamp_min": "2016-04-14T14:00:01.008Z",
"timestamp_max": "2016-04-14T14:29:27.639Z",
"id_mappings": {
"streams": {
"56fbafe0fb121a5309cef297": "nginx requests"
},
"inputs": {
"56fbafe0fb121a5309cef290": "nginx error_log",
"56fbafe0fb121a5309cef28d": "nginx access_log"
},
"nodes": {
"c5df7bff-cafd-4546-ac0a-5ccd2ba4c847": "graylog.example.org"
}
},
"histogram_bucket_size": 86400000,
"source_histogram": {
"2016-04-14T00:00:00.000Z": {
"example.org": 227567
}
},
"segments": [
{
"path": "archive-segment-0.gz",
"size": 21653755,
"raw_size": 2359745839,
"compression_type": "SNAPPY",
"checksum": "751e6e76",
"checksum_type": "CRC32"
}
],
"index_size": 12509063,
"index_shard_count": 4
},
"system_job": {
"id": "e680dcc0-07a2-11e6-9e1b-fa163e6e9b8a",
"description": "Restores an index from the archive",
"name": "org.graylog.plugins.archive.job.ArchiveRestoreSystemJob",
"info": "Restoring documents from archived index: graylog_307",
"node_id": "c5df7bff-cafd-4546-ac0a-5ccd2ba4c847",
"started_at": "2016-04-21T09:24:51.468Z",
"percent_complete": 0,
"provides_progress": true,
"is_cancelable": true
}
}
The returned JSON payload contains the archive metadata and the system job description that runs the index-restore process.
Restore into a Separate Cluster
Restored indices add load to your Elasticsearch cluster, which can slow down indexing. To avoid this, restore the archived indices on a different cluster.
To restore the archived indices on a different cluster, transfer the archived indices to a different machine, and place them in a configured Backend.
Each index archive is in a separate directory, so, if you only want to transfer one index to a different machine, copy the corresponding directory into the backend.
Example:
$ tree /tmp/graylog-archive
/tmp/graylog-archive
├── graylog_171
│ ├── archive-metadata.json
│ └── archive-segment-0.gz
├── graylog_201
│ ├── archive-metadata.json
│ └── archive-segment-0.gz
├── graylog_268
│ ├── archive-metadata.json
│ └── archive-segment-0.gz
├── graylog_293
│ ├── archive-metadata.json
│ └── archive-segment-0.gz
├── graylog_307
│ ├── archive-metadata.json
│ └── archive-segment-0.gz
├── graylog_386
│ ├── archive-metadata.json
│ └── archive-segment-0.gz
└── graylog_81
├── archive-metadata.json
└── archive-segment-0.gz
7 directories, 14 files
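Copying a single archive can be sketched as a small shell function. This is an illustration under stated assumptions, not a Graylog tool: the function name copy_archive and the destination path are hypothetical, and it uses cp -R, which only works when the destination backend directory is reachable locally (for example, a mounted share). For a remote machine, a tool such as scp -r or rsync -a would take the place of cp -R.

```shell
#!/bin/sh
# Copy one index archive directory from a backend directory into another.
# Arguments: <source backend dir> <index name> <destination backend dir>
copy_archive() {
  ca_src=$1
  ca_index=$2
  ca_dest=$3

  # Refuse to copy a directory that does not contain archive metadata.
  if [ ! -f "$ca_src/$ca_index/archive-metadata.json" ]; then
    echo "no archive metadata found for $ca_index in $ca_src" >&2
    return 1
  fi

  # The destination must already exist so the archive keeps its directory name.
  if [ ! -d "$ca_dest" ]; then
    echo "destination $ca_dest is not a directory" >&2
    return 1
  fi

  # Copy the whole directory: metadata file plus all segment files.
  cp -R "$ca_src/$ca_index" "$ca_dest/"
}

# Usage (destination path is hypothetical):
#   copy_archive /tmp/graylog-archive graylog_386 /mnt/other-cluster/graylog-archive
```

Copying the directory as a unit keeps the metadata file together with its segment files, which matches the layout shown in the tree listing.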
Searching in Restored Indices
Search queries automatically use restored indices.
Every restored message in an Elasticsearch index has a special gl2_archive_restored field with the value true, so you can search in restored messages by using a query like:
_exists_:gl2_archive_restored AND <your search query>
If you want to exclude all restored messages from your query, use:
_missing_:gl2_archive_restored AND <your search query>