Usage
  • 01 Jul 2022

Creating Archives

There are three ways to create archives from the Graylog Elasticsearch indices.

  • Web Interface
  • Index Retention
  • REST API

Web Interface

You can manually create an archive on the “Operations/Archives” page in the web interface.

archiving-usage-create-web

The “Create Archive for Index” section of the page contains a form where you can select an index and archive it by pressing “Archive Index”.

This only archives the index to disk; it does not close or delete the index. It is a good way to test the archiving feature without changing your index retention configuration.

Index Retention

Graylog Archive ships with an index retention strategy that can be used to automatically create archives before closing or deleting Elasticsearch indices.

This is the easiest way to automatically create archives without custom scripting.

Please see the Index Retention Configuration for details on how to configure it.

REST API

Graylog Archive also offers a REST API that you can use to automate archive creation if you have special requirements and need a more flexible way to do this.

archiving-usage-create-api

An index can be archived with a simple curl command:

$ curl -s -u admin -H 'X-Requested-By: cli' -X POST http://127.0.0.1:9000/api/plugins/org.graylog.plugins.archive/archives/graylog_386
Enter host password for user 'admin': ***************
{
   "archive_job_config" : {
     "archive_path" : "/tmp/graylog-archive",
     "max_segment_size" : 524288000,
     "segment_filename_prefix" : "archive-segment",
     "metadata_filename" : "archive-metadata.json",
     "source_histogram_bucket_size" : 86400000,
     "restore_index_batch_size" : 1001,
     "segment_compression_type": "SNAPPY"
   },
   "system_job" : {
     "id" : "cd7ebfa0-079b-11e6-9e1b-fa163e6e9b8a",
     "description" : "Archives indices and deletes them",
     "name" : "org.graylog.plugins.archive.job.ArchiveCreateSystemJob",
     "info" : "Archiving documents in index: graylog_386",
     "node_id" : "c5df7bff-cafd-4546-ac0a-5ccd2ba4c847",
     "started_at" : "2016-04-21T08:34:03.034Z",
     "percent_complete" : 0,
     "provides_progress" : true,
     "is_cancelable" : true
   }
 }

That command started a system job in the Graylog server to create an archive for index graylog_386. The system_job.id can be used to check the progress of the job.
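The following is a minimal sketch of how the call above could be wrapped for scripting. The function names are hypothetical, jq is assumed to be installed, and the /system/jobs/<id> path used for progress checks is an assumption that should be confirmed in the REST API browser of your Graylog server.

```shell
#!/bin/sh
# Sketch only: helper functions around the archive REST call shown above.
# GRAYLOG_API defaults to the base URL from the curl example; adjust as needed.
GRAYLOG_API="${GRAYLOG_API:-http://127.0.0.1:9000/api}"

# Print the archive-creation URL for an index (same path as the curl example).
archive_url() {
  printf '%s/plugins/org.graylog.plugins.archive/archives/%s\n' "$GRAYLOG_API" "$1"
}

# Read an archive response on stdin and print system_job.id (needs jq).
job_id() {
  jq -r '.system_job.id'
}

# ASSUMPTION: system jobs can be fetched at /system/jobs/<id>; confirm the
# exact path in the REST API browser before relying on it.
job_url() {
  printf '%s/system/jobs/%s\n' "$GRAYLOG_API" "$1"
}

# Possible usage (performs real requests, so it is left commented out):
#   id=$(curl -s -u admin -H 'X-Requested-By: cli' -X POST "$(archive_url graylog_386)" | job_id)
#   curl -s -u admin "$(job_url "$id")"
```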

The REST API can be used to automate other archive-related tasks as well, like restoring and deleting archives or updating the archive config. See the REST API browser on your Graylog server for details.

Restoring Archives

Note

The restore process adds load to your Elasticsearch cluster because all messages are re-indexed. Please keep this in mind and test with smaller archives first to see how your cluster behaves. Also use the Restore Index Batch Size setting to control the Elasticsearch batch size on re-index.

Graylog Archive offers two ways to restore archived indices.

  • Web Interface
  • REST API

Graylog Archive restores all indices into the “Restored Archives” index set to avoid conflicts with the original indices, should those still exist.

archiving-usage-restore-web-result

Restored indices are also marked as reopened, so index retention jobs ignore them and do not close or delete them. That means you have to delete restored indices manually once you no longer need them.

Web Interface

In the web interface you can restore an archive on the “Operations/Archives” page by selecting an archive from the list, opening the archive details, and clicking the “Restore Index” button.

archiving-usage-restore-web

REST API

As with archive creation, you can also use the REST API to restore an archived index into the Elasticsearch cluster:

$ curl -s -u admin -H 'X-Requested-By: cli' -X POST http://127.0.0.1:9000/api/plugins/org.graylog.plugins.archive/archives/graylog_307/restore
Enter host password for user 'admin': ***************
{
   "archive_metadata": {
     "archive_id": "graylog_307",
     "index_name": "graylog_307",
     "document_count": 491906,
     "created_at": "2016-04-14T14:31:50.787Z",
     "creation_duration": 142663,
     "timestamp_min": "2016-04-14T14:00:01.008Z",
     "timestamp_max": "2016-04-14T14:29:27.639Z",
     "id_mappings": {
       "streams": {
         "56fbafe0fb121a5309cef297": "nginx requests"
       },
       "inputs": {
         "56fbafe0fb121a5309cef290": "nginx error_log",
         "56fbafe0fb121a5309cef28d": "nginx access_log"
       },
       "nodes": {
         "c5df7bff-cafd-4546-ac0a-5ccd2ba4c847": "graylog.example.org"
       }
     },
     "histogram_bucket_size": 86400000,
     "source_histogram": {
       "2016-04-14T00:00:00.000Z": {
         "example.org": 227567
       }
     },
     "segments": [
       {
         "path": "archive-segment-0.gz",
         "size": 21653755,
         "raw_size": 2359745839,
         "compression_type": "SNAPPY",
         "checksum": "751e6e76",
         "checksum_type": "CRC32"
       }
     ],
     "index_size": 12509063,
     "index_shard_count": 4
   },
   "system_job": {
     "id": "e680dcc0-07a2-11e6-9e1b-fa163e6e9b8a",
     "description": "Restores an index from the archive",
     "name": "org.graylog.plugins.archive.job.ArchiveRestoreSystemJob",
     "info": "Restoring documents from archived index: graylog_307",
     "node_id": "c5df7bff-cafd-4546-ac0a-5ccd2ba4c847",
     "started_at": "2016-04-21T09:24:51.468Z",
     "percent_complete": 0,
     "provides_progress": true,
     "is_cancelable": true
   }
 }

The returned JSON payload contains the archive metadata and the description of the system job that runs the index restore process.
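As an illustration of working with that payload, here is a small sketch that builds the restore URL and prints a one-line summary from the archive_metadata fields shown above. The function names are hypothetical and jq is assumed to be installed.

```shell
#!/bin/sh
# Sketch only: helpers for the restore call shown above.
# GRAYLOG_API defaults to the base URL from the curl example; adjust as needed.
GRAYLOG_API="${GRAYLOG_API:-http://127.0.0.1:9000/api}"

# Print the restore URL for an archived index (same path as the curl example).
restore_url() {
  printf '%s/plugins/org.graylog.plugins.archive/archives/%s/restore\n' "$GRAYLOG_API" "$1"
}

# Read a restore response on stdin and print index name, document count
# and covered time range from archive_metadata (needs jq).
restore_summary() {
  jq -r '.archive_metadata
         | "\(.index_name): \(.document_count) documents, \(.timestamp_min) to \(.timestamp_max)"'
}

# Possible usage (performs a real restore, so it is left commented out):
#   curl -s -u admin -H 'X-Requested-By: cli' -X POST "$(restore_url graylog_307)" | restore_summary
```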

Restore into a separate cluster

As mentioned earlier, restoring archived indices slows down your indexing speed because of the added load. If you want to completely avoid adding more load to your Elasticsearch cluster, you can restore the archived indices on a different cluster.

To do that, you only have to transfer the archived indices to a different machine and put them into a configured Backend.

Each index archive is in a separate directory, so if you only want to transfer one index to a different machine, you only have to copy the corresponding directory into the backend.

Example:

$ tree /tmp/graylog-archive
  /tmp/graylog-archive
  ├── graylog_171
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_201
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_268
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_293
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_307
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_386
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  └── graylog_81
      ├── archive-metadata.json
      └── archive-segment-0.gz
  7 directories, 14 files
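Copying a single index archive could look like the sketch below. The function name, destination path, and remote host are placeholders and not part of Graylog Archive; the only layout assumed is the one shown in the tree output above.

```shell
#!/bin/sh
# Sketch only: copy one index archive directory into another backend directory.
# Arguments: source backend dir, index name, destination backend dir.
copy_archive() {
  src_backend=$1
  index=$2
  dest_backend=$3

  # Each archive directory holds archive-metadata.json next to its segments.
  if [ ! -f "$src_backend/$index/archive-metadata.json" ]; then
    echo "no archive found for $index in $src_backend" >&2
    return 1
  fi

  mkdir -p "$dest_backend" && cp -R "$src_backend/$index" "$dest_backend/"
}

# Example (destination path is a placeholder):
#   copy_archive /tmp/graylog-archive graylog_307 /mnt/other-backend
# For a remote machine, something like rsync over SSH could be used instead:
#   rsync -a /tmp/graylog-archive/graylog_307 user@other-host:/path/to/backend/
```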

Searching in Restored Indices

Once an index has been restored from an archive, it is used by search queries automatically.

Every message that gets restored into an Elasticsearch index gets a special gl2_archive_restored field with value true. This allows you to search only in restored messages by using a query like:

_exists_:gl2_archive_restored AND <your search query>

Example:

archiving-usage-search

If you want to exclude all restored messages from your query, you can use:

_missing_:gl2_archive_restored AND <your search query>
