Elasticsearch

We strongly recommend that you use a dedicated Elasticsearch cluster for your Graylog setup.

If you are using a shared Elasticsearch setup, a problem with indices unrelated to Graylog might turn the cluster status to YELLOW or RED and impact the availability and performance of your Graylog setup.

Elasticsearch Versions

Starting with version 2.3, Graylog uses the HTTP protocol to connect to your Elasticsearch cluster, so it does not have a hard requirement for the Elasticsearch version anymore. We can safely assume that any version from 2.x onwards works.

Warning: We caution you not to install or upgrade to Elasticsearch 7.11 and later! It is not supported. Doing so will break your instance!

Graylog Version	Elasticsearch Version
1.2.0-1.2.1	1.7.1
1.3.0-1.3.3	1.7.3
1.3.4	1.7.5
2.0.0	2.3.1
2.0.1-2.0.3	2.3.2
2.1.0-2.1.3	2.3.5
2.2.0-2.2.3	2.4.4
2.3.x-2.4.x	2.4.x, 5.6.x
2.5.x	2.4.x, 5.6.x, 6.8.x
3.0-3.3	5.6.x, 6.8.x
4.x	6.8.x, 7.7.x-7.10.x

Hint: Graylog works fine with the Amazon Elasticsearch Service or Elastic Cloud when using Elasticsearch 6 or 7.

Configuration

Warning: As Graylog has switched from an embedded Elasticsearch node client to a lightweight HTTP client in version 2.3, please check the upgrade notes on how to migrate your configuration if you are switching from an earlier version.

Graylog

The most important setting to make a successful connection is a list of comma-separated URIs to one or more Elasticsearch nodes. Graylog needs to know the address of at least one other Elasticsearch node given in the elasticsearch_hosts setting. The specified value should at least contain the scheme (http:// for unencrypted, https:// for encrypted connections), the hostname or IP and the port of the HTTP listener of this node (which is 9200 unless otherwise configured). Optionally, you can also specify an authentication section containing a user name and a password, if either of your Elasticsearch nodes use Shield/X-Pack or Search Guard, or you have an intermediate HTTP proxy requiring authentication between the Graylog server and the Elasticsearch node. Additionally you can specify an optional path prefix at the end of the URI.

A sample specification of elasticsearch_hosts:

Copy

elasticsearch_hosts = http://es-node-1.example.org:9200/foo,https://someuser:somepassword@es-node-2.example.org:19200

Warning: Graylog assumes that all nodes in the cluster are running the same versions of Elasticsearch. While it still might work when patch-levels differ, we highly encourage you to keep versions consistent.

Graylog does not currently react to externally triggered index changes (creating/closing/reopening/deleting an index). These actions need to be performed through the Graylog REST API in order to retain index consistency.

Available Elasticsearch Configuration Tunables

The following configuration options are now being used to configure connectivity to Elasticsearch:

Config Setting	Type	Comments	Default
`elasticsearch_connect_timeout`	Duration	Timeout when connection to individual Elasticsearch hosts	`10s` (10 Seconds)
`elasticsearch_hosts`	List<URI>	Comma-separated list of URIs of Elasticsearch hosts	`http://127.0.0.1:9200`
`elasticsearch_idle_timeout`	Duration	Timeout after which idle connections are terminated	`-1s` (Never)
`elasticsearch_max_total_connections`	int	Maximum number of total Elasticsearch connections	`20`
`elasticsearch_max_total_connections_per_route`	int	Maximum number of Elasticsearch connections per route/host	`2`
`elasticsearch_socket_timeout`	Duration	Timeout when sending/receiving from Elasticsearch connection	`60s` (60 Seconds)
`elasticsearch_discovery_enabled`	boolean	Enable automatic Elasticsearch node discovery	`false`
`elasticsearch_discovery_default_user`	String	The default username used for authentication for all newly discovered nodes.	empty (no authentication used for discovered nodes)
`elasticsearch_discovery_default_password`	String	The default password used for authentication for all newly discovered nodes.	empty (no authenticationused for discovered nodes)
`elasticsearch_discovery_default_scheme`	String	The default scheme used for all newly discovered nodes.	`http`
`elasticsearch_discovery_filter`	String	Filter by node attributes for the discovered nodes	empty (use all nodes)
`elasticsearch_discovery_frequency`	Duration	Frequency of the Elasticsearch node discovery	`30s` (30 Seconds)
`elasticsearch_compression_enabled`	boolean	Enable GZIP compression of Elasticseach request payloads	`false`
`elasticsearch_version`	String	Major version of the Elasticsearch version used. If not specified, the version will be auto-sensed with the configured nodes. Will disable auto-sensing if specified.	`<not set>` (auto-sense) Values: `6` / `7`
`elasticsearch_mute_deprecation_warnings`	boolean	Enable muting of deprecation warnings for deprecated configuration settings in Elasticsearch. These warnings are attached as “Warnings” in HTTP-Response headers and might clutter up the logs. Works only with ES7.	false
`elasticsearch_version_probe_attempts`	int	Maximum number of retries to connect to Elasticsearch on boot for the version probe before finally giving up. Use 0 to try until a connection can be made.	`0` (defaults to try toconnect until a connectioncould be made)
`elasticsearch_version_probe_delay`	Duration	Waiting time in between connection attempts for elasticsearch_version_probe_attempts	`5s` (defaults to waitfor 5 seconds between retries)

Automatic Version Sensing

We support multiple major versions of Elasticsearch (starting with Graylog 4.0) which are partially incompatible with each other (ES6 & ES7). Therefore, we need to know which Elasticsearch version is running in the cluster. This is why we make a single request to the first reachable Elasticsearch node and parse the version of the response it sends back. There are a few things which could go wrong at this point. You might want to run an unsupported version. If you feel comfortable doing so, you can set the elasticsearch_version configuration variable. It will disable auto-sensing and force Graylog to pretend that this Elasticsearch major version is running in the cluster. It will load the corresponding support module.

Hint: Elasticsearch 8.0 is not supported by Graylog 4.0. There is a good chance that it will work with our ES7 support, so you can try to set elasticsearch_version = 7 to make it run.

Automatic Node Discovery

Warning: Automatic node discovery does not work when using Amazon Elasticsearch Service because Amazon blocks certain Elasticsearch API endpoints.

Graylog uses automatic node discovery to gather a list of all available Elasticsearch nodes in the cluster at runtime and distributes requests among them to potentially increase their performance and availability. To enable this feature, you need to set the elasticsearch_discovery_enabled to true. Optionally, you can define a filter allowing to selectively include/exclude discovered nodes (details on how to specify node filters are found in the Elasticsearch cluster documentation) using the elasticsearch_discovery_filter setting, or by tuning the frequency of the node discovery using the elasticsearch_discovery_frequency configuration option. If your Elasticsearch cluster uses authentication, you need to specify the elasticsearch_discovery_default_user and elasticsearch_discovery_default_password settings. The username/password specified in these settings will be used for all nodes discovered in the cluster. If your cluster uses HTTPS, you also need to set the elasticsearch_discovery_default_scheme setting. It specifies the scheme used for discovered nodes and must be consistent across all nodes in the cluster.

Configuration of Elasticsearch Nodes

Control Access to Elasticsearch Ports

If you are not using Shield/X-Pack or Search Guard to authenticate access to your Elasticsearch nodes, make sure to restrict access to the Elasticsearch ports (default: 9200/tcp and 9300/tcp). Otherwise the data is readable by anyone who has access to the machine over a network.

Open File Limits

Because Elasticsearch has to keep a lot of files open simultaneously it requires a higher open file limit than most operating system defaults allow. Set it to at least 64000 open file descriptors.

Graylog will show a notification in the web interface when there is a node in the Elasticsearch cluster which has an open file limit that is too low.

Read about how to raise the open file limit in the corresponding 5.x, 6.x, and 7.x documentation pages.

Heap Size

We strongly recommended that you raise the standard size of heap memory allocated to Elasticsearch. For example, set the ES_HEAP_SIZE environment variable to 24g to allocate 24GB. We also recommend using around 50% of the available system memory for Elasticsearch (when running on a dedicated host) to leave enough space for the system caches that Elasticsearch uses to a great extent. But please take care that you don’t exceed 32 GB!

Merge Throttling

Warning: As of ES 6.2 Merge Throttling settings have been deprecated. See: (https://www.elastic.co/guide/en/elasticsearch/reference/6.2/breaking_60_settings_changes.html )

Elasticsearch throttles the merging of Lucene segments to allow extremely fast searches. This throttling however has default values that are very conservative and can lead to slow ingestion rates when used with Graylog. You can see the message journal growing without any real indication of CPU or memory stress on the Elasticsearch nodes. It usually shows up in Elasticsearch INFO log messages like this:

Copy

now throttling indexing

When running on fast IO like SSDs or a SAN we recommend increasing the value of the indices.store.throttle.max_bytes_per_sec in your elasticsearch.yml to 150MB:

Copy

indices.store.throttle.max_bytes_per_sec: 150mb

Play around with this setting until you reach the best performance.

Tuning Elasticsearch

Graylog sets specific configurations for every index it manages. This tuning is sufficient for a lot of use cases and setups.

More detailed information about Elasticsearch configurations can be found in the official documentation.

Avoiding Split-Brain and Shard Shuffling

Split-Brain Events

Elasticsearch sacrifices consistency in order to ensure availability and partition tolerance. The reasoning behind this is that short periods of misbehavior are less problematic than short periods of unavailability. In other words, when Elasticsearch nodes within a cluster are unable to replicate changes to data, they will keep serving applications such as Graylog. When the nodes are able to replicate their data, they will attempt to converge the replicas and achieve eventual consistency .

Elasticsearch tackles the previous by electing leader nodes, which are in charge of database operations such as creating new indices, moving shards around the cluster nodes and so forth. Leader nodes coordinate their actions actively with others, ensuring that the data can be converged by non-leaders. The cluster nodes that are not leader nodes are not allowed to make changes that would break the cluster.

The previous mechanism can in some circumstances fail, causing a split-brain event. When an Elasticsearch cluster is split into two sections which work on the data independently, data consistency is lost. As a result nodes will respond differently to the same queries. This is considered a catastrophic event because the data originating from the two leaders can not be rejoined automatically and it takes quite a bit of manual work to remedy the situation.

Avoiding Split-Brain Events

Elasticsearch nodes take a simple majority vote over who is leader. If the majority agrees on one, then most likely the disconnected minority will give in and everything will be just fine. This mechanism requires that at least 3 nodes work together, merely one or two nodes can not form a majority.

The minimum amount of leader nodes required to elect a leader must be configured manually in elasticsearch.yml:

Hint: The setting discovery.zen.minimum_master_nodes is ignored if running Elasticsearch versions 7 and up. See the blog on Breaking changes in 7.0 for more details.

Copy

# At least NODES/2+1 on clusters with NODES > 2, where NODES is the number of master nodes in the cluster
discovery.zen.minimum_master_nodes: 2

An example of what configuration values should typically be:

Leader Nodes	`minimum_master_nodes`	Comments
1	1
2	1	With 2 the other nodes going down, this would stop the cluster from working!
3	2
4	3
5	3
6	4

Some of the leader nodes may be dedicated leader nodes, meaning that they are only configured to handle lightweight operational (cluster management) responsibilities. They will not be able to handle or store any of the cluster’s data. The function of such nodes is similar to so called witness servers on other database products. Setting them up on dedicated witness sites will greatly reduce the risk of Elasticsearch cluster instability.

A dedicated leader node has the following configuration in elasticsearch.yml:

Copy

node.data: false
node.master: true

Shard Shuffling

When the cluster status changes because of a node restart or availability issues, Elasticsearch will start automatically rebalancing the data in the cluster. The cluster works on making sure that the amount of shards and replicas will conform to the cluster configuration. This is a problem if status changes are just temporary. Moving shards and replicas around in the cluster takes up a considerable amount of resources and should be done only when necessary.

Avoiding Unnecessary Shuffling

Elasticsearch has a couple of configuration options which are designed to allow short times of unavailability before starting the recovery process with shard shuffling. There are 3 settings that may be configured in elasticsearch.yml:

Copy

# Recover only after the given number of nodes have joined the cluster. Can be seen as "minimum number of nodes to attempt recovery at all".
gateway.recover_after_nodes: 8
# Time to wait for additional nodes after recover_after_nodes is met.
gateway.recover_after_time: 5m
# Inform ElasticSearch about how many nodes form a full cluster. If this number is met, start up immediately.
gateway.expected_nodes: 10

The configuration options should be set up so that only minimal node unavailability is tolerated. For example server restarts are common and should be managed. The logic is that if you lose large parts of your cluster, you should not tolerate the situation and you probably should start re-shuffling the shards and replicas.

Custom Index Mappings

Sometimes it’s better not to rely on Elasticsearch’s dynamic mapping. It's better to define a stricter schema for messages.

Hint: If the index mapping is conflicting with the actual message to be sent to Elasticsearch, that message will fail to be indexed.

Graylog itself uses a default mapping which includes settings for the timestamp, message, full_message, and source fields of indexed messages:

Copy

$ curl -X GET 'http://localhost:9200/_template/graylog-internal?pretty'
{
"graylog-internal" : {
  "order" : -1,
  "index_patterns" : [
    "graylog_*"
  ],
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "analyzer_keyword" : {
            "filter" : "lowercase",
            "tokenizer" : "keyword"
          }
        }
      }
    }
  },
  "mappings" : {
    "message" : {
      "_source" : {
        "enabled" : true
      },
      "dynamic_templates" : [
        {
          "internal_fields" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string",
            "match" : "gl2_*"
          }
        },
        {
          "store_generic" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string"
          }
        }
      ],
      "properties" : {
        "gl2_processing_timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "gl2_accounted_message_size" : {
          "type" : "long"
        },
        "gl2_receive_timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "full_message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "streams" : {
          "type" : "keyword"
        },
        "source" : {
          "fielddata" : true,
          "analyzer" : "analyzer_keyword",
          "type" : "text"
        },
        "message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        }
      }
    }
  },
  "aliases" : { }
}

In order to extend the default mapping of Elasticsearch and Graylog, you can create one or more custom index mappings and add them as index templates to Elasticsearch.

Let’s say we have a schema for our data like the following:

Field Name	Field Type	Example
`http_method`	keyword	GET
`http_response_code`	long	200
`ingest_time`	date	2016-06-13T15:00:51.927Z
`took_ms`	long	56

This would translate to the following additional index mapping in Elasticsearch:

Copy

"mappings" : {
  "message" : {
    "properties" : {
      "http_method" : {
        "type" : "keyword"
      },
      "http_response_code" : {
        "type" : "long"
      },
      "ingest_time" : {
        "type" : "date",
        "format": "strict_date_time"
      },
      "took_ms" : {
        "type" : "long"
      }
    }
  }
}

Formatting the ingest_time field is described in Elasticsearch documentation under format mapping parameter. Also make sure to check Elasticsearch documentation for information on Field datatypes.

When Graylog creates a new index in Elasticsearch, it has to be added to an index template in order to apply additional index mapping. The Graylog default template (graylog-internal) has the lowest priority and Elasticsearch will merge it with the custom index template.

Warning: If default index mapping and custom index mapping cannot be merged (e. g. because of conflicting field datatypes), Elasticsearch will throw an exception and won’t create the index. So be extremely cautious and conservative about the custom index mappings!

Creating a New Index Template

Save the following index template for the custom index mapping into a file named graylog-custom-mapping.json:

Copy

{
  "template": "graylog_*",
  "mappings" : {
    "message" : {
      "properties" : {
        "http_method" : {
          "type" : "keyword"
        },
        "http_response_code" : {
          "type" : "long"
        },
        "ingest_time" : {
          "type" : "date",
          "format": "strict_date_time"
        },
        "took_ms" : {
          "type" : "long"
        }
      }
    }
  }
}

Hint: The above template is only compatible with Elasticsearch 6.X. If using Graylog 4.0 with Elasticsearch 7.x, use the template below and save it as graylog-custom-mapping-7x.json.

Copy

{
  "template": "graylog_*",
  "mappings": {
    "properties": {
      "http_method": {
        "type": "keyword"
      },
      "http_response_code": {
        "type": "long"
      },
      "ingest_time": {
        "type": "date",
        "format": "strict_date_time"
      },
      "took_ms": {
        "type": "long"
      }
    }
  }
}

Finally, load the index mapping into Elasticsearch with the following command:

Copy

$ curl -X PUT -d @'graylog-custom-mapping.json' -H 'Content-Type: application/json' 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "acknowledged" : true
}

Every Elasticsearch index created thereon, will have an index mapping consisting of the original graylog-internal index template and the new graylog-custom-mapping template:

Copy

$ curl -X GET 'http://localhost:9200/graylog_deflector/_mapping?pretty'
{
  "graylog_3" : {
    "mappings" : {
      "message" : {
        "dynamic_templates" : [
          {
            "internal_fields" : {
              "match" : "gl2_*",
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          },
          {
            "store_generic" : {
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          }
        ],

        "properties" : {
          "full_message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "http_method" : {
            "type" : "keyword"
          },
          "http_response_code" : {
            "type" : "long"
          },
          "ingest_time" : {
            "type" : "date",
            "format" : "strict_date_time"
          },
          "message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "source" : {
            "type" : "text",
            "analyzer" : "analyzer_keyword",
            "fielddata" : true
          },
          "streams" : {
            "type" : "keyword"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          },
          "took_ms" : {
            "type" : "long"
          }
        }
      }
    }
  }
}

Hint: When using different index sets each can have its own mapping.

Deleting Custom Index Templates

If you want to remove an existing index template from Elasticsearch, simply issue a DELETE request to Elasticsearch:

Copy

$ curl -X DELETE 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "acknowledged" : true
}

After you’ve removed the index template, new indices will only have the original index mapping:

Copy

$ curl -X GET 'http://localhost:9200/graylog_deflector/_mapping?pretty'
{
  "graylog_3" : {
    "mappings" : {
      "message" : {
        "dynamic_templates" : [
          {
            "internal_fields" : {
              "match" : "gl2_*",
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          },
          {
            "store_generic" : {
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          }
        ],

        "properties" : {
          "full_message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "source" : {
            "type" : "text",
            "analyzer" : "analyzer_keyword",
            "fielddata" : true
          },
          "streams" : {
            "type" : "keyword"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          }
        }
      }
    }
  }
}

Additional information on Elasticsearch Index Templates can be found in the official Elasticsearch Template Documentation.

Hint: Settings and index mappings in templates are only applied to new indices. After adding, modifying, or deleting an index template, you have to manually rotate the write-active indices of your index sets for the changes to take effect.

Rotate Indices Manually

Select the desired index set on the System / Indices page in the Graylog web interface by clicking on the name of the index set, then select “Rotate active write index” from the “Maintenance” drop-down menu.

Cluster Status Explained

Elasticsearch provides a classification for cluster health.

The cluster status applies to different levels:

Shard level - see status descriptions below
Index level - inherits the status of the worst shard status
Cluster level - inherits the status of the worst index status

That means that the Elasticsearch cluster status will turn red if a single index or shard has problems even though the rest of the indices/shards are okay.

Hint: Graylog checks the status of the current write index while indexing messages. If it is GREEN or YELLOW, Graylog will continue to write messages into Elasticsearch regardless of the overall cluster status.

Explanation of different status levels:

Red

The RED status indicates that some or all of the primary shards are not available.

In this state, no searches can be performed until all primary shards have been restored.

Yellow

The YELLOW status means that all of the primary shards are available but some or all shard replicas are not.

When the index configuration includes replications with a count that is equal or higher than the number of nodes, your cluster cannot become green. In most cases, this can be solved by adding another Elasticsearch node to the cluster or by reducing the replication factor of the indices.

Green

The cluster is fully operational. All primary and replica shards are available.