Elasticsearch
  • 15 Jun 2022

We strongly recommend that you use a dedicated Elasticsearch cluster for your Graylog setup.

If you are using a shared Elasticsearch setup, a problem with indices unrelated to Graylog might turn the cluster status to YELLOW or RED and impact the availability and performance of your Graylog setup.

Elasticsearch versions

Starting with version 2.3, Graylog uses the HTTP protocol to connect to your Elasticsearch cluster, so it does not have a hard requirement for the Elasticsearch version anymore. We can safely assume that any version from 2.x onwards works.

Warning

We caution you not to install or upgrade to Elasticsearch 7.11 and later! It is not supported. Doing so will break your instance!

| Graylog version | Elasticsearch version |
|---|---|
| 1.2.0-1.2.1 | 1.7.1 |
| 1.3.0-1.3.3 | 1.7.3 |
| 1.3.4 | 1.7.5 |
| 2.0.0 | 2.3.1 |
| 2.0.1-2.0.3 | 2.3.2 |
| 2.1.0-2.1.3 | 2.3.5 |
| 2.2.0-2.2.3 | 2.4.4 |
| 2.3.x-2.4.x | 2.4.x, 5.6.x |
| 2.5.x | 2.4.x, 5.6.x, 6.8.x |
| 3.0-3.3 | 5.6.x, 6.8.x |
| 4.x | 6.8.x, 7.7.x-7.10.x |
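
The last row of the table can be expressed as a small check. The following Python sketch is illustrative only; the function name is hypothetical and it encodes nothing beyond the 4.x row above (6.8.x and 7.7.x-7.10.x).

```python
def es_supported_by_graylog_4(es_version: str) -> bool:
    """Return True if es_version is in 6.8.x or 7.7.x-7.10.x (the Graylog 4.x row)."""
    parts = es_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major == 6:
        return minor == 8
    if major == 7:
        return 7 <= minor <= 10
    return False


if __name__ == "__main__":
    for version in ("6.8.23", "7.10.2", "7.11.0"):
        print(version, es_supported_by_graylog_4(version))
```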

Note

Graylog works fine with the Amazon Elasticsearch Service using Elasticsearch 6 or 7.

Note

Graylog works fine with the Elastic Cloud using Elasticsearch 6 or 7.

Configuration

Caution

As Graylog has switched from an embedded Elasticsearch node client to a lightweight HTTP client in version 2.3, please check the upgrade notes on how to migrate your configuration if you are switching from an earlier version.

Graylog

The most important setting for a successful connection is a comma-separated list of URIs of one or more Elasticsearch nodes. Graylog needs to know the address of at least one Elasticsearch node, given in the elasticsearch_hosts setting. The specified value should contain at least the scheme (http:// for unencrypted, https:// for encrypted connections), the hostname or IP address, and the port of the HTTP listener of this node (9200 unless otherwise configured). Optionally, you can specify an authentication section containing a user name and a password if any of your Elasticsearch nodes use Shield/X-Pack or Search Guard, or if an intermediate HTTP proxy between the Graylog server and the Elasticsearch node requires authentication. You can also append an optional path prefix to the end of the URI.

A sample specification of elasticsearch_hosts:

elasticsearch_hosts = http://es-node-1.example.org:9200/foo,https://someuser:somepassword@es-node-2.example.org:19200
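
To make the parts of each URI explicit, here is a minimal Python sketch that splits such a value into scheme, host, port, credentials, and path prefix using the standard library. It is not Graylog's own parser; the function name and the dictionary keys are chosen for illustration.

```python
from urllib.parse import urlsplit


def parse_elasticsearch_hosts(value: str) -> list:
    """Split a comma-separated URI list into its components (illustrative only)."""
    hosts = []
    for raw in value.split(","):
        parts = urlsplit(raw.strip())
        hosts.append({
            "scheme": parts.scheme,        # http or https
            "host": parts.hostname,
            "port": parts.port,            # None if the URI omits the port
            "username": parts.username,    # None if no authentication section
            "password": parts.password,
            "path_prefix": parts.path,     # empty string if no prefix is given
        })
    return hosts


if __name__ == "__main__":
    sample = ("http://es-node-1.example.org:9200/foo,"
              "https://someuser:somepassword@es-node-2.example.org:19200")
    for host in parse_elasticsearch_hosts(sample):
        print(host)
```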

Caution

Graylog assumes that all nodes in the cluster are running the same version of Elasticsearch. While it might still work when patch levels differ, we highly encourage you to keep versions consistent.

Warning

Graylog does not currently react to externally triggered index changes (creating/closing/reopening/deleting an index). These actions need to be performed through the Graylog REST API in order to retain index consistency.

Available Elasticsearch configuration tunables

The following configuration options are now being used to configure connectivity to Elasticsearch:

| Config Setting | Type | Comments | Default |
|---|---|---|---|
| elasticsearch_connect_timeout | Duration | Timeout when connecting to individual Elasticsearch hosts | 10s (10 Seconds) |
| elasticsearch_hosts | List<URI> | Comma-separated list of URIs of Elasticsearch hosts | http://127.0.0.1:9200 |
| elasticsearch_idle_timeout | Duration | Timeout after which idle connections are terminated | -1s (Never) |
| elasticsearch_max_total_connections | int | Maximum number of total Elasticsearch connections | 20 |
| elasticsearch_max_total_connections_per_route | int | Maximum number of Elasticsearch connections per route/host | 2 |
| elasticsearch_socket_timeout | Duration | Timeout when sending/receiving from Elasticsearch connection | 60s (60 Seconds) |
| elasticsearch_discovery_enabled | boolean | Enable automatic Elasticsearch node discovery | false |
| elasticsearch_discovery_default_user | String | The default username used for authentication for all newly discovered nodes. | empty (no authentication used for discovered nodes) |
| elasticsearch_discovery_default_password | String | The default password used for authentication for all newly discovered nodes. | empty (no authentication used for discovered nodes) |
| elasticsearch_discovery_default_scheme | String | The default scheme used for all newly discovered nodes. | http |
| elasticsearch_discovery_filter | String | Filter by node attributes for the discovered nodes | empty (use all nodes) |
| elasticsearch_discovery_frequency | Duration | Frequency of the Elasticsearch node discovery | 30s (30 Seconds) |
| elasticsearch_compression_enabled | boolean | Enable GZIP compression of Elasticsearch request payloads | false |
| elasticsearch_version | String | Major version of Elasticsearch in use. If not specified, the version is auto-sensed from the configured nodes. Specifying it disables auto-sensing. | <not set> (auto-sense). Values: 6 / 7 |
| elasticsearch_mute_deprecation_warnings | boolean | Enable muting of deprecation warnings for deprecated configuration settings in Elasticsearch. These warnings are attached as "Warnings" in HTTP response headers and might clutter up the logs. Works only with ES7. | false |
| elasticsearch_version_probe_attempts | int | Maximum number of retries to connect to Elasticsearch on boot for the version probe before finally giving up. Use 0 to try until a connection can be made. | 0 (keeps trying until a connection can be made) |
| elasticsearch_version_probe_delay | Duration | Waiting time between connection attempts for elasticsearch_version_probe_attempts | 5s (waits 5 seconds between retries) |

Automatic version sensing

We support multiple major versions of Elasticsearch (starting with Graylog 4.0), which are partially incompatible with each other (ES6 and ES7). Therefore, Graylog needs to know which Elasticsearch version is running in the cluster. It makes a single request to the first reachable Elasticsearch node and parses the version from the response. A few things can go wrong at this point; for example, you might want to run an unsupported version. If you feel comfortable doing so, you can set the elasticsearch_version configuration variable. This disables auto-sensing, makes Graylog assume that the given Elasticsearch major version is running in the cluster, and loads the corresponding support module.
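
The idea can be sketched in a few lines of Python. The root endpoint of an Elasticsearch node returns a JSON document containing a version.number field; the sketch below parses the major version from such a body and lets an explicitly configured value take precedence. The function names are hypothetical, and this is not Graylog's actual implementation.

```python
import json


def sense_major_version(root_response_body: str) -> int:
    """Extract the major version from the JSON body returned by GET / on a node."""
    info = json.loads(root_response_body)
    number = info["version"]["number"]      # e.g. "7.10.2"
    return int(number.split(".")[0])


def effective_major_version(root_response_body: str, configured_version=None) -> int:
    """A configured elasticsearch_version disables sensing and is used as-is."""
    if configured_version is not None:
        return int(configured_version)
    return sense_major_version(root_response_body)


if __name__ == "__main__":
    # Shortened sample body; a real response contains more fields.
    body = '{"name": "es-node-1", "version": {"number": "7.10.2"}}'
    print(effective_major_version(body))
    print(effective_major_version(body, configured_version="6"))
```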

Note

Elasticsearch 8.0 (which was not yet released at the time this article was written) is not supported by Graylog 4.0. There is a good chance that it will work with our ES7 support, so you can try setting elasticsearch_version = 7 to make it run.

Automatic node discovery

Caution

Automatic node discovery does not work when using Amazon Elasticsearch Service because Amazon blocks certain Elasticsearch API endpoints.

Graylog uses automatic node discovery to gather a list of all available Elasticsearch nodes in the cluster at runtime and distributes requests among them to potentially increase performance and availability. To enable this feature, set elasticsearch_discovery_enabled to true. Optionally, you can define a filter to selectively include or exclude discovered nodes using the elasticsearch_discovery_filter setting (details on how to specify node filters are found in the Elasticsearch cluster documentation), and tune the frequency of the node discovery using the elasticsearch_discovery_frequency configuration option. If your Elasticsearch cluster uses authentication, you need to specify the elasticsearch_discovery_default_user and elasticsearch_discovery_default_password settings. The username and password specified in these settings will be used for all nodes discovered in the cluster. If your cluster uses HTTPS, you also need to set the elasticsearch_discovery_default_scheme setting. It specifies the scheme used for discovered nodes and must be consistent across all nodes in the cluster.
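
As a rough illustration of how the default scheme, user, and password combine with discovered addresses, the Python sketch below builds node URIs from a response shaped like that of the Elasticsearch nodes info API (GET /_nodes/http), which reports an http.publish_address per node. Whether Graylog uses exactly this endpoint is an assumption here, and the function name and sample payload are hypothetical.

```python
from urllib.parse import quote


def discovered_node_uris(nodes_response: dict, scheme: str = "http",
                         user=None, password=None) -> list:
    """Build one URI per discovered node, applying the same defaults to all nodes."""
    uris = []
    for node in nodes_response.get("nodes", {}).values():
        address = node.get("http", {}).get("publish_address")
        if not address:
            continue  # node without an HTTP listener
        # Some versions report "hostname/ip:port"; keep only the "ip:port" part.
        address = address.split("/")[-1]
        auth = ""
        if user:
            auth = quote(user, safe="") + ":" + quote(password or "", safe="") + "@"
        uris.append(f"{scheme}://{auth}{address}")
    return sorted(uris)


if __name__ == "__main__":
    sample = {"nodes": {
        "node-a": {"http": {"publish_address": "10.0.0.1:9200"}},
        "node-b": {"http": {"publish_address": "es-2.example.org/10.0.0.2:9200"}},
    }}
    print(discovered_node_uris(sample, scheme="https", user="someuser", password="somepassword"))
```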

Configuration of Elasticsearch nodes

Control access to Elasticsearch ports

If you are not using Shield/X-Pack or Search Guard to authenticate access to your Elasticsearch nodes, make sure to restrict access to the Elasticsearch ports (default: 9200/tcp and 9300/tcp). Otherwise the data is readable by anyone who has access to the machine over a network.

Open file limits

Because Elasticsearch has to keep a lot of files open simultaneously, it requires a higher open file limit than most operating system defaults allow. Set it to at least 64000 open file descriptors.

Graylog will show a notification in the web interface when there is a node in the Elasticsearch cluster which has an open file limit that is too low.

Read about how to raise the open file limit in the corresponding 5.x, 6.x, and 7.x documentation pages.
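
For a quick local check, the following Python sketch reads the open file limit of the current process with the standard resource module (Unix only) and compares it against the 64000 recommended above. Note that it inspects the limit of the Python process itself, so it is only meaningful when run as the same user and in the same environment as Elasticsearch. The function and constant names are illustrative.

```python
import resource

MIN_OPEN_FILES = 64000  # recommendation from the text above


def open_file_limit_ok(soft_limit: int, minimum: int = MIN_OPEN_FILES) -> bool:
    """Return True if the soft limit is unlimited or at least the recommended minimum."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    print("sufficient for Elasticsearch:", open_file_limit_ok(soft))
```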

Heap size

We strongly recommend that you raise the standard size of heap memory allocated to Elasticsearch. For example, set the ES_HEAP_SIZE environment variable to 24g to allocate 24 GB. We also recommend using around 50% of the available system memory for Elasticsearch (when running on a dedicated host) to leave enough space for the system caches that Elasticsearch uses to a great extent. But please take care that you don't exceed 32 GB!
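
The sizing rule above is simple arithmetic: half of the system memory, capped at 32 GB. A minimal Python sketch, with a hypothetical function name and the cap taken from the text:

```python
def recommended_heap_gb(system_memory_gb: float, cap_gb: float = 32.0) -> float:
    """Return about 50% of system memory, but never more than the cap."""
    return min(system_memory_gb / 2, cap_gb)


if __name__ == "__main__":
    for memory in (16, 48, 128):
        print(f"{memory} GB RAM -> {recommended_heap_gb(memory):g} GB heap")
```

For a dedicated host with 48 GB of memory this yields the 24 GB used in the example above.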

Merge throttling

As of ES 6.2, merge throttling settings have been deprecated. See: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/breaking_60_settings_changes.html

Elasticsearch throttles the merging of Lucene segments to allow extremely fast searches. This throttling, however, has very conservative default values that can lead to slow ingestion rates when used with Graylog. You can see the message journal growing without any real indication of CPU or memory stress on the Elasticsearch nodes. It usually shows up in Elasticsearch INFO log messages like this:

now throttling indexing 

When running on fast IO like SSDs or a SAN we recommend increasing the value of the indices.store.throttle.max_bytes_per_sec in your elasticsearch.yml to 150MB:

indices.store.throttle.max_bytes_per_sec: 150mb

Play around with this setting until you reach the best performance.

Tuning Elasticsearch

Graylog sets specific configurations for every index it manages. This tuning is sufficient for a lot of use cases and setups.

More detailed information about Elasticsearch configurations can be found in the official documentation.

Avoiding split-brain and shard shuffling

Split-brain events

Elasticsearch sacrifices consistency in order to ensure availability and partition tolerance. The reasoning behind this is that short periods of misbehaviour are less problematic than short periods of unavailability. In other words, when Elasticsearch nodes within a cluster are unable to replicate changes to data, they will keep serving applications such as Graylog. When the nodes are able to replicate their data again, they will attempt to converge the replicas and achieve eventual consistency.

Elasticsearch addresses this by electing master nodes, which are in charge of database operations such as creating new indices, moving shards around the cluster nodes and so forth. Master nodes coordinate their actions actively with others, ensuring that the data can be converged by non-masters. The cluster nodes that are not master nodes are not allowed to make changes that would break the cluster.

This mechanism can fail in some circumstances, causing a split-brain event. When an Elasticsearch cluster is split into two sections which work on the data independently, data consistency is lost. As a result, nodes will respond differently to the same queries. This is considered a catastrophic event because the data originating from the two masters cannot be rejoined automatically and it takes quite a bit of manual work to remedy the situation.

Avoiding split-brain events

Elasticsearch nodes take a simple majority vote over who is master. If the majority agrees on one, then most likely the disconnected minority will give in and everything will be just fine. This mechanism requires that at least 3 nodes work together, because one or two nodes alone cannot form a majority.

The minimum number of master nodes required to elect a master must be configured manually in elasticsearch.yml:

# At least NODES/2+1 on clusters with NODES > 2, where NODES is the number of master nodes in the cluster
discovery.zen.minimum_master_nodes: 2

Note:

The setting discovery.zen.minimum_master_nodes is ignored when running Elasticsearch 7 and later. See the blog on Breaking changes in 7.0 for more details.

An example of what configuration values should typically be:

| Master nodes | minimum_master_nodes | Comments |
|---|---|---|
| 1 | 1 | |
| 2 | 1 | With 2 the other node going down would stop the cluster from working! |
| 3 | 2 | |
| 4 | 3 | |
| 5 | 3 | |
| 6 | 4 | |
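
The values in the table follow the NODES/2+1 rule (integer division) for clusters with more than two master nodes, with 1 used for one or two nodes. A short Python sketch of that rule, using a hypothetical function name:

```python
def minimum_master_nodes(master_nodes: int) -> int:
    """Quorum size for discovery.zen.minimum_master_nodes, following the table above."""
    if master_nodes < 1:
        raise ValueError("a cluster needs at least one master node")
    if master_nodes <= 2:
        # With only two masters, requiring both would stop the cluster
        # as soon as one of them goes down.
        return 1
    return master_nodes // 2 + 1


if __name__ == "__main__":
    for nodes in range(1, 7):
        print(nodes, minimum_master_nodes(nodes))
```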

Some of the master nodes may be dedicated master nodes, meaning that they are only configured to handle lightweight operational (cluster management) responsibilities. They will not be able to handle or store any of the cluster's data. The function of such nodes is similar to so-called witness servers in other database products. Setting them up on dedicated witness sites will greatly reduce the risk of Elasticsearch cluster instability.

A dedicated master node has the following configuration in elasticsearch.yml:

node.data: false
node.master: true

Shard shuffling

When the cluster status changes because of a node restart or availability issues, Elasticsearch will automatically start rebalancing the data in the cluster. The cluster works on making sure that the number of shards and replicas conforms to the cluster configuration. This is a problem if status changes are just temporary. Moving shards and replicas around in the cluster takes up a considerable amount of resources and should be done only when necessary.

Avoiding unnecessary shuffling

Elasticsearch has a couple of configuration options which are designed to allow short times of unavailability before starting the recovery process with shard shuffling. There are 3 settings that may be configured in elasticsearch.yml:

# Recover only after the given number of nodes have joined the cluster. Can be seen as "minimum number of nodes to attempt recovery at all".
gateway.recover_after_nodes: 8
# Time to wait for additional nodes after recover_after_nodes is met.
gateway.recover_after_time: 5m
# Inform Elasticsearch about how many nodes form a full cluster. If this number is met, start up immediately.
gateway.expected_nodes: 10

The configuration options should be set up so that only minimal node unavailability is tolerated. For example server restarts are common and should be managed. The logic is that if you lose large parts of your cluster, you should not tolerate the situation and you probably should start re-shuffling the shards and replicas.
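
How the three settings interact can be sketched as a small decision function. This is a simplified model based on the comments in the configuration above, not Elasticsearch's actual implementation; the function and parameter names are hypothetical, and the defaults mirror the example values (8 nodes, 5 minutes, 10 nodes).

```python
def should_start_recovery(joined_nodes: int,
                          seconds_since_threshold: float,
                          recover_after_nodes: int = 8,
                          recover_after_time_s: float = 300,
                          expected_nodes: int = 10) -> bool:
    """Decide whether recovery may start (simplified model of the gateway settings)."""
    if joined_nodes >= expected_nodes:
        return True   # full cluster present: start immediately
    if joined_nodes < recover_after_nodes:
        return False  # too few nodes to attempt recovery at all
    # Enough nodes to recover, but wait for stragglers until the timer expires.
    return seconds_since_threshold >= recover_after_time_s


if __name__ == "__main__":
    print(should_start_recovery(10, 0))     # all expected nodes joined
    print(should_start_recovery(8, 60))     # still waiting for more nodes
    print(should_start_recovery(8, 300))    # waited long enough
```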

Custom index mappings

Sometimes it is better not to rely on Elasticsearch's dynamic mapping and to define a stricter schema for messages instead.

Note

If the index mapping is conflicting with the actual message to be sent to Elasticsearch, that message will fail to be indexed.

Graylog itself uses a default mapping which includes settings for the timestamp, message, full_message, and source fields of indexed messages:

$ curl -X GET 'http://localhost:9200/_template/graylog-internal?pretty'
{
"graylog-internal" : {
  "order" : -1,
  "index_patterns" : [
    "graylog_*"
  ],
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "analyzer_keyword" : {
            "filter" : "lowercase",
            "tokenizer" : "keyword"
          }
        }
      }
    }
  },
  "mappings" : {
    "message" : {
      "_source" : {
        "enabled" : true
      },
      "dynamic_templates" : [
        {
          "internal_fields" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string",
            "match" : "gl2_*"
          }
        },
        {
          "store_generic" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string"
          }
        }
      ],
      "properties" : {
        "gl2_processing_timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "gl2_accounted_message_size" : {
          "type" : "long"
        },
        "gl2_receive_timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "full_message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "streams" : {
          "type" : "keyword"
        },
        "source" : {
          "fielddata" : true,
          "analyzer" : "analyzer_keyword",
          "type" : "text"
        },
        "message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        }
      }
    }
  },
  "aliases" : { }
}
}

In order to extend the default mapping of Elasticsearch and Graylog, you can create one or more custom index mappings and add them as index templates to Elasticsearch.

Let’s say we have a schema for our data like the following:

| Field Name | Field Type | Example |
|---|---|---|
| http_method | keyword | GET |
| http_response_code | long | 200 |
| ingest_time | date | 2016-06-13T15:00:51.927Z |
| took_ms | long | 56 |

This would translate to the following additional index mapping in Elasticsearch:

"mappings" : {
  "message" : {
    "properties" : {
      "http_method" : {
        "type" : "keyword"
      },
      "http_response_code" : {
        "type" : "long"
      },
      "ingest_time" : {
        "type" : "date",
        "format": "strict_date_time"
      },
      "took_ms" : {
        "type" : "long"
      }
    }
  }
}

Formatting the ingest_time field is described in Elasticsearch documentation under format mapping parameter. Also make sure to check Elasticsearch documentation for information on Field datatypes.

When Graylog creates a new index in Elasticsearch, the index has to match an index template for the additional index mapping to be applied. The Graylog default template (graylog-internal) has the lowest priority, and Elasticsearch will merge it with the custom index template.

Warning

If the default index mapping and the custom index mapping cannot be merged (e.g. because of conflicting field datatypes), Elasticsearch will throw an exception and won't create the index. So be extremely cautious and conservative about the custom index mappings!
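
Before loading a custom template, it can help to compare its field types against the default mapping. The Python sketch below checks only one kind of conflict, differing "type" values for the same field name; Elasticsearch may reject a template for other reasons as well. The function name is hypothetical.

```python
def conflicting_fields(default_props: dict, custom_props: dict) -> dict:
    """Return fields defined in both mappings whose "type" values differ."""
    conflicts = {}
    for field, custom in custom_props.items():
        default = default_props.get(field)
        if default is not None and default.get("type") != custom.get("type"):
            conflicts[field] = (default.get("type"), custom.get("type"))
    return conflicts


if __name__ == "__main__":
    # Excerpt of the default properties shown earlier.
    default = {"timestamp": {"type": "date"}, "source": {"type": "text"}}
    custom = {"source": {"type": "keyword"}, "took_ms": {"type": "long"}}
    print(conflicting_fields(default, custom))
```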

Creating a new index template

Save the following index template for the custom index mapping into a file named graylog-custom-mapping.json:

{
  "template": "graylog_*",
  "mappings" : {
    "message" : {
      "properties" : {
        "http_method" : {
          "type" : "keyword"
        },
        "http_response_code" : {
          "type" : "long"
        },
        "ingest_time" : {
          "type" : "date",
          "format": "strict_date_time"
        },
        "took_ms" : {
          "type" : "long"
        }
      }
    }
  }
}

Note

The above template is only compatible with Elasticsearch 6.x. If you are using Graylog 4.0 with Elasticsearch 7.x, use the template below and save it as graylog-custom-mapping-7x.json.

{
  "template": "graylog_*",
  "mappings": {
    "properties": {
      "http_method": {
        "type": "keyword"
      },
      "http_response_code": {
        "type": "long"
      },
      "ingest_time": {
        "type": "date",
        "format": "strict_date_time"
      },
      "took_ms": {
        "type": "long"
      }
    }
  }
}
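
The only difference between the two templates is that the 7.x variant drops the mapping type level ("message") and places "properties" directly under "mappings". A minimal Python sketch of that transformation, with a hypothetical function name:

```python
import copy


def to_typeless_template(template: dict, type_name: str = "message") -> dict:
    """Return a copy of a 6.x-style template with the mapping type level removed."""
    result = copy.deepcopy(template)
    mappings = result.get("mappings", {})
    if type_name in mappings:
        # Lift the contents of the type up one level.
        result["mappings"] = mappings[type_name]
    return result


if __name__ == "__main__":
    typed = {
        "template": "graylog_*",
        "mappings": {"message": {"properties": {"took_ms": {"type": "long"}}}},
    }
    print(to_typeless_template(typed))
```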

Finally, load the index mapping into Elasticsearch with the following command:

$ curl -X PUT -d @'graylog-custom-mapping.json' -H 'Content-Type: application/json' 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "acknowledged" : true
}
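
The same request can be issued from a script. The Python sketch below builds the equivalent PUT request with the standard library; the function name is hypothetical, and the URL and template name are taken from the curl command above. The request is only constructed here; sending it is shown in a comment.

```python
import json
import urllib.request


def build_template_request(template: dict,
                           name: str = "graylog-custom-mapping",
                           base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a PUT request that stores the given index template under _template/<name>."""
    body = json.dumps(template).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_template/{name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    template = {"template": "graylog_*",
                "mappings": {"properties": {"took_ms": {"type": "long"}}}}
    request = build_template_request(template)
    print(request.get_method(), request.full_url)
    # To actually send it against a running node:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```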

Every Elasticsearch index created from then on will have an index mapping consisting of the original graylog-internal index template and the new graylog-custom-mapping template:

$ curl -X GET 'http://localhost:9200/graylog_deflector/_mapping?pretty'
{
  "graylog_3" : {
    "mappings" : {
      "message" : {
        "dynamic_templates" : [
          {
            "internal_fields" : {
              "match" : "gl2_*",
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          },
          {
            "store_generic" : {
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          }
        ],

        "properties" : {
          "full_message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "http_method" : {
            "type" : "keyword"
          },
          "http_response_code" : {
            "type" : "long"
          },
          "ingest_time" : {
            "type" : "date",
            "format" : "strict_date_time"
          },
          "message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "source" : {
            "type" : "text",
            "analyzer" : "analyzer_keyword",
            "fielddata" : true
          },
          "streams" : {
            "type" : "keyword"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          },
          "took_ms" : {
            "type" : "long"
          }
        }
      }
    }
  }
}

Note

When using different index sets, each can have its own mapping.

Deleting custom index templates

If you want to remove an existing index template from Elasticsearch, simply issue a DELETE request to Elasticsearch:

$ curl -X DELETE 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "acknowledged" : true
}

After you’ve removed the index template, new indices will only have the original index mapping:

$ curl -X GET 'http://localhost:9200/graylog_deflector/_mapping?pretty'
{
  "graylog_3" : {
    "mappings" : {
      "message" : {
        "dynamic_templates" : [
          {
            "internal_fields" : {
              "match" : "gl2_*",
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          },
          {
            "store_generic" : {
              "match_mapping_type" : "string",
              "mapping" : {
                "type" : "keyword"
              }
            }
          }
        ],

        "properties" : {
          "full_message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "source" : {
            "type" : "text",
            "analyzer" : "analyzer_keyword",
            "fielddata" : true
          },
          "streams" : {
            "type" : "keyword"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          }
        }
      }
    }
  }
}

Additional information on Elasticsearch Index Templates can be found in the official Elasticsearch Template Documentation.

Note

Settings and index mappings in templates are only applied to new indices. After adding, modifying, or deleting an index template, you have to manually rotate the write-active indices of your index sets for the changes to take effect.

Rotate indices manually

Select the desired index set on the System / Indices page in the Graylog web interface by clicking on the name of the index set, then select "Rotate active write index" from the "Maintenance" dropdown menu.

Cluster Status explained

Elasticsearch provides a classification for cluster health.

The cluster status applies to different levels:

  • Shard level - see status descriptions below
  • Index level - inherits the status of the worst shard status
  • Cluster level - inherits the status of the worst index status

That means that the Elasticsearch cluster status will turn red if a single index or shard has problems even though the rest of the indices/shards are okay.
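
This "worst status wins" inheritance can be sketched in a few lines of Python. The function names and the sample index names are illustrative, and treating an empty cluster as GREEN is an assumption of this sketch.

```python
SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}


def worst_status(statuses) -> str:
    """Return the most severe status; an empty input counts as GREEN (assumption)."""
    return max(statuses, key=SEVERITY.__getitem__, default="GREEN")


def cluster_status(indices: dict) -> str:
    """indices maps an index name to the list of its shard statuses."""
    # Each index inherits its worst shard status; the cluster inherits the worst index status.
    return worst_status(worst_status(shards) for shards in indices.values())


if __name__ == "__main__":
    indices = {
        "graylog_0": ["GREEN", "GREEN"],
        "graylog_1": ["GREEN", "YELLOW"],
        "unrelated_index": ["RED"],
    }
    print(cluster_status(indices))
```

In this example a single RED shard in an unrelated index turns the whole cluster RED, which is the reason for recommending a dedicated cluster for Graylog.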

Note

Graylog checks the status of the current write index while indexing messages. If it is GREEN or YELLOW, Graylog will continue to write messages into Elasticsearch regardless of the overall cluster status.

Explanation of different status levels:

RED

The RED status indicates that some or all of the primary shards are not available.

In this state, no searches can be performed until all primary shards have been restored.

YELLOW

The YELLOW status means that all of the primary shards are available but some or all shard replicas are not.

When the index configuration specifies a replica count that is equal to or higher than the number of nodes, your cluster cannot become green. In most cases, this can be solved by adding another Elasticsearch node to the cluster or by reducing the replication factor of the indices.

GREEN

The cluster is fully operational. All primary and replica shards are available.

