Planning Your Log Collection

Getting logs into Graylog is essential for security and threat hunting, IT operations analysis, reporting, and other use cases.

Log collection can be a daunting task, especially when massive amounts of data need to be collected. Considering a few key points in advance makes the process easier.

What Do You Want to Accomplish with Centralized Log Management?

To set up Graylog Enterprise in the most effective way, a business should first determine what it needs to accomplish through centralized log management.

For example, a system breach is a scenario that companies in every industry face, and the corresponding desired outcome is to prevent such breaches before they happen. Another example is the need for maximum uptime and peak performance while keeping costs manageable. Proactively preventing issues before they arise is also a vital focus point for IT teams.

What Logs Do You Need to Gather?

Knowing what to log is critical to achieving log management goals. For example, many companies rely on a large, complex, and diverse network of hosts with hundreds of servers worldwide. To maintain uptime, performance, and security, they need to know which logs are critical to these operational functions. One way to determine which logs are needed is to start with logs from the categories tied to your goals, such as operations, security, and DevOps, and check whether they provide enough data to produce meaningful results.

For example, IT admins, operations staff, and SREs may monitor network or hardware performance and perform real-time application-layer monitoring or troubleshooting; DevOps teams focus on CI/CD pipelines and build jobs; and security teams may use a SIEM to monitor user logins to critical resources. In this case, data from the operations, security, and possibly DevOps categories should be logged.

What Are Your Retention Requirements?

A key question when planning your log management system is log retention: how long you need to keep data. The answer depends on several factors.

Some regulatory frameworks require retention of event log data for a prescribed period. In the absence of a clear requirement, the question becomes one of balancing the cost of retention (storage) with the utility of having historical data.

There is no single answer, as each situation is different. The most crucial issue to keep in mind when designing a retention policy is to provide the flexibility to accommodate different log sources.

Graylog provides two ways to retain event log data:

  1. Online -- stored in Elasticsearch or OpenSearch and searchable through the Graylog GUI

  2. Archived -- stored in a compressed format, either on the Graylog server or on a network file share. Archives can be searched with grep, but must be reconstituted in Graylog before they can be searched through the GUI again.

Most Graylog customers retain 30-90 days online (searchable in Elasticsearch) and 6-13 months of archives.
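A retention plan can be checked against a required retention period with a few lines of code. The following is a minimal sketch, not a Graylog feature: the `RetentionPlan` class is hypothetical, a month is assumed to be 30 days, and both tiers are assumed to be counted from the time a message is ingested, so data is retained for whichever tier lasts longer.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # simplifying assumption for rough planning


@dataclass(frozen=True)
class RetentionPlan:
    """Hypothetical two-tier retention plan (online + archive)."""

    online_days: int     # searchable in Elasticsearch/OpenSearch via the Graylog GUI
    archive_months: int  # compressed archives; must be reconstituted before GUI search

    @property
    def archive_days(self) -> int:
        return self.archive_months * DAYS_PER_MONTH

    def retained_days(self) -> int:
        # Assumption: both periods start at ingestion, so data is kept
        # for as long as the longer of the two tiers.
        return max(self.online_days, self.archive_days)

    def meets(self, required_days: int) -> bool:
        """True if the plan keeps data at least as long as required_days."""
        return self.retained_days() >= required_days


# Upper end of the typical range: 90 days online, 13 months of archives.
plan = RetentionPlan(online_days=90, archive_months=13)
print(plan.retained_days())  # 390
print(plan.meets(365))       # True
```

If your archives are instead counted from the end of the online window, replace `max` with a sum of the two periods.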

What Are Your Storage Requirements?

Like most data stores, Elasticsearch misbehaves when it runs out of available storage. Proper planning and monitoring are critical to prevent this situation.

There are many variables that affect storage requirements, such as how much of each message is kept, whether the original message is retained once parsing is complete, and how much enrichment is done before storage.

A simple rule of thumb for planning storage is to multiply your average daily ingestion rate by the number of days you need to retain the data online, and then multiply the result by 1.3 to account for metadata overhead:

  GB/day x retention days x 1.3 = storage required
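The rule of thumb translates directly into a small helper. This is a minimal sketch: the function name `online_storage_gb` and the 50 GB/day example are hypothetical, and the 1.3 factor is the planning estimate above rather than a measured overhead for any particular deployment.

```python
def online_storage_gb(gb_per_day: float, retention_days: int, overhead: float = 1.3) -> float:
    """Estimate online storage as GB/day x retention days x metadata overhead."""
    if gb_per_day < 0 or retention_days < 0:
        raise ValueError("ingestion rate and retention days must be non-negative")
    return gb_per_day * retention_days * overhead


# Hypothetical example: 50 GB/day kept online for 30 days.
print(f"{online_storage_gb(50, 30):.0f} GB")  # prints 1950 GB
```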

Elasticsearch makes extensive use of slack (free) storage space in the course of its operations. Users are strongly encouraged to provision more than the minimum storage required for their calculated ingestion rate. At maximum retention, Elasticsearch storage usage should not exceed 75% of total space.
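The 75% guideline means the disk must be larger than the calculated requirement. The sketch below, with hypothetical function names, divides the required storage by the maximum utilization to get a minimum disk size, and checks whether a given usage level stays within that headroom.

```python
import math

MAX_UTILIZATION = 0.75  # storage should not exceed 75% of total space at maximum retention


def provisioned_storage_gb(required_gb: float, max_utilization: float = MAX_UTILIZATION) -> int:
    """Smallest whole-GB disk size that keeps required_gb at or below max_utilization."""
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    return math.ceil(required_gb / max_utilization)


def within_headroom(used_gb: float, total_gb: float,
                    max_utilization: float = MAX_UTILIZATION) -> bool:
    """True if used storage is at or below the allowed fraction of total space."""
    return used_gb <= total_gb * max_utilization


# Hypothetical: 1950 GB required at maximum retention.
print(provisioned_storage_gb(1950))  # 2600
```

Dividing rather than adding a fixed margin keeps the 75% ceiling true regardless of how large the deployment grows.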

Who Will Be Using Graylog?

The number of users is an important factor when it comes to designing a Graylog architecture.

For example, if a company has junior and/or less technical members troubleshooting user-related issues, Graylog makes it easy to empower them through pre-built content such as dashboards. If a distributed IT Operations team needs to query log data simultaneously, this should be considered when designing an architecture.

Another factor is determining access control for different user groups. Security analysts will require more access than help desk employees. Management might have access to everything, whereas engineers might only have access to test environments. Regardless of the situation, as in all questions of access control, the principle of least privilege should apply.
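Least privilege can be expressed as a deny-by-default lookup: a role may access only what it has been explicitly granted. The sketch below is illustrative planning code, not Graylog's permission model; the role names and resource labels are hypothetical.

```python
# Hypothetical roles and resource labels for planning only;
# these are not Graylog's built-in roles or permission strings.
ROLE_GRANTS: dict[str, frozenset[str]] = {
    "help_desk": frozenset({"dashboards:user-issues"}),
    "engineer": frozenset({"logs:test"}),
    "security_analyst": frozenset({"logs:test", "logs:production", "logs:security"}),
    "management": frozenset({"dashboards:user-issues", "logs:test",
                             "logs:production", "logs:security"}),
}


def is_allowed(role: str, resource: str) -> bool:
    """Deny by default: unknown roles and ungranted resources are refused."""
    return resource in ROLE_GRANTS.get(role, frozenset())


print(is_allowed("engineer", "logs:test"))        # True
print(is_allowed("engineer", "logs:production"))  # False
```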

Choosing Log Event Sources

The selection of event sources should be driven by the use cases that have been identified. For example, if the use case is monitoring of user logins to critical resources, the event sources selected should be only those related to the critical resources in question. This may include the LDAP directory server, local servers, firewalls, network devices, and key applications.

There are many other potential event source categories, including:

  • Security
      • Firewalls
      • Endpoint Security (EDR, AV, etc.)
      • Web Proxies/Gateways
      • LDAP/Active Directory
      • IDS
      • DNS
      • DHCP
      • Servers
      • Workstations
      • Netflow

  • Ops
      • Applications
      • Network Devices
      • Servers
      • Packet Capture/Network Recorder
      • DNS
      • DHCP
      • Email

  • DevOps
      • Application Logs
      • Load Balancer Logs
      • Automation System Logs
      • Business Logic
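Because source selection is driven by use cases, it can help to record the mapping explicitly and derive the source list from it. This is a minimal sketch; the use-case names and their mappings are hypothetical examples built from the categories above and the login-monitoring example earlier in this section.

```python
# Hypothetical mapping from use cases to event sources; adjust to your environment.
USE_CASE_SOURCES: dict[str, set[str]] = {
    "critical-resource logins": {"LDAP/Active Directory", "Servers", "Firewalls",
                                 "Network Devices", "Applications"},
    "application troubleshooting": {"Applications", "Application Logs",
                                    "Load Balancer Logs"},
    "build pipeline monitoring": {"Automation System Logs"},
}


def sources_for(use_cases: list[str]) -> set[str]:
    """Return the union of event sources needed for the selected use cases."""
    selected: set[str] = set()
    for case in use_cases:
        if case not in USE_CASE_SOURCES:
            raise KeyError(f"no event sources defined for use case: {case!r}")
        selected |= USE_CASE_SOURCES[case]
    return selected


print(sorted(sources_for(["build pipeline monitoring"])))  # ['Automation System Logs']
```

Collecting only the union of what the chosen use cases need keeps ingestion, and therefore storage, tied to stated goals.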

Collection Methods

A decision must be made as to how the logs will be collected. Once the list of event sources has been determined, the next step is to decide the collection method for each source. It is critical to understand which method each event source uses and what resources it may require.

For example, if a log shipper will be required to read logs from a local file on all servers, the log shipper must be selected and tested before deployment. In other cases, proprietary APIs or software tools must be employed and integrated. In some cases, changes to the event sources themselves (security devices, network hardware, or applications) may be required. Additional planning is often required to deploy and maintain these collection methods over time.
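Tracking the chosen method and test status per source makes it easy to see what is still open before deployment. The sketch below is a hypothetical planning aid; the `Method` values paraphrase the three cases described above and are not Graylog input types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Method(Enum):
    """Hypothetical collection-method categories from the planning discussion."""
    LOG_SHIPPER = "log shipper reading local files"
    API = "proprietary API or software tool"
    SOURCE_CHANGE = "configuration change on the event source"


@dataclass
class CollectionPlan:
    source: str
    method: Optional[Method] = None  # None means no method chosen yet
    tested: bool = False             # tested before deployment?


def open_items(plans: list[CollectionPlan]) -> list[str]:
    """List sources that still need a method decision or a pre-deployment test."""
    issues = []
    for plan in plans:
        if plan.method is None:
            issues.append(f"{plan.source}: no collection method chosen")
        elif not plan.tested:
            issues.append(f"{plan.source}: {plan.method.value} not yet tested")
    return issues


plans = [
    CollectionPlan("Firewalls", Method.SOURCE_CHANGE, tested=True),
    CollectionPlan("Servers", Method.LOG_SHIPPER),
    CollectionPlan("Email"),
]
for item in open_items(plans):
    print(item)
```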

Graylog supports many input types out of the box, and many more are available in the Graylog Marketplace. A list of input types that Graylog supports can be found in the Getting Started Guide.
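Once an input is running, sending a test message is a quick way to confirm a collection path end to end. The sketch below assumes a GELF UDP input is already configured; port 12201 is the conventional GELF port, but use whatever port your input listens on. It builds a minimal uncompressed GELF 1.1 payload with the required `version`, `host`, and `short_message` fields, plus optional additional fields, which GELF requires to be prefixed with an underscore. The function names are hypothetical.

```python
import json
import socket


def build_gelf_message(host: str, short_message: str, **extra: str) -> bytes:
    """Encode a minimal, uncompressed GELF 1.1 payload as UTF-8 JSON."""
    payload = {"version": "1.1", "host": host, "short_message": short_message}
    # GELF additional fields must be prefixed with an underscore.
    payload.update({f"_{key}": value for key, value in extra.items()})
    return json.dumps(payload).encode("utf-8")


def send_gelf_udp(message: bytes, graylog_host: str, port: int = 12201) -> None:
    """Send one GELF message over UDP; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (graylog_host, port))


# Usage (assumes a GELF UDP input on graylog.example.com:12201):
# send_gelf_udp(build_gelf_message("web01", "collection test", env="staging"),
#               "graylog.example.com")
```

Because UDP is fire-and-forget, verify arrival by searching for the test message in Graylog rather than relying on the send succeeding.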

A successful, well-planned collection of messages is the first step in getting the most benefit from Graylog. For step-by-step instructions on how to get your logs into Graylog, please see this video.