Planning Your Log Collection
- Updated on 17 Aug 2022
- 5 Minutes to read
Please take a few moments to review this section, and plan your deployment appropriately before you install and configure Graylog. Proper planning will make the difference between a useful solution that meets a variety of stakeholder needs and a complicated drain on resources.
Even in small organizations, modern environments produce a large volume of log data. A normal log volume for a small shop a few years ago was 500 MB per day; today, 5 GB per day is a more realistic estimate. A large environment can produce a thousand times more than that.
Assuming an average event size of 500 bytes, 5 GB per day equates to roughly 125 log events every second, or some 10.8 million events per day. You need a strategy to manage that quantity of data effectively. There are two primary approaches.
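As a quick sanity check, the arithmetic above can be sketched as follows (assuming binary gigabytes and an average event size of 500 bytes):

```python
# Back-of-the-envelope event-rate estimate for a 5 GB/day ingest volume.
AVG_EVENT_BYTES = 500               # assumed average event size
DAILY_INGEST_BYTES = 5 * 1024**3    # 5 GB/day, in binary gigabytes
SECONDS_PER_DAY = 86_400

events_per_day = DAILY_INGEST_BYTES / AVG_EVENT_BYTES
events_per_second = events_per_day / SECONDS_PER_DAY

# Roughly 10.7 million events/day, or about 125 events/sec.
print(f"{events_per_day:,.0f} events/day, ~{events_per_second:.0f} events/sec")
```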
Collect the bare minimum.
The minimalist strategy proceeds from a “Default No” position: the system collects no logs except those required by identified business use cases. This strategy has several advantages:
- It minimizes licensing and storage costs because less data is collected.
- It reduces the noise produced by extraneous events, allowing analysts to focus on the events with maximum value.
- It improves system and query efficiency, which improves performance overall.
Collect it all, and let Graylog sort it out.
The maximalist strategy is to collect all events produced by any source, on the theory that all log data is potentially valuable, especially for forensics. If you collect and keep all logs forever, they will be available whenever you need them. However, this strategy is often impractical because of the technical and human resources required to collect, process, and store event data. Storing excessively large data sets also incurs a performance penalty.
What do you want to do with event data?
Use cases inform most decisions during the planning phase, including the determination of event source collections, collection method, how much of each event type to store, how events should be enriched, and how long to retain the data.
Broadly defined, use cases are the technical steps necessary to achieve a technical or business outcome. Simply stated, a use case is a description of what you want to do with an event log once you have collected it. Use cases are often categorized to group similar activities. An operations use case may monitor network or hardware performance, while DevOps use cases may focus on real-time application layer monitoring or troubleshooting.
Event Log Sources
What logs do you need to collect?
You may be uncertain about what logs to collect in an environment where seemingly everything generates event logs. To decide on event sources, start with the use cases you have identified. For example, if you identified a use case of monitoring user logins to critical resources, you will want to collect logs from those resources: perhaps an LDAP directory server, local servers, firewalls, network devices, and key applications.
Some other potential event sources, by category:
- Endpoint Security (EDR, AV, etc.)
- Web Proxies/Gateways
- LDAP/Active Directory
- Network Devices
- Packet Capture/Network Recorder
- Application Logs
- Load Balancer Logs
- Automation System Logs
- Business Logic
How will you collect it?
After you determine a list of event sources, the next step is to determine a collection method for each source. Although many hardware and software products support common methods, such as sending log data via syslog, many do not, so it is critical to understand what method each event source uses and what resources it may require. For example, if a log shipper is needed to read logs from a local file on each server, it must be selected and tested before deployment. In other cases, proprietary APIs or software tools must be employed and integrated.
In some cases, changes to the event sources themselves (security devices, network hardware or applications) may be required. Additional planning is often required to deploy and maintain these collection methods over time.
Graylog supports many input types out of the box. More inputs are available in the Graylog Marketplace. At the time of writing, Graylog supports the following:
- Syslog (TCP, UDP, AMQP, Kafka)
- GELF (TCP, UDP, AMQP, Kafka, HTTP)
- AWS (AWS Logs, FlowLogs, CloudTrail)
- CEF (TCP, UDP, AMQP, Kafka)
- JSON Path from HTTP API
- Netflow (UDP)
- Plain/Raw Text (TCP, UDP, AMQP, Kafka)
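As an illustration of one of these input types, a GELF 1.1 message can be built and pushed over HTTP with a short script. This is a sketch: the endpoint URL and host names are hypothetical, and it assumes a Graylog GELF HTTP input is already listening on port 12201.

```python
import json
from urllib import request


def build_gelf_message(host: str, short_message: str, level: int = 6) -> dict:
    """Build a minimal GELF 1.1 payload (level 6 = informational)."""
    return {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "level": level,
    }


def send_gelf_http(endpoint: str, message: dict) -> int:
    """POST the payload to a GELF HTTP input; returns the HTTP status code."""
    req = request.Request(
        endpoint,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.status


msg = build_gelf_message("web-01", "User login failed")
# Hypothetical endpoint; uncomment to send against a real input:
# send_gelf_http("http://graylog.example.com:12201/gelf", msg)
```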
The Graylog Marketplace is the central directory of add-ons for Graylog. It contains plugins, content packs, GELF libraries, and other content built by Graylog developers and community members.
Who will use the solution?
Consider who will use the solution and what their skill levels are. Less technical users may require more pre-built content, such as dashboards, and they may require additional training.
You will also need to determine what event sources each user group can access. As in all questions of access control, the principle of least privilege should apply. Some typical user groups include:
- Security Analysts
- Help Desk
How long will you keep the data?
A key question when planning your log management system is log retention. There are two options for retention: online and archived. Online data is stored in Elasticsearch and is searchable through the Graylog GUI. Archived data is stored in a compressed format, either on the Graylog server or on a network file share. Archived data is still searchable, for example with grep, but must be reconstituted in Graylog before it is searchable through the GUI again.
Some regulatory frameworks require the retention of event log data for a prescribed period. In the absence of a clear requirement, balance the value of the data against the cost of retaining (storing) it.
Calculating Storage Requirements
Like most data stores, Elasticsearch reacts badly when it consumes all available storage. To prevent this from happening, perform proper planning and monitoring.
Many variables affect storage requirements, such as how much of each message is kept, whether the original message is retained once parsing is complete, and how much enrichment is done before storage.
A simple calculation for anticipating storage needs: multiply your daily ingestion rate by the number of days you need to retain the data online, then multiply by 1.3 to account for metadata overhead (GB/day x retention days x 1.3 = storage requirement).
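The formula above can be expressed as a small helper. This is a sketch; the function name is illustrative, and the 1.3 multiplier is the metadata-overhead factor described in the text:

```python
def storage_requirement_gb(ingest_gb_per_day: float,
                           retention_days: int,
                           overhead_factor: float = 1.3) -> float:
    """GB/day x retention days x 1.3 = online storage requirement in GB."""
    return ingest_gb_per_day * retention_days * overhead_factor


# Example: 5 GB/day retained online for 90 days -> 585 GB
print(storage_requirement_gb(5, 90))
```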
Elasticsearch makes extensive use of slack storage space in the course of its operations. We strongly encourage users to exceed the minimum storage required for their calculated ingestion rate. When at maximum retention, Elasticsearch storage should not exceed 75% of total space.