Common log collection tools include Logstash, Filebeat, Fluentd, Logagent, rsyslog, and others. What are the differences between them, and which tool should we use in which situations?

Logstash

Logstash is an open-source data collection engine with real-time pipelining capabilities. It can dynamically unify data from disparate sources and normalize it into destinations of your choice.



Advantages

Logstash's main selling point is flexibility: it has many plug-ins, detailed documentation, and a straightforward configuration format, which lets it be used in a wide variety of scenarios. There are so many resources online that you can find a solution to almost any problem.

Disadvantages

Logstash's fatal flaw is its performance and resource consumption (the default heap size is 1 GB). Although its performance has improved significantly in recent years, it is still much slower than its alternatives; there are published Logstash vs. rsyslog and Logstash vs. Filebeat performance comparisons. This can become a problem at large data volumes.

Another problem is that Logstash currently has no built-in buffering; the typical workaround is to use Redis or Kafka as a central buffer.
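As a minimal sketch of that pattern (broker, topic, and Elasticsearch addresses are placeholders), a Logstash pipeline reading from a Kafka buffer and writing to Elasticsearch might look like this:

```
# logstash.conf — consume buffered logs from Kafka, write to Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka1:9092"    # placeholder broker address
    topics            => ["app-logs"]     # placeholder topic
    codec             => json
  }
}
output {
  elasticsearch {
    hosts => ["http://es1:9200"]          # placeholder Elasticsearch node
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```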

Typical Application Scenarios

Because of its flexibility and the wealth of material available online, Logstash is ideal for prototyping, or for cases where the parsing is very complex. If server resources are plentiful, we can also install a Logstash instance on each server. In that case no extra buffering is needed, because the files themselves act as a buffer and Logstash remembers where it left off.

If server resources are scarce, installing Logstash on every server is not recommended. Instead, a lightweight log shipper should transfer data from each server to one or more central Logstash servers, and from there to Elasticsearch.
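On the central servers, the receiving end of that topology is typically a beats input; a minimal sketch (the port is the conventional Beats default, the Elasticsearch host is a placeholder):

```
# Central Logstash: receive events from lightweight shippers (e.g., Filebeat)
input {
  beats {
    port => 5044               # conventional Beats port
  }
}
filter {
  # heavy parsing (e.g., grok) happens here, so the edge servers stay light
}
output {
  elasticsearch {
    hosts => ["http://es1:9200"]   # placeholder
  }
}
```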

As your logging project evolves, you may need to change the log shipper for performance or cost reasons. When judging whether Logstash's performance is good enough, an accurate estimate of throughput requirements is essential, because throughput determines how much hardware must be invested in Logstash.

Filebeat

As a member of the Beats family, Filebeat is a lightweight log shipper that compensates for Logstash's weakness: Filebeat stays light and simply pushes logs to a central Logstash instance.



Since version 5.x, Elasticsearch has had parsing capabilities of its own (similar to Logstash filters), called Ingest. This means you can push data directly to Elasticsearch with Filebeat and have Elasticsearch handle both parsing and storage. No buffering is needed, because Filebeat, like Logstash, remembers the offset of the last read. If buffering is needed (for example, to avoid filling up the log server's file system), Redis or Kafka can be used, since Filebeat can talk to both.
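As a sketch (paths, host, and pipeline name are placeholders, and exact keys vary a little between Filebeat versions), a minimal filebeat.yml for this direct-to-Elasticsearch setup might be:

```yaml
# filebeat.yml — ship files directly to Elasticsearch, parse via Ingest
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log        # placeholder path

output.elasticsearch:
  hosts: ["http://es1:9200"]      # placeholder node
  pipeline: "parse-app-logs"      # placeholder Ingest pipeline name
```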

Advantages

Filebeat is just a single binary with no dependencies. It uses very few resources, and although it is still very young, its simplicity means there is very little that can go wrong, so its reliability is high. It also exposes a number of tuning points, such as how often it scans for new files and when it closes the file handle of a file that has not changed in a while.
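Two of those tuning points, for example (the values shown are the documented defaults, the path is a placeholder):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log   # placeholder
    scan_frequency: 10s      # how often to look for new files
    close_inactive: 5m       # close the handle after 5m without changes
```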

Disadvantages

Filebeat's scope is very limited, so in some scenarios we run into problems. For example, when using Logstash as the downstream pipeline, we inherit its performance issues. For the same reason, Filebeat's scope keeps growing: at first it could only send logs to Logstash and Elasticsearch, then Kafka and Redis were added, and since 5.x it has filtering capabilities as well.

Typical Application Scenarios

Filebeat solves a specific problem: logs are stored in files, and we want to ship them directly to Elasticsearch. This works only if we merely want to grep them, if the logs are in JSON format (Filebeat can parse JSON), or if we want to use Elasticsearch's Ingest feature to parse and enrich them.
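If the logs are one JSON object per line, Filebeat can decode them itself. A sketch using Filebeat's JSON options (paths are placeholders):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/json.log      # placeholder
    json.keys_under_root: true     # lift parsed fields to the event root
    json.add_error_key: true       # record parse failures in an error field
```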

Filebeat can also send logs to Kafka/Redis, so that another shipper (for example, Logstash or a custom Kafka consumer) can enrich and forward them further. This assumes the chosen downstream shipper meets our functional and performance requirements.
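Switching the destination to Kafka is a small change in filebeat.yml; a sketch (broker and topic are placeholders):

```yaml
output.kafka:
  hosts: ["kafka1:9092"]   # placeholder broker
  topic: "app-logs"        # placeholder topic for downstream consumers
```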

Fluentd

Fluentd was created primarily to use JSON as the log format wherever possible, so that the shipper and its downstream pipeline never have to guess the types of fields embedded in strings. It also provides libraries for almost any language, which means we can hook it directly into our own custom programs.
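For example, with the official Python library (the fluent-logger package; the host and port below are Fluentd's conventional defaults), an application can emit structured events directly:

```python
# pip install fluent-logger — emit structured events to a local Fluentd
from fluent import sender

# Tag prefix "app"; Fluentd's default forward port is 24224
logger = sender.FluentSender("app", host="localhost", port=24224)

# The record is sent as structured data, so downstream stages
# never have to guess field types from a string.
logger.emit("follow", {"from": "userA", "to": "userB"})
logger.close()
```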



Advantages

Like most Logstash plug-ins, Fluentd plug-ins are developed in Ruby and are very easy to write and maintain. So there are a lot of them, and almost every source and target store has a plug-in (with varying levels of maturity). This also means that we can use Fluentd to tie everything together.
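As a sketch of that tie-everything-together style (the Elasticsearch output requires the third-party fluent-plugin-elasticsearch; paths, tags, and hosts are placeholders):

```
# fluent.conf — tail a JSON log file and forward it to Elasticsearch
<source>
  @type tail
  path /var/log/app/app.log          # placeholder
  pos_file /var/log/td-agent/app.pos # remembers the read position
  tag app.access
  <parse>
    @type json
  </parse>
</source>

<match app.**>
  @type elasticsearch                # requires fluent-plugin-elasticsearch
  host es1                           # placeholder
  port 9200
</match>
```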

Disadvantages

Because Fluentd is designed around structured data, it is not very flexible for other use cases, although unstructured data can still be parsed with regular expressions. Its performance is good in most scenarios but not the best: like syslog-ng, its buffering exists only on the output side, and the single-threaded core plus Ruby's GIL in the plug-ins limit performance on large nodes. Its resource consumption is acceptable in most scenarios, however. For small or embedded devices, you may want to look at Fluent Bit, whose relationship to Fluentd is like Filebeat's relationship to Logstash.

Typical Application Scenarios

Fluentd is ideal when logs come from a variety of sources and go to a variety of target stores, because it has so many plug-ins. Also, if most of the data sources are custom applications, you will find it much easier to use Fluentd's libraries than to pair a logging library with another shipper, especially when the applications are written in multiple languages and use multiple logging libraries that each behave differently.

Logagent

Logagent is a shipper provided by Sematext. It was built to transfer logs to Logsene (a SaaS platform built on the Elasticsearch API), and because Logsene exposes the Elasticsearch API, Logagent can just as easily push data to Elasticsearch itself.

Advantages

It can tail everything in /var/log, parse various formats (Elasticsearch, Solr, MongoDB, Apache HTTPD, etc.), and mask sensitive data such as personally identifiable information (PII), dates of birth, and credit card numbers. It can also enrich logs (for example, access logs) with GeoIP location information based on the IP address. It is likewise lightweight and fast enough to fit into any logging pipeline. Version 2.0 adds support for input/output processing plug-ins as third-party Node.js modules. Importantly, Logagent has local buffering, so unlike with Logstash, logs are not lost when the destination is unavailable.

Disadvantages

While Logagent has some interesting features (for example, receiving Heroku or CloudFoundry logs), it is not as flexible as Logstash.

Typical Application Scenarios

Logagent is a good choice when you want a single shipper that can do everything (tail, parse, buffer, and ship).

Logtail

Logtail is the log collector of Alibaba Cloud Log Service. It has been running on machines inside Alibaba Group for more than three years and now provides log collection services to Alibaba Cloud's public cloud users.



Implemented in C++, it puts a great deal of effort into stability, resource control, and manageability, and its performance is good. Compared with the broad community support behind Logstash and Fluentd, Logtail stays simple and focuses purely on log collection.

Advantages

Logtail has the smallest CPU and memory footprint of the tools discussed here, and the end-to-end experience in combination with Alibaba Cloud Log Service is good.

Disadvantages

Logtail's support for parsing specific log types is currently weak and will need to be improved over time.

rsyslog

rsyslog is the default syslog daemon in most Linux distributions. It can do much more than read logs from the syslog socket and write them to /var/log/messages: it can tail files, parse them, buffer (on disk and in memory), and ship them to multiple destinations, including Elasticsearch.
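A sketch of that file-to-Elasticsearch path in rsyslog's newer configuration style (imfile and omelasticsearch are real modules; paths, hosts, and queue settings are illustrative):

```
# /etc/rsyslog.d/app-to-es.conf — tail a file, buffer, ship to Elasticsearch
module(load="imfile")                        # file input module
module(load="omelasticsearch")               # Elasticsearch output module

input(type="imfile"
      File="/var/log/app/app.log"            # placeholder
      Tag="app:")

action(type="omelasticsearch"
       server="es1"                          # placeholder
       searchIndex="app-logs"
       queue.type="LinkedList"               # in-memory queue...
       queue.filename="es_action_q"          # ...spilled to disk when full
       action.resumeRetryCount="-1")         # retry forever on failure
```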

Advantages

rsyslog is the fastest shipper we have tested. When used as a simple router/shipper, almost any machine becomes bandwidth-bound, yet it is also very good at handling multiple parsing rules: its grammar-based parsing module (mmnormalize) keeps processing speed roughly constant as the number of rules grows. This means that with 20-30 rules, such as when parsing Cisco logs, it can outperform regex-based Grok parsing by a factor of about 100 (depending on the Grok implementation and the liblognorm version).
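For flavor, a liblognorm rulebase describes log lines as grammar rules rather than regular expressions; a hypothetical rule for an SSH login line might look like this:

```
version=2
rule=:Accepted password for %user:word% from %ip:ipv4% port %port:number%
```

Because matching is driven by a parse tree built from all rules at once, adding rules does not multiply the work per line, which is where the speed advantage over sequential regex matching comes from.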

Depending on how buffering is configured, it is also the lightest parser we could find.

Disadvantages

rsyslog's configuration demands more effort than the other tools, and two things make it especially difficult:

The documentation is hard to search and read, especially for those unfamiliar with its terminology.

Versions 5.x and above use a configuration format that is not quite the same (it extends the old syslogd format while still supporting it). Although the new format can coexist with the old one, new features (e.g., the Elasticsearch output) only work with the new configuration style, while some older plug-ins (e.g., the Postgres output) only support the old format.

Although rsyslog is reliable once its configuration is stable (it often offers several configurations that all produce the same result), it is not free of bugs.

Typical Application Scenarios

rsyslog fits well where something very light is needed (appliances, small VMs, Docker containers). If the heavy processing happens in another shipper (for example, Logstash), you can forward JSON directly over TCP, or go through a Kafka/Redis buffer.
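A sketch of the forward-JSON-over-TCP option (target and port are placeholders; the jsonmesg property, which renders the whole message as JSON, requires a reasonably recent rsyslog):

```
# Forward each event as one JSON line over TCP to a central shipper
template(name="json_lines" type="string" string="%jsonmesg%\n")

action(type="omfwd"
       target="logstash.example.com"   # placeholder
       port="5514"                     # placeholder
       protocol="tcp"
       template="json_lines")
```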

rsyslog is also a good fit when performance is critical, especially when there are many parsing rules; in that case the extra configuration time pays off.
