August 11, 2021

Read active log files more quickly and easily with the new filestream input in Filebeat

With Elastic 7.14, the filestream input, the successor of the log input, is now generally available in Filebeat. The new input provides better support for reading active log files: it reacts faster when there is backpressure in the system, updates the registry more quickly, cooperates better with external log rotation tools, and more.

Improved registry performance

Previously, when a registry file (the file used for saving the progress of publishing events) contained many entries, state updates became slower, even if Filebeat was only collecting a few files. The main problem was that the complete registry was serialized after every ACK from the outputs, and each of these registry writes required an expensive fsync call. Now offset updates are written to the file in an append-only manner, and the complete registry is only serialized once this log reaches 10 MB. This reduces the number of fsync calls and makes registry updates more efficient.
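To make the difference concrete, here is a minimal Python sketch of the append-then-checkpoint idea, assuming a simplified model: `Registry`, `update`, and `_write_checkpoint` are hypothetical names, in-memory lists stand in for the log and registry files, and only the full serializations (the expensive, fsynced writes) are counted. It is not Filebeat's actual registry code, which is written in Go.

```python
import json
from typing import Dict, List


class Registry:
    """Hypothetical sketch: full state in memory, each offset update appended
    to a log, full serialization (checkpoint) only when the log passes a limit."""

    def __init__(self, checkpoint_limit: int) -> None:
        self.state: Dict[str, int] = {}
        self.log: List[str] = []      # stands in for the append-only log file
        self.log_size = 0             # bytes currently in the log
        self.checkpoint = ""          # stands in for the serialized registry file
        self.checkpoint_limit = checkpoint_limit
        self.checkpoints = 0          # full serializations (each would be fsynced)

    def update(self, key: str, offset: int) -> None:
        """Record an ACKed offset with a cheap append instead of a full rewrite."""
        self.state[key] = offset
        line = json.dumps({"key": key, "offset": offset}) + "\n"
        self.log.append(line)
        self.log_size += len(line.encode())
        if self.log_size >= self.checkpoint_limit:
            self._write_checkpoint()

    def _write_checkpoint(self) -> None:
        """Serialize the complete state once and drop the log it now covers."""
        self.checkpoint = json.dumps(self.state)
        self.checkpoints += 1
        self.log.clear()
        self.log_size = 0


# A 200-byte limit instead of 10 MB, so checkpoints trigger after a few updates.
reg = Registry(checkpoint_limit=200)
for i in range(1, 21):
    reg.update("/var/log/app.log", i * 100)
print(reg.checkpoints, reg.state["/var/log/app.log"])
```

With the old behavior, each of the 20 ACKs above would have triggered a full serialization; here only a handful do, and the number shrinks further as the limit grows relative to the size of a single update.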

In addition, Filebeat was previously only able to clean up states that belonged to inputs. This was problematic with autodiscovery, which starts and stops inputs dynamically: the states of the files opened by those inputs were never removed. So we decoupled registry cleanup from the inputs to remove such orphaned states. Removing outdated entries no longer depends on other parts of the pipeline. When an entry is marked invalid, it is removed from the registry regardless of backpressure or the presence of an input claiming the file.
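The decoupled cleanup can be sketched in a few lines of Python. The `Entry` fields and the `clean` function are hypothetical and only illustrate the rule described above: the decision to remove an entry looks at the entry alone, never at whether its input is still running or whether the output is accepting events.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    offset: int
    input_id: Optional[str] = None  # input that created the state; it may already be stopped
    invalid: bool = False           # hypothetical flag, e.g. set when the file was removed


def clean(registry: Dict[str, Entry]) -> List[str]:
    """Remove invalid entries based only on the entries themselves.

    Neither input_id nor the state of the output pipeline is consulted, so
    states left behind by inputs that autodiscovery already stopped are
    removed as well.
    """
    removed = [key for key, entry in registry.items() if entry.invalid]
    for key in removed:
        del registry[key]
    return removed


registry = {
    "/var/log/pods/a.log": Entry(offset=512, input_id="autodiscover-a", invalid=True),
    "/var/log/pods/b.log": Entry(offset=128, input_id="autodiscover-b"),
}
print(clean(registry))  # ['/var/log/pods/a.log']
```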

Harvester management

Checking for open files that need to be closed now runs in parallel with data collection. This way, the input can stop harvesters even if there is backpressure in the system. Furthermore, registry metadata updates and entry removal no longer depend on the availability of the output, meaning outdated states can be removed from the registry more quickly than before.
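The effect of running close checks beside data collection can be shown with a small threaded Python sketch. It assumes a bounded queue that nobody consumes as a stand-in for an output under backpressure, and an inactivity deadline as a stand-in for a close condition; `harvest` and `monitor` are hypothetical names, not Filebeat functions. The harvester never gets past the full queue, yet the monitor still stops it.

```python
import queue
import threading
import time


def harvest(lines, out: queue.Queue, stop: threading.Event) -> int:
    """Publish lines into `out`; return how many were published before stopping."""
    published = 0
    for line in lines:
        while True:
            if stop.is_set():
                return published
            try:
                out.put(line, timeout=0.05)  # blocks while the "output" is full
                published += 1
                break
            except queue.Full:
                continue                     # backpressure: retry, but re-check stop first
    return published


def monitor(should_close, stop: threading.Event, interval: float = 0.01) -> None:
    """Runs in parallel with the harvester and stops it once the close condition holds."""
    while not stop.is_set():
        if should_close():
            stop.set()
        time.sleep(interval)


out = queue.Queue(maxsize=2)         # nobody consumes it: permanent backpressure
stop = threading.Event()
deadline = time.monotonic() + 0.2    # hypothetical inactivity deadline
watcher = threading.Thread(target=monitor, args=(lambda: time.monotonic() > deadline, stop))
watcher.start()
published = harvest(["a", "b", "c", "d"], out, stop)
watcher.join()
print(published)  # 2
```

If the close check ran inside the publishing loop instead, it would only execute after a successful publish, which never happens while the output is blocked.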

Flexible reader pipeline

We’ve also added new features on top of these improvements. The order of multiline, JSON, and container log parsing is now configurable through the parsers option. We are also planning to add a syslog parser so you can extract information from log files written in syslog format. Because the reader pipeline is now more flexible, we can adopt parsers in all Filebeat inputs. We have already added them to the AWS S3 input, and support for more inputs is coming in the future.
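Why the order matters can be illustrated with a Python sketch of a parser pipeline applied in configured order. The `container` parser here strips a simplified, hypothetical `stdout F ` prefix (real container log formats carry more fields), and `multiline` joins lines that start with whitespace to the previous event; neither is Filebeat's implementation, they only model the ordering effect.

```python
from typing import Callable, List

Parser = Callable[[List[str]], List[str]]


def container(lines: List[str]) -> List[str]:
    """Strip a simplified, hypothetical 'stdout F ' container prefix."""
    prefix = "stdout F "
    return [l[len(prefix):] if l.startswith(prefix) else l for l in lines]


def multiline(lines: List[str]) -> List[str]:
    """Append lines that start with whitespace to the previous event."""
    events: List[str] = []
    for line in lines:
        if events and line[:1].isspace():
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


def run(parsers: List[Parser], lines: List[str]) -> List[str]:
    for parse in parsers:  # applied in the configured order
        lines = parse(lines)
    return lines


raw = [
    "stdout F Exception in thread main",
    "stdout F     at com.example.App.run(App.java:42)",
    "stdout F Next event",
]
print(len(run([container, multiline], raw)))  # 2: the stack trace line is joined
print(len(run([multiline, container], raw)))  # 3: the prefix hides the indentation
```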

Better cooperation with external log rotation tools

We’ve also improved support for external log rotation tools. The improvements described earlier let the filestream input react faster to rotation events when using rename-based rotation strategies. We also introduced a special prospector for copytruncate-based strategies.

Previously, when input files were rotated with copytruncate, Filebeat was not able to follow the logs. When the active file was copied during rotation, Filebeat processed the copy as a new log file and shipped its contents to the output again, which led to duplicate events. When configured with the rotation.external.strategy.copytruncate.* options, the input now processes rotated files correctly and continues forwarding events from where it left off.
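The core of resuming after copytruncate can be sketched as a small planning function in Python. The detection rule (the active file is now smaller than the stored offset, so it was truncated) and the assumption that the rotated copy's path is already known are ours, and `plan_reads` and `Read` are hypothetical names. The point is that only the tail of the copy past the stored offset is read, instead of the whole copy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    path: str
    start: int  # byte offset to start reading from
    end: int    # byte offset to stop at


def plan_reads(active_path: str, offset: int, active_size: int,
               rotated_path: str, rotated_size: int) -> List[Read]:
    """Hypothetical sketch of resuming after a copytruncate rotation.

    If the active file is smaller than the stored offset, it was truncated.
    The copy then holds the bytes already shipped (up to `offset`) plus any
    tail written before the copy was taken, so only that tail is read from
    the copy, followed by the truncated active file from the start.
    Limitation: a file truncated and refilled past `offset` before the next
    check is not detected by this size-only rule.
    """
    if active_size < offset:
        return [Read(rotated_path, offset, rotated_size), Read(active_path, 0, active_size)]
    return [Read(active_path, offset, active_size)]


# 1000 bytes already shipped; 200 more were written before the copy was taken.
for r in plan_reads("app.log", 1000, 50, "app.log.1", 1200):
    print(r)
```

Reading `app.log.1` from byte 0, as a naive input would, ships the first 1000 bytes a second time, which is exactly the duplication described above.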

We are planning more enhancements to log rotation in general because file rotation is a significant part of log management.

Other improvements

An additional new feature worth mentioning is include_files, which is the counterpart of exclude_files. We also added more granular support for tailing files: by setting the ignore_inactive option, you can now choose whether to tail files starting from the first start of Filebeat, its last start, or the last configuration change.
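Both filters can be modeled as simple predicates. In this Python sketch, `should_harvest` treats include_files and exclude_files as lists of unanchored regular expressions, and the precedence it uses (an empty include list accepts everything, any exclude match rejects) is an assumption rather than documented behavior. `is_ignored` models ignore_inactive as comparing a file's modification time against a chosen reference point; the keys in `refs` are illustrative labels for the three choices above, not the option's accepted values.

```python
import re


def should_harvest(path, include=(), exclude=()):
    """include_files / exclude_files as regex lists (assumed precedence:
    empty include list accepts everything; any exclude match rejects)."""
    if include and not any(re.search(p, path) for p in include):
        return False
    return not any(re.search(p, path) for p in exclude)


def is_ignored(mtime, since, references):
    """Ignore a file if it has not been modified since the chosen reference time."""
    return mtime < references[since]


print(should_harvest("/var/log/app.log", include=[r"\.log$"], exclude=[r"debug"]))    # True
print(should_harvest("/var/log/debug.log", include=[r"\.log$"], exclude=[r"debug"]))  # False
print(should_harvest("/var/log/app.gz", include=[r"\.log$"]))                         # False

# Illustrative reference points as Unix timestamps.
refs = {"first_start": 1_000.0, "last_start": 5_000.0, "config_change": 8_000.0}
print(is_ignored(3_000.0, "first_start", refs))  # False: modified after the first start
print(is_ignored(3_000.0, "last_start", refs))   # True: inactive since the last start
```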

We are confident that this new architecture will allow us to improve log collection even further in the future. We look forward to hearing feedback from the community.