Understand metrics in Filebeat logs
editUnderstand metrics in Filebeat logs
editEvery 30 seconds (by default), Filebeat collects a snapshot of metrics about itself. From this snapshot, Filebeat computes a delta snapshot; this delta snapshot contains any metrics that have changed since the last snapshot. Note that the values of the metrics are the values when the snapshot is taken, NOT the difference in values from the last snapshot.
If this delta snapshot contains any metrics (indicating at least one metric that has changed since the last snapshot), this delta snapshot is serialized as JSON and emitted in Filebeat’s logs at the INFO
log level. Most snapshot fields report the change in the metric since the last snapshot, however some fields are gauges, which always report the current value. Here is an example of such a log entry:
{"log.level":"info","@timestamp":"2023-07-14T12:50:36.811Z","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":187},"message":"Non-zero metrics in the last 30s","service.name":"filebeat","monitoring":{"metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":0}}}},"cpu":{"system":{"ticks":692690,"time":{"ms":60}},"total":{"ticks":3167250,"time":{"ms":150},"value":3167250},"user":{"ticks":2474560,"time":{"ms":90}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":32},"info":{"ephemeral_id":"2bab8688-34c0-4522-80af-db86948d547d","uptime":{"ms":617670096},"version":"8.6.2"},"memstats":{"gc_next":57189272,"memory_alloc":43589824,"memory_total":275281335792,"rss":183574528},"runtime":{"goroutines":212}},"filebeat":{"events":{"active":5,"added":52,"done":49},"harvester":{"open_files":6,"running":6,"started":1}},"libbeat":{"config":{"module":{"running":15}},"output":{"events":{"acked":48,"active":0,"batches":6,"total":48},"read":{"bytes":210},"write":{"bytes":26923}},"pipeline":{"clients":15,"events":{"active":5,"filtered":1,"published":51,"total":52},"queue":{"max_events":3500,"filled":{"events":5,"bytes":6425,"pct":0.0014},"added":{"events":52,"bytes":65702},"consumed":{"events":52,"bytes":65702},"removed":{"events":48,"bytes":59277},"acked":48}}},"registrar":{"states":{"current":14,"update":49},"writes":{"success":6,"total":6}},"system":{"load":{"1":0.91,"15":0.37,"5":0.4,"norm":{"1":0.1138,"15":0.0463,"5":0.05}}}},"ecs.version":"1.6.0"}}
Details
editFocussing on the .monitoring.metrics
field, and formatting the JSON, it’s value is:
{ "beat": { "cgroup": { "memory": { "mem": { "usage": { "bytes": 0 } } } }, "cpu": { "system": { "ticks": 692690, "time": { "ms": 60 } }, "total": { "ticks": 3167250, "time": { "ms": 150 }, "value": 3167250 }, "user": { "ticks": 2474560, "time": { "ms": 90 } } }, "handles": { "limit": { "hard": 1048576, "soft": 1048576 }, "open": 32 }, "info": { "ephemeral_id": "2bab8688-34c0-4522-80af-db86948d547d", "uptime": { "ms": 617670096 }, "version": "8.6.2" }, "memstats": { "gc_next": 57189272, "memory_alloc": 43589824, "memory_total": 275281335792, "rss": 183574528 }, "runtime": { "goroutines": 212 } }, "filebeat": { "events": { "active": 5, "added": 52, "done": 49 }, "harvester": { "open_files": 6, "running": 6, "started": 1 } }, "libbeat": { "config": { "module": { "running": 15 } }, "output": { "events": { "acked": 48, "active": 0, "batches": 6, "total": 48 }, "read": { "bytes": 210 }, "write": { "bytes": 26923 } }, "pipeline": { "clients": 15, "events": { "active": 5, "filtered": 1, "published": 51, "total": 52 }, "queue": { "max_events": 3500, "filled": { "events": 5, "bytes": 6425, "pct": 0.0014 }, "added": { "events": 52, "bytes": 65702 }, "consumed": { "events": 52, "bytes": 65702 }, "removed": { "events": 48, "bytes": 59277 }, "acked": 48 } } }, "registrar": { "states": { "current": 14, "update": 49 }, "writes": { "success": 6, "total": 6 } }, "system": { "load": { "1": 0.91, "15": 0.37, "5": 0.4, "norm": { "1": 0.1138, "15": 0.0463, "5": 0.05 } } } }
The following tables explain the meaning of the most important fields under .monitoring.metrics
and also provide hints that might be helpful in troubleshooting Filebeat issues.
Field path (relative to .monitoring.metrics ) |
Type | Meaning | Troubleshooting hints |
---|---|---|---|
|
Object |
Information that is common to all Beats, e.g. version, goroutines, file handles, CPU, memory |
|
|
Object |
Information about the publisher pipeline and output, also common to all Beats |
|
|
Object |
Information specific to Filebeat, e.g. harvester, events |
Field path (relative to .monitoring.metrics.beat ) |
Type | Meaning | Troubleshooting hints |
---|---|---|---|
|
Integer |
Number of goroutines running |
If this number grows over time, it indicates a goroutine leak |
Field path (relative to .monitoring.metrics.libbeat ) |
Type | Meaning | Troubleshooting hints |
---|---|---|---|
|
Integer |
Number of events currently in the libbeat publisher pipeline. |
If this number grows over time, it may indicate that Filebeat is producing events faster than the output can consume them. Consider increasing the number of output workers (if this setting is supported by the output; Elasticsearch and Logstash outputs support this setting). The pipeline includes events currently being processed as well as events in the queue. So this metric can sometimes end up slightly higher than the queue size. If this metric reaches the maximum queue size ( |
|
Integer |
Number of events currently being processed by the output. |
If this number grows over time, it may indicate that the output destination (e.g. Logstash pipeline or Elasticsearch cluster) is not able to accept events at the same or faster rate than what Filebeat is sending to it. |
|
Integer |
Number of events acknowledged by the output destination. |
Generally, we want this number to be the same as |
|
Integer |
Number of events that Filebeat tried to send to the output destination, but the destination failed to receive them. |
Generally, we want this field to be absent or its value to be zero. When the value is greater than zero, it’s useful to check Filebeat’s logs right before this log entry’s |
|
Integer |
Number of events that Filebeat gave up sending to the output destination because of a permanent (non-retryable) error. |
|
Integer |
Number of events that Filebeat successfully sent to a configured dead letter index after they failed to ingest in the primary index. |
|
Object |
Field path (relative to .monitoring.metrics.libbeat.pipeline ) |
Type | Meaning | Troubleshooting hints |
---|---|---|---|
|
Integer (gauge) |
The queue’s maximum event count if it has one, otherwise zero. |
|
Integer (gauge) |
The queue’s maximum byte count if it has one, otherwise zero. |
|
Integer (gauge) |
Number of events currently stored by the queue. |
|
Integer (gauge) |
|
Number of bytes currently stored by the queue. |
|
Float (gauge) |
|
How full the queue is relative to its maximum size, as a fraction from 0 to 1. |
Low throughput while |
|
Integer |
Number of events added to the queue by input workers. |
|
Integer |
|
Number of bytes added to the queue by input workers. |
|
Integer |
|
Number of events sent to output workers. |
|
Integer |
|
Number of bytes sent to output workers. |
|
Integer |
|
Number of events removed from the queue after being processed by output workers. |
|
Integer |
When using the memory queue, byte metrics are only set if the output supports them. Currently only the Elasticsearch output supports byte metrics.
Field path (relative to .monitoring.metrics.filebeat ) |
Type | Meaning | Troubleshooting hints |
---|---|---|---|
|
Integer |
Number of events being actively processed by Filebeat (including events Filebeat has already sent to the libbeat publisher pipeline, but not including events the pipeline has sent to the output). |
If this number grows over time, it may indicate that Filebeat inputs are harvesting events too fast for the pipeline and output to keep up. |
Useful commands
editParse monitoring metrics from unstructured Filebeat logs
editFor Filebeat versions that emit unstructured logs, the following script can be used to parse monitoring metrics from such logs: https://github.com/elastic/beats/blob/main/script/metrics_from_log_file.sh.
Check if Filebeat is processing events
edit$ cat beat.log | jq -r '[.["@timestamp"],.monitoring.metrics.filebeat.events.active,.monitoring.metrics.libbeat.pipeline.events.active,.monitoring.metrics.libbeat.output.events.total,.monitoring.metrics.libbeat.output.events.acked,.monitoring.metrics.libbeat.output.events.failed//0] | @tsv' | sort
Example output:
2023-07-14T11:24:36.811Z 1 1 38033 38033 0 2023-07-14T11:25:06.811Z 1 1 17 17 0 2023-07-14T11:25:36.812Z 1 1 16 16 0 2023-07-14T11:26:06.811Z 1 1 17 17 0 2023-07-14T11:26:36.811Z 2 2 21 21 0 2023-07-14T11:27:06.812Z 1 1 18 18 0 2023-07-14T11:27:36.811Z 1 1 17 17 0 2023-07-14T11:28:06.811Z 1 1 18 18 0 2023-07-14T11:28:36.811Z 1 1 16 16 0 2023-07-14T11:37:06.811Z 1 1 270 270 0 2023-07-14T11:37:36.811Z 1 1 16 16 0 2023-07-14T11:38:06.811Z 1 1 17 17 0 2023-07-14T11:38:36.811Z 1 1 16 16 0 2023-07-14T11:41:36.811Z 3 3 323 323 0 2023-07-14T11:42:06.811Z 3 3 17 17 0 2023-07-14T11:42:36.812Z 4 4 18 18 0 2023-07-14T11:43:06.811Z 4 4 17 17 0 2023-07-14T11:43:36.811Z 2 2 17 17 0 2023-07-14T11:47:06.811Z 0 0 117 117 0 2023-07-14T11:47:36.811Z 2 2 14 14 0 2023-07-14T11:48:06.811Z 3 3 17 17 0 2023-07-14T11:48:36.811Z 2 2 17 17 0 2023-07-14T12:49:36.811Z 3 3 2008 1960 48 2023-07-14T12:50:06.812Z 2 2 18 18 0 2023-07-14T12:50:36.811Z 5 5 48 48 0
The columns here are:
-
.@timestamp
-
.monitoring.metrics.filebeat.events.active
-
.monitoring.metrics.libbeat.pipeline.events.active
-
.monitoring.metrics.libbeat.output.events.total
-
.monitoring.metrics.libbeat.output.events.acked
-
.monitoring.metrics.libbeat.output.events.failed