Monitoring Container Resource Usage with Metricbeat
Metricbeat is a new addition to the Beats lineup for the 5.0 release. It is a lightweight shipper for host and service metrics. Metricbeat is replacing Topbeat in the 5.0 release, and it incorporates all of the metrics provided by Topbeat plus many more.
One of the capabilities of Metricbeat is the ability to collect control group metrics from the Linux kernel. Control groups, more commonly referred to as cgroups, are a kernel feature for allocating resources (e.g. CPU, memory) to a process or set of processes and for metering their resource usage.
This feature is brand new and will be released in Metricbeat 5.0.0-beta1. The feature itself is still evolving and for that reason it is marked as experimental. As we receive feedback there may be enhancements to the metrics it reports.
How do I enable cgroup metrics in Metricbeat?
To enable the cgroup metrics you must add cgroups: true as part of the system module definition within your Metricbeat configuration file.
metricbeat.modules:
- module: system
  metricsets: [process]
  cgroups: true
If you are planning to deploy Metricbeat in a container, there are some additional configuration items to be aware of. The full details are in the section of the Metricbeat documentation titled Running Metricbeat in a Container. In short, you need to mount both the host machine's proc filesystem and its cgroup filesystem inside the container. Here's an example using Docker (you will need to provide your own container image).
sudo docker run \
  --volume=/proc:/hostfs/proc:ro \
  --volume=/sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro \
  my/metricbeat:latest -system.hostfs=/hostfs
How are cgroups related to container monitoring?
Each container is assigned to a cgroup which allows for limiting and metering resource usage of the process running in the container. We can use this fact to collect detailed metrics from processes running inside containers.
The benefit of collecting container metrics directly from cgroups is that it works with any container tool (e.g. Docker, rkt, runC, LXC, systemd). Metricbeat reads directly from the cgroup pseudo filesystem provided by the Linux kernel, so it has no dependency on the APIs provided by container tools (which are subject to change between releases).
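To make the idea of reading from the pseudo filesystem concrete, here is a minimal Python sketch of how a tool can discover which cgroups a process belongs to. It assumes the cgroup v1 layout, where each line of /proc/<pid>/cgroup has the form hierarchy-ID:controller-list:cgroup-path. The function name and the sample data are made up for illustration, and this is not Metricbeat's own code.

```python
def parse_proc_cgroup(text):
    """Map each controller name to the cgroup path the process belongs to.

    Expects the contents of /proc/<pid>/cgroup (cgroup v1), one line per
    hierarchy in the form hierarchy-ID:controller-list:cgroup-path.
    """
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only; the path may contain ':' itself.
        _hierarchy_id, controllers, path = line.split(":", 2)
        # Co-mounted controllers (e.g. cpu,cpuacct) share a single path.
        for controller in controllers.split(","):
            if controller:
                paths[controller] = path
    return paths


# Illustrative sample only; the container ID is shortened for readability.
SAMPLE = """\
4:cpu,cpuacct:/docker/0123abcd
3:memory:/docker/0123abcd
2:blkio:/docker/0123abcd
1:name=systemd:/user.slice
"""

cgroup_paths = parse_proc_cgroup(SAMPLE)
print(cgroup_paths["memory"])
```

On a real host you would pass in the contents of /proc/<pid>/cgroup (or /hostfs/proc/<pid>/cgroup when running in a container with the mounts shown earlier) instead of the sample string.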
As with any design decision, there is a tradeoff for not using the container tool APIs. Metadata that is only known to the container tool (e.g. names and labels) cannot be read by Metricbeat. We will be addressing this limitation in the future.
What kind of metrics can cgroups provide?
Control groups provide a wealth of information about resource usage. Metricbeat reports both the limits assigned to the cgroup (if any) and the statistics captured by the cgroup. You can view a sample event here. These are the areas that Metricbeat focuses on:
- CPU - There are two cgroups providing CPU metrics. One is aptly called cpu and it limits the amount of CPU time a process can use and reports how much time the process was throttled for. The other cgroup is called cpuacct and it is responsible for reporting the amount of CPU time used.
- Memory - The memory cgroup controls the amount of memory that can be used. It provides very detailed statistics on memory usage.
- Block I/O - The blkio cgroup controls and monitors access to block I/O devices (e.g. disks). It provides information like total number of read and write IO operations and total number of bytes read and written.
- Network - This one isn't implemented yet (see elastic/beats#2483), but it will provide network interface usage statistics for the containerized process. This includes information like total bytes and packets transmitted/received.
For a detailed look at the metrics reported by Metricbeat, have a look at the cgroup field documentation.
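As a rough illustration of where numbers like these come from, the sketch below parses the text of a cgroup v1 cpu.stat file, which lists one "key value" pair per line (nr_periods, nr_throttled, and throttled_time in nanoseconds). The helper name and the sample values are hypothetical, and the derived ratio is only an example calculation, not a field that Metricbeat is known to report.

```python
def parse_flat_keyed(text):
    """Parse a cgroup stat file made of 'key value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


# Illustrative contents of <cgroup mount>/cpu/<cgroup path>/cpu.stat.
CPU_STAT = """\
nr_periods 120
nr_throttled 30
throttled_time 4500000000
"""

stats = parse_flat_keyed(CPU_STAT)

# Fraction of enforcement periods in which the cgroup was throttled.
throttled_ratio = stats["nr_throttled"] / stats["nr_periods"]

# throttled_time is reported in nanoseconds; convert to seconds.
throttled_seconds = stats["throttled_time"] / 1e9

print(throttled_ratio, throttled_seconds)
```

The same "key value" layout is used by other cgroup v1 files such as memory.stat, so a helper like this can be reused there. Single-value files such as cpuacct.usage or memory.usage_in_bytes hold one integer and can be read with a plain int() conversion.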
Metricbeat sends all of this data as part of the system process metricset. This means that Metricbeat provides a process-centric view rather than a container-centric view. Metricbeat examines each process and includes the cgroup metrics if the process is a member of a (non-root) cgroup. If you are running one process per container, then a process-centric view of the data is equivalent to a container-centric view. But if you run multiple processes per container, the cgroup metrics will be duplicated across the processes running in the same container.
Generally the container ID is used as the name of the cgroup. Metricbeat stores the cgroup name in the system.process.cgroup.id field. This value can be used to associate the data with a specific container.
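One way to turn the process-centric events into a container-centric view is to group them by that field. The sketch below does this in Python over a list of events represented as nested dictionaries. The event shape is an assumption that simply mirrors the dotted field name system.process.cgroup.id, and the process names and shortened IDs are invented sample data.

```python
from collections import defaultdict


def group_by_cgroup(events):
    """Group process names by the value of system.process.cgroup.id.

    Events without a cgroup ID (processes not in a non-root cgroup) are skipped.
    """
    groups = defaultdict(list)
    for event in events:
        process = event.get("system", {}).get("process", {})
        cgroup_id = process.get("cgroup", {}).get("id")
        if cgroup_id is None:
            continue
        groups[cgroup_id].append(process.get("name"))
    return dict(groups)


# Hypothetical events; real events carry many more fields.
EVENTS = [
    {"system": {"process": {"name": "nginx", "cgroup": {"id": "0123abcd"}}}},
    {"system": {"process": {"name": "nginx-worker", "cgroup": {"id": "0123abcd"}}}},
    {"system": {"process": {"name": "redis-server", "cgroup": {"id": "89ef4567"}}}},
    {"system": {"process": {"name": "sshd"}}},
]

containers = group_by_cgroup(EVENTS)
print(containers)
```

Because every process in a container reports the same cgroup metrics, taking the metrics from any one event per group is enough to avoid counting the duplicated values more than once.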
If you have any Metricbeat-related questions or feature requests, please connect with the Beats engineering team on the Elastic forums.