Monitoring Petabytes of Logs per Day at eBay with Beats
With 1.2 petabytes of logs per day and 5 million metric data points coming in per second, eBay has no shortage of data to track. Each day, the ecommerce company’s logging and monitoring team is faced with the monumental task of collecting and visualizing all of these logs and metrics. And like most companies, they use a variety of applications (like Hadoop and MySQL) to power different use cases across teams.
When containers popped onto the scene providing a new way of deploying applications, the team started containerizing with Docker and deploying through Kubernetes, which manages the container lifecycle. One of the biggest challenges they ran into, however, was that the applications and environment were constantly evolving and tough to keep a pulse on. Enter Beats! Filebeat and Metricbeat were the two clear choices for collecting and shipping logs and metrics from Docker and Kubernetes.
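To give a sense of what that looks like in practice, here is a minimal sketch (not eBay's actual configuration) of Metricbeat's Docker module alongside a Filebeat input tailing container log files; the module names, metricsets, and paths are the standard ones from the Beats documentation, and newer Beats versions also offer dedicated container inputs:

```yaml
# Metricbeat: collect container-level metrics from the Docker daemon (illustrative sketch).
metricbeat.modules:
  - module: docker
    metricsets: ["container", "cpu", "memory", "network"]
    hosts: ["unix:///var/run/docker.sock"]
    period: 10s

# Filebeat: tail the JSON log files Docker writes for each container.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/containers/*.log
```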
They also wanted to be able to discover workloads as they were created. Before the autodiscover functionality existed in Beats (it arrived in 6.2), they built their own: Collectbeat. Built on top of libbeat, Collectbeat used the Kubernetes API to discover new pods in their clusters, collect and enrich the data, and send it to their internal monitoring system. That system, referred to as Sherlock.io, was built to stay flexible and adapt as new technologies were adopted.
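For teams starting today, the built-in autodiscover feature covers much of what Collectbeat was created for. A minimal sketch of a Filebeat autodiscover configuration with the Kubernetes provider, in the style of the 6.x Beats documentation, might look like the following (the namespace condition is a hypothetical example, not eBay's setup):

```yaml
# Filebeat autodiscover with the Kubernetes provider (available since Beats 6.2).
filebeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            equals:
              kubernetes.namespace: production   # hypothetical namespace
          config:
            - type: docker
              containers.ids:
                - "${data.kubernetes.container.id}"   # start a log input per discovered container
```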
Although the collection aspect was now solved, the analytics and visualization pieces still needed to come into focus. Collecting all of this data is only useful if users at eBay can analyze it using familiar labels, so the next logical step was tagging the data with metadata before it shipped. Vijay Samuel and his team at eBay built a processor called ‘add_kubernetes_metadata’ that takes log messages and metric payloads and appends metadata such as the pod name and namespace. This processor is now available on GitHub, and it's a great example of why community-driven open source projects are so powerful.
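The processor now ships with Beats, and enabling it is a few lines of configuration. A minimal sketch, based on the standard Beats processor documentation rather than eBay's internal setup, looks like this:

```yaml
# Enrich each event with Kubernetes metadata (pod name, namespace, labels)
# before it leaves the host. Illustrative sketch only.
processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}              # the node this Beat instance runs on
      matchers:
        - logs_path:
            logs_path: "/var/log/containers/"   # map log file paths back to pods
```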
Of course, eBay is still evolving. With the adoption of new technologies come more applications, more logs, and more metrics; in fact, their organic logging growth is 50% year over year. So how do they handle growing amounts of data when resources are finite? One tactic is metering applications at the host/pool level with tier-based quota and retention limits. Another is prioritizing specific types of data: events are the highest priority, metrics providing operational visibility come second, and logs come next. To keep those priorities enforced, they added event schedulers and let autodiscover attach weights to the configurations.
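Sherlock.io's scheduler and quota mechanics are internal to eBay, but the general idea of tier-based quotas and retention can be sketched as a simple policy table. Everything below (tier names, limits, retention periods) is hypothetical and purely illustrative; only the priority ordering of events, metrics, and logs comes from the description above:

```yaml
# Hypothetical tiered quota/retention policy -- illustrative only,
# not Sherlock.io's actual format or values.
tiers:
  events:  { priority: 1, retention: 30d, daily_quota_gb: 500 }
  metrics: { priority: 2, retention: 14d, daily_quota_gb: 200 }
  logs:    { priority: 3, retention: 7d,  daily_quota_gb: 100 }
```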
Ready to discover more about their team’s tools and strategies? Watch Vijay’s presentation from Elastic{ON} 2018 to learn more about Sherlock.io and how they use Beats to monitor all of the data in their Kubernetes clusters. If you'd like to learn more about how the Elastic Stack can be used to monitor Kubernetes and Docker, check out our Elastic Stack: Monitoring Kubernetes Applications with Beats webinar.