How to Send Data through Logstash or Kafka from Elastic APM
In modern deployments, application servers often have short lifetimes or use less reliable hardware (preemptible/low-priority VMs, spot instances) than other components, like message queues and databases. In those cases, it's best to get data off to a more reliable system quickly.
Starting with the 6.4.0 release, APM Server is able to send data to Logstash or Kafka, offloading the reliability requirements to one of those systems. Using Logstash also introduces an opportunity to enrich your APM data using one of the many Logstash plugins available.
Here we'll step through the configuration and review a few caveats when sending data through another system before ingestion into Elasticsearch.
- Configure Logstash to output to Elasticsearch
- Configure Elastic APM Server to send to Logstash
- Configure Elasticsearch
- Optionally include source maps
- Introduce Kafka
Configure Logstash
First, add a pipeline to configure Logstash to receive events from APM Server, using the beats/lumberjack protocol, and send them on to Elasticsearch.
    input {
      beats {
        id => "apm-server"
        port => 5044
      }
    }

    output {
      elasticsearch {
        hosts => ["elasticsearch:9200"]
        index => "apm-%{[@metadata][version]}-%{[processor][event]}-%{+YYYY.MM.dd}"
      }
    }
An index is created per day per event type (transaction, span, and so on), just like the default APM Server configuration would generate. Source maps introduce a wrinkle here that we'll cover later in this post.
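To make the naming concrete, here is a minimal Python sketch of what that index setting renders to. The function name apm_index_name and the sample version and date are ours, for illustration only; in the real pipeline Logstash performs the substitution and takes the date from the event's @timestamp.

```python
from datetime import date


def apm_index_name(version: str, event_type: str, day: date) -> str:
    """Mimic apm-%{[@metadata][version]}-%{[processor][event]}-%{+YYYY.MM.dd}.

    Illustrative helper only; it is not part of Logstash or APM Server.
    """
    return f"apm-{version}-{event_type}-{day:%Y.%m.%d}"


# One index per event type per day
for event_type in ("transaction", "span"):
    print(apm_index_name("6.4.2", event_type, date(2018, 10, 15)))
# apm-6.4.2-transaction-2018.10.15
# apm-6.4.2-span-2018.10.15
```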
If you're looking closely, you may be wondering what the references to beats are all about. APM Server is built on the [Beats Framework](https://www.elastic.co/products/beats), and as such passes the same metadata around in events. This means you can handle APM data in the same Logstash instance as other Beats data by using a conditional on [@metadata][beat] == "apm-server".
Configure APM Server
Now that Logstash is configured, update APM Server's apm-server.yml to output events accordingly:

    output.elasticsearch:
      enabled: false

    output.logstash:
      enabled: true
      hosts: ["logstash:5044"]
Additional options including backoff, proxy, and ssl settings are detailed in the Elastic APM Server documentation.
If you have monitoring enabled in APM Server, be sure to set:
    xpack.monitoring.elasticsearch:
      enabled: true
      hosts: ["elasticsearch:9200"]
This will keep monitoring of APM Server itself flowing directly to Elasticsearch.
Configure Elasticsearch
One more step and we'll be ready to start APM Server. Elasticsearch first needs to know how to store the APM data the way the APM UI expects it - it needs an index template. There are two options for loading this template.
If you're able to temporarily point an APM Server at Elasticsearch, this can be done with:
apm-server setup --template
Otherwise, first export the template from APM Server:
apm-server export template > apm-6.4.2.tmpl.json
And then load it into Elasticsearch. With curl, this can be done with:
curl -XPUT -H 'Content-Type: application/json' http://elasticsearch:9200/_template/apm-6.4.2 -d @apm-6.4.2.tmpl.json
Those commands should report success or failure with each operation. To confirm the template was loaded successfully, query for _template/apm-*. This should return a set of documents like:
    {
      "apm-6.4.2": {
        "order": 1,
        "index_patterns": [
          "apm-6.4.2-*"
        ],
        ...
Notice that the index pattern matches the one configured back in the Logstash step. In addition, notice that it includes version information - that means this setup step should be:
- Performed with the exact same version of APM Server that connects to Logstash.
- Repeated during each upgrade of APM Server.
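A version mismatch is easy to miss, so here is a small Python sketch of the check described above: given a parsed _template/apm-* response, confirm that a template exists for the APM Server version and that its index pattern covers the index Logstash will write to. The function template_covers and the sample data are hypothetical, and fnmatchcase only approximates Elasticsearch wildcard matching for simple * patterns.

```python
from fnmatch import fnmatchcase


def template_covers(templates: dict, version: str, index_name: str) -> bool:
    """Return True if the apm-<version> template exists and one of its
    index_patterns matches index_name. Illustrative helper only."""
    template = templates.get(f"apm-{version}")
    if template is None:
        return False
    return any(fnmatchcase(index_name, pattern)
               for pattern in template.get("index_patterns", []))


# Sample shaped like the _template/apm-* response shown earlier
response = {"apm-6.4.2": {"order": 1, "index_patterns": ["apm-6.4.2-*"]}}

# Template loaded with the same version that APM Server runs
print(template_covers(response, "6.4.2", "apm-6.4.2-transaction-2018.10.15"))  # True

# APM Server upgraded, but the template was not reloaded
print(template_covers(response, "6.4.3", "apm-6.4.3-transaction-2018.10.15"))  # False
```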
Use It
Now start up APM Server and start analyzing your application behavior.
Source maps
Source maps are used for mapping obfuscated code back to its original source. We see them most frequently used for reversing minified JavaScript.
Source maps require special consideration when the APM Server output is not Elasticsearch. Regardless of how they eventually reach Elasticsearch, they must be stored there for APM Server to use them.
To configure source mapping storage, set rum.source_mapping.elasticsearch under apm-server, like:

    apm-server:
      rum:
        source_mapping:
          elasticsearch:
            hosts: ["elasticsearch:9200"]
            index_pattern: "apm-*-sourcemap*"

This instructs APM Server to look for source maps in indices matching apm-*-sourcemap*.
While source maps can be uploaded through Logstash, we recommend sending them directly to Elasticsearch during your deployment process to ensure they're stored before any events requiring mapping arrive. If that's not an option, the Logstash configuration provided in this post matches the default index pattern used to retrieve source maps when it's time to apply them, so no additional changes are required.
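As a quick sanity check on that pattern, the following Python sketch tests a few index names against apm-*-sourcemap*. The index names are made-up examples, and fnmatchcase is used here as a stand-in for Elasticsearch wildcard matching, which it approximates for simple * patterns.

```python
from fnmatch import fnmatchcase

SOURCEMAP_PATTERN = "apm-*-sourcemap*"

# Hypothetical index names, for illustration only
candidates = [
    "apm-6.4.2-sourcemap",               # undated source map index
    "apm-6.4.2-sourcemap-2018.10.15",    # dated, as the Logstash output in this post would name it
    "apm-6.4.2-transaction-2018.10.15",  # not a source map index
]

matches = {name: fnmatchcase(name, SOURCEMAP_PATTERN) for name in candidates}
for name, matched in matches.items():
    print(f"{name}: {matched}")
# apm-6.4.2-sourcemap: True
# apm-6.4.2-sourcemap-2018.10.15: True
# apm-6.4.2-transaction-2018.10.15: False
```

The trailing * is what lets the daily suffix added by the Logstash index setting still match.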
Introducing Kafka to the flow
Kafka can also serve to buffer events output from APM Server for delivery to Elasticsearch. A simple APM Server configuration with Kafka looks like:
    output.kafka:
      enabled: true
      hosts: ["kafka:9092"]
      topics:
        - default: 'apm'
          topic: 'apm-%{[context.service.name]}'
Using a topic per service is demonstrated here, but not required. Additional configuration options are described in the documentation.
Once events are flowing into Kafka, Logstash can be configured to pull them into Elasticsearch. For example:
    input {
      kafka {
        id => "apm-server-kafka"
        bootstrap_servers => ["kafka:9092"]
        topics_pattern => "apm.*"
        codec => "json"
      }
    }

    output {
      elasticsearch {
        hosts => ["elasticsearch:9200"]
        index => "apm-%{[@metadata][version]}-%{[processor][event]}-%{+YYYY.MM.dd}"
      }
    }
Again, an index is created per day per event type (transaction, span, and so on), just like the default APM Server configuration would generate. More options for the Logstash Kafka input plugin are described in the documentation.
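If you go with a topic per service, it's worth checking that the topic names APM Server produces are the ones Logstash subscribes to. The Python sketch below assumes the intent of the Kafka output shown earlier is "apm-<service name>, falling back to apm when no service name is available"; topic_for and the service names are hypothetical. The regular expression is the topics_pattern from the Logstash input, applied to the whole topic name.

```python
import re
from typing import Optional

# Same expression as topics_pattern in the Logstash Kafka input
TOPICS_PATTERN = re.compile(r"apm.*")


def topic_for(service_name: Optional[str]) -> str:
    """Assumed naming scheme: one topic per service, 'apm' as the fallback.
    Illustrative helper only; APM Server does the real topic selection."""
    return f"apm-{service_name}" if service_name else "apm"


def is_consumed(topic: str) -> bool:
    """True if the whole topic name matches the subscription pattern."""
    return TOPICS_PATTERN.fullmatch(topic) is not None


for service in ("checkout", "inventory", None):
    topic = topic_for(service)
    print(topic, is_consumed(topic))
# apm-checkout True
# apm-inventory True
# apm True

print(is_consumed("logs-checkout"))  # False
```

Note that apm.* is broad: any topic whose name starts with apm will be picked up, so a tighter expression may be preferable on a shared Kafka cluster.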
Give it a try
APM Server provides some flexibility for shipping your APM data using Logstash or Kafka. Please give it a try and bring your feedback to our discussion forum. We are also always open to contributions, so feel free to check out the source code and open an issue or pull request.