Configure the Logstash output
Logstash allows for additional processing and routing of APM events. The Logstash output sends events directly to Logstash using the lumberjack protocol, which runs over TCP.
Send events to Logstash
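To send events to Logstash, you must enable the Logstash output in APM Server and create a Logstash configuration pipeline. Both steps are described in the sections that follow.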
Logstash output configuration
To enable the Logstash output in APM Server, edit the apm-server.yml file to:
- Disable the Elasticsearch output by commenting it out, and
- Enable the Logstash output by uncommenting the Logstash section and setting enabled to true.
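As a rough sketch, the relevant part of apm-server.yml might look like the following after both changes. The host values are placeholders, not values from your deployment. In particular, localhost:9200 is only an assumed Elasticsearch address, used to show what a commented-out section looks like.

```yaml
# apm-server.yml (sketch; host values are placeholders)

# Elasticsearch output, disabled by commenting it out.
# localhost:9200 is an assumed address for illustration only.
#output.elasticsearch:
#  hosts: ["localhost:9200"]

# Logstash output, enabled explicitly.
output.logstash:
  enabled: true
  # 5044 is the default port used when none is given.
  hosts: ["localhost:5044"]
```

Only one output should be active at a time, which is why the Elasticsearch section is commented out rather than left enabled alongside the Logstash section.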
Logstash configuration pipeline
Finally, you must create a Logstash configuration pipeline that listens for incoming APM Server connections and indexes received events into Elasticsearch.
- Use the Elastic Agent input plugin to configure Logstash to receive events from the APM Server. A minimal input config might look like this:
    input {
      elastic_agent {
        port => 5044
      }
    }
- Use the Elasticsearch output plugin to send events to Elasticsearch for indexing. A minimal output config might look like this:
    output {
      elasticsearch {
        data_stream => "true"
        cloud_id => "YOUR_CLOUD_ID_HERE"
        cloud_auth => "YOUR_CLOUD_AUTH_HERE"
      }
    }
  The data_stream setting enables indexing into Elasticsearch data streams.
  This example assumes you’re sending data to Elastic Cloud. If you’re using a self-hosted version of Elasticsearch, use hosts instead. See Elasticsearch output plugin for more information.
Here’s what your basic Logstash configuration file will look like when we put everything together:
    input {
      elastic_agent {
        port => 5044
      }
    }
    output {
      elasticsearch {
        data_stream => "true"
        cloud_id => "YOUR_CLOUD_ID_HERE"
        cloud_auth => "YOUR_CLOUD_AUTH_HERE"
      }
    }
Accessing the @metadata field
Every event sent to Logstash contains a special field called @metadata that you can use in Logstash for conditionals, filtering, indexing and more.
APM Server sends the following @metadata fields to Logstash:
- beat: Defaults to apm-server. To change the default, set the index option in the APM Server config file.
- version: The current version of APM Server.
In addition to @metadata, APM Server provides other potentially useful fields, like the data_stream field, which can be used to conditionally operate on event types, namespaces, or datasets.
As an example, you might want to use Logstash to route all metrics events to the same custom metrics data stream, rather than to service-specific data streams.
However, if the combined metrics events have the data_stream.dataset field set to different values, indexing fails with a message stating that the field does not accept any other values. For example, the error might say something like failed to parse field [data_stream.dataset] of type [constant_keyword] or [constant_keyword] field [data_stream.dataset] only accepts values that are equal to the value defined in the mappings.
This is because the data_stream.dataset field’s mapping is set to constant_keyword, which expects all values of the field in the index to be the same.
To prevent losing data due to failed indexing, add a Logstash mutate filter to update the value of data_stream.dataset. Then, you can send all metrics events to one custom metrics data stream:
    filter {
      # Only apply this filter to events sent from the APM Server.
      if [@metadata][beat] == "apm-server" {
        # Determine if the event type is metrics.
        if [data_stream][type] == "metrics" {
          # Update the value of data_stream.dataset.
          mutate {
            update => { "[data_stream][dataset]" => "custom" }
          }
        }
      }
    }
    output {
      elasticsearch {
        data_stream => "true"
        # In this example, cloud_id and cloud_auth are read from the
        # CLOUD_ID and CLOUD_AUTH variables.
        cloud_id => "${CLOUD_ID}"
        cloud_auth => "${CLOUD_AUTH}"
      }
    }
Compatibility
This output works with all compatible versions of Logstash. See the Elastic Support Matrix.
Configuration options
You can specify the following options in the logstash section of the apm-server.yml config file:
enabled
A boolean setting that enables or disables the output. If set to false, the output is disabled.
The default value is false.
hosts
The list of known Logstash servers to connect to. If load balancing is disabled, but multiple hosts are configured, one host is selected randomly (there is no precedence). If one host becomes unreachable, another one is selected randomly.
All entries in this list can contain a port number. The default port number 5044 will be used if no number is given.
compression_level
The gzip compression level. Setting this value to 0 disables compression. The compression level must be in the range of 1 (best speed) to 9 (best compression).
Increasing the compression level will reduce the network usage but will increase the CPU usage.
The default value is 3.
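As a small illustration of this trade-off, the fragment below raises the compression level for a link where bandwidth matters more than CPU. The value 6 and the host are arbitrary choices for the sketch, not recommendations.

```yaml
output.logstash:
  hosts: ["localhost:5044"]
  # 0 disables compression; 1 is fastest, 9 compresses most. Default is 3.
  # 6 is an illustrative value that trades more CPU for less network usage.
  compression_level: 6
```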
escape_html
Configure escaping of HTML in strings. Set to true to enable escaping.
The default value is false.
worker
The number of workers per configured host publishing events to Logstash. This is best used with load balancing mode enabled. Example: If you have 2 hosts and 3 workers, in total 6 workers are started (3 for each host).
loadbalance
If set to true and multiple Logstash hosts are configured, the output plugin load balances published events onto all Logstash hosts. If set to false, the output plugin sends all events to only one host (determined at random) and will switch to another host if the selected one becomes unresponsive. The default value is false.
    output.logstash:
      hosts: ["localhost:5044", "localhost:5045"]
      loadbalance: true
      index: apm-server
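The worker and loadbalance options are typically combined. The following sketch mirrors the worker example described above, with two hosts and three workers per host. The host names are the same placeholders used elsewhere on this page.

```yaml
output.logstash:
  hosts: ["localhost:5044", "localhost:5045"]
  # Distribute published events across both hosts.
  loadbalance: true
  # 3 workers per configured host: 2 hosts x 3 workers = 6 workers in total.
  worker: 3
```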
ttl
Time to live for a connection to Logstash, after which the connection is re-established. Useful when Logstash hosts represent load balancers. Because connections to Logstash hosts are sticky, operating behind load balancers can lead to uneven load distribution between the instances. Specifying a TTL on the connection helps distribute connections evenly between the instances. Specifying a TTL of 0 disables this feature.
The default value is 0.
The "ttl" option is not yet supported on an asynchronous Logstash client (one with the "pipelining" option set).
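A minimal sketch of a TTL setup behind a load balancer follows. It assumes that ttl accepts a duration string such as 60s. The host name is hypothetical and the 60s value is only illustrative. Because ttl is not supported on an asynchronous client, the sketch also disables pipelining.

```yaml
output.logstash:
  # Hypothetical load balancer address in front of several Logstash instances.
  hosts: ["logstash-lb.example.com:5044"]
  # Re-establish the connection periodically so the load balancer can
  # rebalance it. Assumed duration format; 60s is an illustrative value.
  ttl: 60s
  # ttl is not supported with pipelining enabled, so turn pipelining off.
  pipelining: 0
```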
pipelining
Configures the number of batches to be sent asynchronously to Logstash while waiting for an ACK from Logstash. The output only becomes blocking once the configured number of batches has been written. Pipelining is disabled if a value of 0 is configured. The default value is 2.
proxy_url
The URL of the SOCKS5 proxy to use when connecting to the Logstash servers. The value must be a URL with a scheme of socks5://. The protocol used to communicate with Logstash is not based on HTTP, so a web proxy cannot be used.
If the SOCKS5 proxy server requires client authentication, then a username and password can be embedded in the URL as shown in the example.
When using a proxy, hostnames are resolved on the proxy server instead of on the client. You can change this behavior by setting the proxy_use_local_resolver option.
    output.logstash:
      hosts: ["remote-host:5044"]
      proxy_url: socks5://user:password@socks5-proxy:2233
proxy_use_local_resolver
The proxy_use_local_resolver option determines if Logstash hostnames are resolved locally when using a proxy. The default value is false, which means that when a proxy is used the name resolution occurs on the proxy server.
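To illustrate, the fragment below reuses the proxy example from the proxy_url option and switches name resolution to the client. This is only a sketch. The host, credentials, and proxy address are the same placeholders as in that example.

```yaml
output.logstash:
  hosts: ["remote-host:5044"]
  proxy_url: socks5://user:password@socks5-proxy:2233
  # Resolve remote-host on the client instead of on the proxy server.
  proxy_use_local_resolver: true
```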
index
The index root name to write events to. The default is apm-server. For example, "apm" generates "[apm-]8.15.5-YYYY.MM.DD" indices (for example, "apm-8.15.5-2017.04.26").
This parameter’s value is assigned to the @metadata.beat field. It can then be accessed in Logstash’s output section as %{[@metadata][beat]}.
ssl
Configuration options for SSL parameters like the root CA for Logstash connections. See SSL/TLS output settings for more information. To use SSL, you must also configure the Beats input plugin for Logstash to use SSL/TLS.
timeout
The number of seconds to wait for responses from the Logstash server before timing out. The default is 30 (seconds).
max_retries
The number of times to retry publishing an event after a publishing failure. After the specified number of retries, the events are typically dropped.
Set max_retries to a value less than 0 to retry until all events are published.
The default is 3.
bulk_max_size
The maximum number of events to bulk in a single Logstash request. The default is 2048.
If APM Server sends single events, the events are collected into batches. If APM Server publishes a large batch of events (larger than the value specified by bulk_max_size), the batch is split.
Specifying a larger batch size can improve performance by lowering the overhead of sending events. However, large batch sizes can also increase processing times, which might result in API errors, killed connections, timed-out publishing requests, and, ultimately, lower throughput.
Setting bulk_max_size to values less than or equal to 0 disables the splitting of batches. When splitting is disabled, the queue decides on the number of events to be contained in a batch.
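The timeout, max_retries, and bulk_max_size options are often tuned together. The sketch below keeps the documented defaults for the first two and lowers the batch size. The value 1024 is an arbitrary illustration, not a recommendation.

```yaml
output.logstash:
  hosts: ["localhost:5044"]
  # Seconds to wait for a response from Logstash (default: 30).
  timeout: 30
  # Retries after a publishing failure (default: 3).
  # A value less than 0 retries until all events are published.
  max_retries: 3
  # Illustrative value below the 2048 default, for smaller requests.
  bulk_max_size: 1024
```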
slow_start
If enabled, only a subset of events in a batch of events is transferred per transaction. The number of events to be sent increases up to bulk_max_size if no error is encountered. On error, the number of events per transaction is reduced again.
The default is false.
backoff.init
The number of seconds to wait before trying to reconnect to Logstash after a network error. After waiting backoff.init seconds, APM Server tries to reconnect. If the attempt fails, the backoff timer is increased exponentially up to backoff.max. After a successful connection, the backoff timer is reset. The default is 1s.
backoff.max
The maximum number of seconds to wait before attempting to connect to Logstash after a network error. The default is 60s.
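As a sketch of how the reconnect and ramp-up settings fit together, the fragment below enables slow_start and spells out the backoff defaults explicitly. It uses the dotted key style seen in the SSL example on this page. The host is a placeholder.

```yaml
output.logstash:
  hosts: ["localhost:5044"]
  # Start with a subset of each batch and grow toward bulk_max_size
  # while no errors occur (default: false).
  slow_start: true
  # First reconnect attempt after 1s, then exponentially longer waits...
  backoff.init: 1s
  # ...capped at 60s. The timer resets after a successful connection.
  backoff.max: 60s
```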
Secure communication with Logstash
You can use SSL mutual authentication to secure connections between APM Server and Logstash. This ensures that APM Server sends encrypted data to trusted Logstash servers only, and that the Logstash server receives data from trusted APM Server clients only.
To use SSL mutual authentication:
- Create a certificate authority (CA) and use it to sign the certificates that you plan to use for APM Server and Logstash. Creating a correct SSL/TLS infrastructure is outside the scope of this document. There are many online resources available that describe how to create certificates.
  If you are using security features, you can use the elasticsearch-certutil tool to generate certificates.
- Configure APM Server to use SSL. In the apm-server.yml config file, specify the following settings under ssl:
  - certificate_authorities: Configures APM Server to trust any certificates signed by the specified CA. If certificate_authorities is empty or not set, the trusted certificate authorities of the host system are used.
  - certificate and key: Specify the certificate and key that APM Server uses to authenticate with Logstash.
  For example:
    output.logstash:
      hosts: ["logs.mycompany.com:5044"]
      ssl.certificate_authorities: ["/etc/ca.crt"]
      ssl.certificate: "/etc/client.crt"
      ssl.key: "/etc/client.key"
  For more information about these configuration options, see SSL/TLS output settings.
- Configure Logstash to use SSL. In the Logstash config file, specify the following settings for the Beats input plugin for Logstash:
  - ssl: When set to true, enables Logstash to use SSL/TLS.
  - ssl_certificate_authorities: Configures Logstash to trust any certificates signed by the specified CA.
  - ssl_certificate and ssl_key: Specify the certificate and key that Logstash uses to authenticate with the client.
  - ssl_verify_mode: Specifies whether the Logstash server verifies the client certificate against the CA. You need to specify either peer or force_peer to make the server ask for the certificate and validate it. If you specify force_peer, and APM Server doesn’t provide a certificate, the Logstash connection will be closed. If you choose not to use certutil, the certificates that you obtain must allow for both clientAuth and serverAuth if the extended key usage extension is present.
  For example:
    input {
      beats {
        port => 5044
        ssl => true
        ssl_certificate_authorities => ["/etc/ca.crt"]
        ssl_certificate => "/etc/server.crt"
        ssl_key => "/etc/server.key"
        ssl_verify_mode => "force_peer"
      }
    }
  For more information about these options, see the documentation for the Beats input plugin.
Validate the Logstash server’s certificate
Before running APM Server, you should validate the Logstash server’s certificate. You can use curl to validate the certificate even though the protocol used to communicate with Logstash is not based on HTTP. For example:
    curl -v --cacert ca.crt https://logs.mycompany.com:5044
If the test is successful, you’ll receive an empty response error:
    * Rebuilt URL to: https://logs.mycompany.com:5044/
    * Trying 192.168.99.100...
    * Connected to logs.mycompany.com (192.168.99.100) port 5044 (#0)
    * TLS 1.2 connection using TLS_DHE_RSA_WITH_AES_256_CBC_SHA
    * Server certificate: logs.mycompany.com
    * Server certificate: mycompany.com
    > GET / HTTP/1.1
    > Host: logs.mycompany.com:5044
    > User-Agent: curl/7.43.0
    > Accept: */*
    >
    * Empty reply from server
    * Connection #0 to host logs.mycompany.com left intact
    curl: (52) Empty reply from server
The following example uses the IP address rather than the hostname to validate the certificate:
    curl -v --cacert ca.crt https://192.168.99.100:5044
Validation for this test fails because the certificate is not valid for the specified IP address. It’s only valid for logs.mycompany.com, the hostname that appears in the Subject field of the certificate.
    * Rebuilt URL to: https://192.168.99.100:5044/
    * Trying 192.168.99.100...
    * Connected to 192.168.99.100 (192.168.99.100) port 5044 (#0)
    * WARNING: using IP address, SNI is being disabled by the OS.
    * SSL: certificate verification failed (result: 5)
    * Closing connection 0
    curl: (51) SSL: certificate verification failed (result: 5)
See the troubleshooting docs for info about resolving this issue.
Test the APM Server to Logstash connection
If you have APM Server running as a service, first stop the service. Then test your setup by running APM Server in the foreground so you can quickly see any errors that occur:
    apm-server -c apm-server.yml -e -v
Any errors will be printed to the console. See the troubleshooting docs for info about resolving common errors.