IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Common problems

This section describes common problems for users running Elastic Agent and the APM integration. If you’re using the standalone (legacy) APM Server binary, see legacy common problems instead.

No data is indexed

If no data shows up in Elasticsearch, first make sure that your APM components are properly connected.

Is Elastic Agent healthy?

In Kibana, open Fleet and find the host that is running the APM integration. Confirm that its status is Healthy. If it isn’t, check the Elastic Agent logs to diagnose potential causes. See Monitor Elastic Agents to learn more.

Is APM Server happy?

In Kibana, open Fleet and select the host that is running the APM integration. Open the Logs tab and select the elastic_agent.apm_server dataset. Look for any APM Server errors that could help diagnose the problem.

Can the APM agent connect to APM Server?

To determine if the APM agent can connect to the APM Server, send requests to the instrumented service and look for lines containing [request] in the APM Server logs.

If no requests are logged, confirm that:

SSL isn’t misconfigured.
The host is correct. For example, if you’re using Docker, make sure that APM Server binds to the right interface (for example, set apm-server.host = 0.0.0.0:8200 to match any IP) and set the SERVER_URL setting in the APM agent accordingly.

If you see requests coming through the APM Server but they are not accepted (a response code other than 202), see APM Server response codes to narrow down the possible causes.
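The log check described above can be sketched as a small filter. This is a minimal illustration, assuming you have exported the APM Server logs to a text file. The function name `request_lines` is hypothetical, and the only thing it relies on is the literal `[request]` marker mentioned above.

```python
from typing import Iterable, List


def request_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the literal marker "[request]"."""
    return [line.rstrip("\n") for line in log_lines if "[request]" in line]


def count_requests_in_file(path: str) -> int:
    """Count "[request]" lines in an exported log file (the path is up to you)."""
    with open(path, encoding="utf-8") as handle:
        return len(request_lines(handle))
```

If the count stays at zero while you send traffic to the instrumented service, the APM agent is probably not reaching APM Server.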

Instrumentation gaps

APM agents provide auto-instrumentation for many popular frameworks and libraries. If the APM agent is not auto-instrumenting something that you expected it to, data won’t be sent to the Elastic Stack. See the relevant APM agent documentation for details on what is automatically instrumented.

APM Server response codes

HTTP 400: Data decoding error / Data validation error

The most likely cause for this error is using incompatible versions of APM agent and APM Server. See the agent/server compatibility matrix to verify compatibility.

HTTP 400: Event too large

APM agents communicate with the APM Server by sending events in an HTTP request. Each event is sent as its own line in the HTTP request body. If events are too large, consider increasing the maximum size per event setting in the APM integration, and adjusting relevant settings in the agent.

HTTP 401: Invalid token

Either the secret token in the request header doesn’t match the secret token configured in the APM integration, or the API key is invalid.

HTTP 403: Forbidden request

Either you are sending requests to a RUM endpoint without RUM enabled, or a request is coming from an origin not specified in the APM integration settings. See the Allowed origins setting for more information.

HTTP 503: Request timed out waiting to be processed

This happens when APM Server exceeds the maximum number of requests that it can process concurrently. To alleviate this problem, try reducing the sample rate, the amount of collected stack trace information, or both. See Reduce storage for more information.

Another option is to increase processing power. This can be done by either migrating your Elastic Agent to a more powerful machine or adding more APM Server instances.
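As a quick reference, the response codes covered in this section can be collected into a lookup. This is only a sketch: `LIKELY_CAUSES` and `explain_response` are hypothetical names, and the messages paraphrase the descriptions above rather than anything APM Server returns.

```python
# Likely causes per response code, paraphrased from this section.
LIKELY_CAUSES = {
    400: ("Incompatible APM agent and APM Server versions, "
          "or an event larger than the maximum size per event."),
    401: ("The secret token does not match the APM integration, "
          "or the API key is invalid."),
    403: ("A RUM endpoint was used without RUM enabled, "
          "or the request origin is not in Allowed origins."),
    503: ("APM Server exceeded the number of requests "
          "it can process concurrently."),
}


def explain_response(status: int) -> str:
    """Map an APM Server response code to the likely cause described above."""
    if status == 202:
        return "Accepted."
    return LIKELY_CAUSES.get(
        status, "Not covered in this section; check the APM Server logs."
    )
```

For example, `explain_response(401)` points you at the secret token or API key configuration.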

Common SSL-related problems

SSL client fails to connect

The target host might be unreachable or the certificate may not be valid. To fix this problem:

Make sure that the APM Server process on the target host is running and you can connect to it. Try to ping the target host to verify that you can reach it from the host running APM Server. Then use either nc or telnet to make sure that the port is available. For example:
```
ping <hostname or IP>
telnet <hostname or IP> 5044
```
Verify that the certificate is valid and that the hostname and IP match.
Use OpenSSL to test connectivity to the target server and diagnose problems. See the OpenSSL documentation for more info.
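If nc, telnet, or OpenSSL are not available on the client host, the same checks can be approximated with the Python standard library. The following is a minimal sketch, not an official diagnostic tool. The function names are hypothetical, and the TLS check validates against the system trust store, so a self-signed or privately issued certificate is reported as an error unless its CA is trusted on that host.

```python
import socket
import ssl
from typing import Optional


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tls_handshake_error(host: str, port: int, timeout: float = 3.0) -> Optional[str]:
    """Attempt a TLS handshake; return None on success or the error text."""
    context = ssl.create_default_context()  # verifies certificate and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return None
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return str(exc)
```

Call `port_is_open` first with your APM Server host and port. If that succeeds but `tls_handshake_error` returns a message, the problem is likely in the certificate or TLS configuration rather than the network.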

x509: cannot validate certificate for <IP address> because it doesn’t contain any IP SANs

This happens because your certificate is only valid for the hostname present in the Subject field. To resolve this problem, try one of these solutions:

Create a DNS entry for the hostname, mapping it to the server’s IP.
Create an entry in /etc/hosts for the hostname. Or, on Windows, add an entry to C:\Windows\System32\drivers\etc\hosts.
Re-create the server certificate and add a Subject Alternative Name (SAN) for the IP address of the server. This makes the server’s certificate valid for both the hostname and the IP address.
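To confirm whether a certificate already carries an IP SAN, you can inspect its Subject Alternative Name entries. The sketch below assumes the certificate is available in the dictionary form that Python's `ssl.SSLSocket.getpeercert()` returns, where IP entries are labeled `IP Address`. The function name `cert_covers_ip` is hypothetical.

```python
import ipaddress
from typing import Any, Mapping


def cert_covers_ip(cert: Mapping[str, Any], ip: str) -> bool:
    """Return True if the certificate lists `ip` as an IP SAN."""
    target = ipaddress.ip_address(ip)
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "IP Address":
            continue  # skip DNS and other SAN types
        try:
            if ipaddress.ip_address(value.strip()) == target:
                return True
        except ValueError:
            continue  # ignore entries that are not parseable addresses
    return False
```

A certificate with only DNS entries returns False for every IP address, which matches the error above when a client connects by IP.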

getsockopt: no route to host

This is not an SSL problem. It’s a networking problem. Make sure the two hosts can communicate.

getsockopt: connection refused

This is not an SSL problem. Make sure that Logstash is running and that there is no firewall blocking the traffic.

No connection could be made because the target machine actively refused it

A firewall is refusing the connection. Check if a firewall is blocking the traffic on the client, the network, or the destination host.

I/O Timeout

I/O timeouts can occur when your timeout settings across the stack are not configured correctly, especially when using a load balancer.

You may see an error like the one below in the APM agent logs, and/or a similar error on the APM Server side:

```
[ElasticAPM] APM Server responded with an error:
"read tcp 123.34.22.313:8200->123.34.22.40:41602: i/o timeout"
```

To fix this, make sure that timeouts increase at each hop, from the APM agent, through your load balancer, to the APM Server.

By default, the agent timeouts are set at 10 seconds, and the server timeout is set at 3600 seconds. Your load balancer should be set somewhere between these numbers.

For example:

```
APM agent --> Load Balancer  --> APM Server
   10s            15s               3600s
```

The APM Server timeout can be configured by updating the maximum duration for reading an entire request.
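The ordering rule can be checked mechanically once you know the timeout at each hop. This is a minimal sketch. The function name and the hop labels are hypothetical, and the values you pass in are whatever you have read from your own agent, load balancer, and APM Server configuration.

```python
from typing import Optional, Sequence, Tuple


def first_timeout_violation(
    hops: Sequence[Tuple[str, float]],
) -> Optional[Tuple[str, str]]:
    """Return the first adjacent pair whose timeouts do not strictly increase.

    `hops` lists (name, timeout_in_seconds) from the APM agent to APM Server.
    Returns None when every hop has a larger timeout than the one before it.
    """
    for (name_a, timeout_a), (name_b, timeout_b) in zip(hops, hops[1:]):
        if timeout_a >= timeout_b:
            return (name_a, name_b)
    return None
```

With the defaults above and a 15-second load balancer, `first_timeout_violation([("agent", 10), ("load balancer", 15), ("server", 3600)])` returns `None`. Lowering the load balancer to 5 seconds makes it return `("agent", "load balancer")`.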

What happens when APM Server or Elasticsearch is down?

APM Server does not have an internal queue to buffer requests, but instead leverages an HTTP request timeout to act as back-pressure. If Elasticsearch goes down, the APM Server will eventually deny incoming requests. Both the APM Server and APM agent(s) will issue logs accordingly.

If either Elasticsearch or the APM Server goes down, some APM agents have internal queues or buffers that will temporarily store data. As a general rule of thumb, queues fill up quickly. Assume data will be lost if APM Server or Elasticsearch goes down.

Adjusting APM agent queues/buffers can increase the agent’s overhead, so use caution when updating default values.

Go agent - Circular buffer with configurable size: ELASTIC_APM_BUFFER_SIZE.
Java agent - Internal buffer with configurable size: max_queue_size.
Node.js agent - No internal queue. Data is lost.
PHP agent - No internal queue. Data is lost.
Python agent - Internal Transaction queue with configurable size and time between flushes.
Ruby agent - Internal queue with configurable size: api_buffer_size.
RUM agent - No internal queue. Data is lost.
.NET agent - No internal queue. Data is lost.
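To see why a bounded buffer only delays data loss, consider the toy simulation below. It is an illustration of the general idea using a fixed-size circular buffer, not the implementation of any APM agent. The function name and the event counts are hypothetical.

```python
from collections import deque
from typing import Tuple


def simulate_outage(buffer_size: int, events_during_outage: int) -> Tuple[int, int]:
    """Return (events_kept, events_dropped) for a fixed-size circular buffer."""
    buffer = deque(maxlen=buffer_size)
    for event_id in range(events_during_outage):
        # When the buffer is full, appending silently discards the oldest entry.
        buffer.append(event_id)
    kept = len(buffer)
    return kept, events_during_outage - kept
```

For instance, `simulate_outage(100, 250)` returns `(100, 150)`: once the outage produces more events than the buffer holds, every additional event pushes an older one out.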
