Common problems
This section describes common problems you might encounter when using a Fleet-managed APM Server.
No data is indexed
If no data shows up in Elasticsearch, first make sure that your APM components are properly connected.
Is Elastic Agent healthy?
In Kibana, open Fleet, find the host that is running the APM integration, and confirm that its status is Healthy. If it isn’t, check the Elastic Agent logs to diagnose potential causes. See Monitor Elastic Agents to learn more.
Is APM Server happy?
- In Kibana, open Fleet and select the host that is running the APM integration.
- Open the Logs tab and select the elastic_agent.apm_server dataset.
- Look for any APM Server errors that could help diagnose the problem.
Can the APM agent connect to APM Server?
To determine if the APM agent can connect to APM Server, send requests to the instrumented service and look for lines containing [request] in the APM Server logs.
If no requests are logged, confirm that:
- SSL isn’t misconfigured.
- The host is correct. For example, if you’re using Docker, bind APM Server to the right interface (for example, set apm-server.host = 0.0.0.0:8200 to match any IP) and set the SERVER_URL setting in the APM agent accordingly.
If you see requests coming through the APM Server but they are not accepted (a response code other than 202), see APM Server response codes to narrow down the possible causes.
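If you have exported the APM Server logs to a file, the [request] check described above can be scripted. The following Python sketch only looks for that literal marker; the function names and the idea of scanning an exported log file are illustrative assumptions, and the exact log format depends on your APM Server version and logging settings.

```python
from __future__ import annotations

from typing import Iterable


def find_request_lines(log_lines: Iterable[str]) -> list[str]:
    """Return the log lines that contain the literal "[request]" marker."""
    return [line.rstrip("\n") for line in log_lines if "[request]" in line]


def scan_log_file(path: str) -> list[str]:
    """Hypothetical helper: scan an exported APM Server log file for request lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_request_lines(handle)
```

An empty result from scan_log_file("apm-server.log") (a hypothetical file name) corresponds to the case where no requests are logged; otherwise, inspect the returned lines for their response codes.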
Instrumentation gaps
APM agents provide auto-instrumentation for many popular frameworks and libraries. If the APM agent is not auto-instrumenting something that you were expecting, data won’t be sent to the Elastic Stack. Reference the relevant APM agent documentation for details on what is automatically instrumented.
If you run the APM Server binary rather than a Fleet-managed APM Server, first check that the APM components are properly connected. To ensure that the APM Server configuration is valid and that APM Server can connect to the configured output (Elasticsearch by default), run the following commands:
apm-server test config
apm-server test output
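On a host where the binary is installed, the two commands above can also be run from a script, for example as part of a deployment check. This is a minimal Python sketch that assumes apm-server is on the PATH, can find its configuration file, and signals a failed check with a non-zero exit code; run_checks, report, and APM_SERVER_CHECKS are illustrative names.

```python
from __future__ import annotations

import subprocess

# The two checks from the commands above. Assumes `apm-server` is on PATH.
APM_SERVER_CHECKS = [
    ["apm-server", "test", "config"],
    ["apm-server", "test", "output"],
]


def run_checks(commands: list[list[str]]) -> dict[str, int]:
    """Run each command and map its command line to its exit code (0 means success)."""
    results: dict[str, int] = {}
    for cmd in commands:
        completed = subprocess.run(cmd, capture_output=True, text=True)
        # Echo the tool's own output so that failures can be diagnosed.
        print(completed.stdout + completed.stderr, end="")
        results[" ".join(cmd)] = completed.returncode
    return results


def report(results: dict[str, int]) -> None:
    """Print one status line per command."""
    for command, code in results.items():
        status = "OK" if code == 0 else f"failed (exit code {code})"
        print(f"{command}: {status}")
```

Calling report(run_checks(APM_SERVER_CHECKS)) prints the output of each check followed by one status line per command.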
To see if the agent can connect to APM Server, send requests to the instrumented service and look for lines containing [request] in the APM Server logs.
If no requests are logged, SSL might be misconfigured or the host might be wrong. In particular, if you are using Docker, bind APM Server to the right interface (for example, set apm-server.host = 0.0.0.0:8200 to match any IP) and set the SERVER_URL setting in the agent accordingly.
If you see requests coming through APM Server but they are not accepted (a response code other than 202), use the response code to narrow down the possible causes (see APM Server response codes).
Another reason data might not show up is that the agent is not auto-instrumenting something you expected. Check the agent documentation for details on what is automatically instrumented.
APM Server currently relies on Elasticsearch to create indices that do not exist. As a result, Elasticsearch must be configured to allow automatic index creation for APM indices.
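Automatic index creation is controlled by the action.auto_create_index cluster setting. It accepts true, false, or a comma-separated list of index patterns, each optionally prefixed with + (allow) or - (block); patterns are matched in the order given, and when a list is specified, names that match no pattern are not auto-created. The following Python sketch models those rules so you can check whether APM data stream names would be allowed. It is a simplification: it ignores, for example, index template settings that can override the cluster setting, and the pattern list in the usage note is a hypothetical example.

```python
from __future__ import annotations

import re


def _simple_match(pattern: str, name: str) -> bool:
    """Match `name` against a pattern in which '*' is the only wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def auto_create_allowed(setting: str, index_name: str) -> bool:
    """Approximate whether action.auto_create_index permits creating `index_name`.

    Simplified model: "true"/"false" allow or block everything. Otherwise the
    value is a comma-separated list of patterns, optionally prefixed with '+'
    (allow) or '-' (block), checked in order; the first match wins and a name
    that matches no pattern is blocked.
    """
    value = setting.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    for raw in setting.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        allow = not pattern.startswith("-")
        if pattern[0] in "+-":
            pattern = pattern[1:]
        if _simple_match(pattern, index_name):
            return allow
    return False
```

For example, with the hypothetical setting "+traces-apm*,+metrics-apm*,+logs-apm*,-*", auto_create_allowed returns True for "traces-apm-default" and False for "my-index".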
Data is indexed but doesn’t appear in the APM app
The APM app relies on index mappings to query and display data. If your APM data isn’t showing up in the APM app, but is elsewhere in Kibana, like the Discover app, you may have a missing index mapping.
You can determine if a field was mapped correctly with the _mapping API. For example, run the following command in the Kibana console to display the field data type of the service.name field:
GET *apm*/_mapping/field/service.name
If mapping.name.type is "text", your APM indices were not set up correctly, because service.name is expected to be mapped as keyword. For example:
".ds-metrics-apm.transaction.1m-default-2023.04.12-000038": {
  "mappings": {
    "service.name": {
      "full_name": "service.name",
      "mapping": {
        "name": {
          "type": "text"
        }
      }
    }
  }
}
To fix this problem, install the APM integration by following these steps:
If you have an internet connection
An internet connection is required to install the APM integration via the Fleet UI in Kibana.
- Open Kibana and select Add integrations > Elastic APM.
- Click APM integration.
- Click Add Elastic APM.
- Click Save and continue.
- Click Add Elastic Agent later. You do not need to run an Elastic Agent to complete the setup.
If you don’t have an internet connection
If your environment has network traffic restrictions, there are other ways to install the APM integration. See Air-gapped environments for more information.
- Option 1: Update kibana.yml
Update kibana.yml to include the following, then restart Kibana:
xpack.fleet.packages:
  - name: apm
    version: latest
See Configure Kibana to learn more about how to edit the Kibana configuration file.
- Option 2: Use the Fleet API
Use the Fleet API to install the APM integration. To be successful, this needs to be run against the Kibana API, not the Elasticsearch API.
POST kbn:/api/fleet/epm/packages/apm/8.13.4
{
  "force": true
}
See Kibana API to learn more about how to use the Kibana APIs.
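Outside the Kibana console, the same Fleet API request can be sent to Kibana over HTTP. The following Python sketch only builds the request with the standard library; the Kibana URL, the basic-auth credentials, and the function name are assumptions for illustration. The kbn-xsrf header is included because Kibana requires it on API requests that are not GET or HEAD requests.

```python
from __future__ import annotations

import base64
import json
import urllib.request


def build_install_request(kibana_url: str, version: str, username: str,
                          password: str) -> urllib.request.Request:
    """Build a POST request that force-installs the given APM package version."""
    url = f"{kibana_url.rstrip('/')}/api/fleet/epm/packages/apm/{version}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"force": True}).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Kibana rejects state-changing API calls without this header.
            "kbn-xsrf": "true",
            "Authorization": f"Basic {token}",
        },
    )
```

Pass the returned object to urllib.request.urlopen to send it, for example build_install_request("https://localhost:5601", "8.13.4", "elastic", "<password>"), where the URL and credentials are placeholders and the version is taken from the console example above.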
This will reinstall the APM index templates and trigger a data stream index rollover.
You can verify the correct index templates were installed by running the following command in the Kibana console:
GET /_index_template/traces-apm
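After the fix, you can rerun the GET *apm*/_mapping/field/service.name request from earlier and check the response programmatically, which helps when many backing indices are returned. Backing indices created before the rollover keep their old mapping, so look at the newest index of each data stream. This Python sketch assumes the response shape shown in the earlier example and flags indices where service.name is not mapped as keyword; the function name and the second index name in the sample data are illustrative.

```python
from __future__ import annotations


def indices_with_wrong_mapping(response: dict, field: str = "service.name",
                               expected_type: str = "keyword") -> dict[str, str]:
    """Return {index: actual_type} for indices where `field` is not `expected_type`."""
    problems: dict[str, str] = {}
    leaf = field.split(".")[-1]  # key under "mapping", e.g. "name" for service.name
    for index, body in response.items():
        field_info = body.get("mappings", {}).get(field)
        if field_info is None:
            continue  # this index does not map the field at all
        actual = field_info["mapping"][leaf].get("type")
        if actual != expected_type:
            problems[index] = actual
    return problems


if __name__ == "__main__":
    sample = {
        ".ds-metrics-apm.transaction.1m-default-2023.04.12-000038": {
            "mappings": {"service.name": {"full_name": "service.name",
                                          "mapping": {"name": {"type": "text"}}}}},
        # Made-up newer backing index with the expected mapping.
        ".ds-metrics-apm.transaction.1m-default-2023.04.13-000039": {
            "mappings": {"service.name": {"full_name": "service.name",
                                          "mapping": {"name": {"type": "keyword"}}}}},
    }
    print(indices_with_wrong_mapping(sample))
    # {'.ds-metrics-apm.transaction.1m-default-2023.04.12-000038': 'text'}
```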
Common SSL-related problems
SSL client fails to connect
The target host might be unreachable or the certificate may not be valid. To fix this problem:
- Make sure that the APM Server process on the target host is running and you can connect to it. Try to ping the target host to verify that you can reach it from the host running APM Server. Then use either nc or telnet to make sure that the port is available. For example:
ping <hostname or IP>
telnet <hostname or IP> 5044
- Verify that the certificate is valid and that the hostname and IP match.
- Use OpenSSL to test connectivity to the target server and diagnose problems. See the OpenSSL documentation for more info.
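If you prefer to script these checks, the following Python sketch approximates them with the standard library: port_is_open does a plain TCP connect, similar to telnet or nc, and check_tls performs a TLS handshake that verifies the certificate chain and that the certificate matches the host name or IP address you connect to. The function names, timeouts, and optional CA file parameter are illustrative assumptions, and the sketch does not replace the diagnostics OpenSSL provides.

```python
from __future__ import annotations

import socket
import ssl


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like telnet or nc)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_tls(host: str, port: int, cafile: str | None = None,
              timeout: float = 5.0) -> dict:
    """Do a verified TLS handshake and return the peer certificate as a dict.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted or
    does not match `host` (a host name or an IP address).
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

For example, call port_is_open("<hostname or IP>", 5044) first and, if it returns True, check_tls("<hostname or IP>", 5044, cafile="ca.crt"), where ca.crt is a hypothetical path to your CA certificate. When host is an IP address, Python matches it against the certificate's IP SANs, which is the condition behind the x509 IP SANs error.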
x509: cannot validate certificate for <IP address> because it doesn’t contain any IP SANs
This happens because your certificate is only valid for the hostname present in the Subject field. To resolve this problem, try one of these solutions:
- Create a DNS entry for the hostname, mapping it to the server’s IP.
- Create an entry in /etc/hosts for the hostname. Or, on Windows, add an entry to C:\Windows\System32\drivers\etc\hosts.
- Re-create the server certificate and add a Subject Alternative Name (SAN) for the IP address of the server. This makes the server’s certificate valid for both the hostname and the IP address.
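To confirm that a re-created certificate contains the IP SAN, you can inspect the dictionary returned by Python's ssl.SSLSocket.getpeercert(), which lists SAN entries as ("DNS", ...) and ("IP Address", ...) pairs. That dictionary is only populated when the certificate was validated during the handshake. This sketch assumes that format; the function name and the sample certificate values are illustrative.

```python
from __future__ import annotations

import ipaddress


def cert_has_ip_san(cert: dict, ip: str) -> bool:
    """Return True if `ip` appears as an IP Address SAN in a getpeercert()-style dict."""
    wanted = ipaddress.ip_address(ip)
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "IP Address":
            # Normalize both sides so that different IPv6 spellings compare equal.
            if ipaddress.ip_address(value.strip()) == wanted:
                return True
    return False


if __name__ == "__main__":
    # Made-up certificate data in the getpeercert() format.
    sample = {
        "subject": ((("commonName", "apm.example.com"),),),
        "subjectAltName": (("DNS", "apm.example.com"), ("IP Address", "10.0.0.5")),
    }
    print(cert_has_ip_san(sample, "10.0.0.5"))  # True
    print(cert_has_ip_san(sample, "10.0.0.6"))  # False
```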
getsockopt: no route to host
This is not an SSL problem. It’s a networking problem. Make sure the two hosts can communicate.
getsockopt: connection refused
This is not an SSL problem. Make sure that Logstash is running and that there is no firewall blocking the traffic.
No connection could be made because the target machine actively refused it
A firewall is refusing the connection. Check if a firewall is blocking the traffic on the client, the network, or the destination host.
I/O Timeout
I/O timeouts can occur when timeout settings across the stack are not configured correctly, especially when using a load balancer.
You may see an error like the one below in the APM agent logs, and/or a similar error on the APM Server side:
[ElasticAPM] APM Server responded with an error: "read tcp 123.34.22.313:8200->123.34.22.40:41602: i/o timeout"
To fix this, ensure that timeouts increase from the APM agent, through your load balancer, to the APM Server.
By default, APM agent timeouts are 10 seconds and the APM Server timeout is 3600 seconds. Set your load balancer timeout somewhere between these values.
For example:
APM agent --> Load Balancer --> APM Server
   10s             15s            3600s
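The rule above can be checked mechanically: each hop's timeout should be larger than the one before it. This Python sketch uses the example values from the chain above; the function name and the (name, seconds) tuple format are illustrative.

```python
from __future__ import annotations


def find_timeout_violations(chain: list[tuple[str, float]]) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs where the downstream timeout is not larger."""
    return [
        (up_name, down_name)
        for (up_name, up_timeout), (down_name, down_timeout) in zip(chain, chain[1:])
        if down_timeout <= up_timeout
    ]


if __name__ == "__main__":
    good = [("APM agent", 10), ("Load balancer", 15), ("APM Server", 3600)]
    bad = [("APM agent", 10), ("Load balancer", 5), ("APM Server", 3600)]
    print(find_timeout_violations(good))  # []
    print(find_timeout_violations(bad))   # [('APM agent', 'Load balancer')]
```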
The APM Server timeout can be configured by updating the maximum duration for reading an entire request.
Field limit exceeded
When adding too many distinct tag keys on a transaction or span, you risk creating a mapping explosion.
For example, avoid using user-specified data, like URL parameters, as a tag key. Likewise, using the current timestamp or a user ID as a tag key is not a good idea. However, tag values with high cardinality are not a problem. Just try to keep the number of distinct tag keys to a minimum.
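The difference between high-cardinality keys and high-cardinality values can be illustrated by counting distinct label keys, since each distinct key needs its own field in the index mapping. This Python sketch uses made-up user IDs and label names, and it models labels as plain dictionaries rather than any specific APM agent API.

```python
from __future__ import annotations


def distinct_label_keys(events: list[dict[str, str]]) -> set[str]:
    """Collect the distinct label keys across events (roughly one mapped field each)."""
    keys: set[str] = set()
    for labels in events:
        keys.update(labels)
    return keys


if __name__ == "__main__":
    user_ids = ["u-1001", "u-1002", "u-1003"]
    # Anti-pattern: the user ID is part of the key, so every new user adds a field.
    bad = [{f"user_{uid}": "true"} for uid in user_ids]
    # Preferred: one fixed key with a high-cardinality value adds a single field.
    good = [{"user_id": uid} for uid in user_ids]
    print(len(distinct_label_keys(bad)))   # 3
    print(len(distinct_label_keys(good)))  # 1
```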
The symptom of a mapping explosion is that transactions and spans are no longer indexed after a certain time. Usually, on the next day, spans and transactions are indexed again because a new index is created each day. But as soon as the field limit is reached, indexing stops again.
The agent logs won’t show any sign of failure, because APM Server sends the data it receives from agents to Elasticsearch asynchronously. However, APM Server and Elasticsearch log a warning like this:
{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [1000] in [INDEX_NAME] has been exceeded\"}
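If you collect APM Server or Elasticsearch logs centrally, a small parser can spot this warning. This Python sketch matches the message text shown above; treating the " in [...]" part as optional is an assumption, in case the wording differs between Elasticsearch versions, and the function name is illustrative.

```python
from __future__ import annotations

import re

# Matches the warning text shown above; the " in [...]" part is optional
# (an assumption, in case the wording differs between versions).
FIELD_LIMIT_RE = re.compile(
    r"Limit of total fields \[(?P<limit>\d+)\](?: in \[(?P<index>[^\]]+)\])? has been exceeded"
)


def parse_field_limit_warning(line: str) -> tuple[int, str | None] | None:
    """Return (limit, index name or None) if the line reports an exceeded field limit."""
    match = FIELD_LIMIT_RE.search(line)
    if match is None:
        return None
    return int(match.group("limit")), match.group("index")
```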
Tail-based sampling causing high system memory usage and high disk IO
Tail-based sampling requires minimal memory to run, and there should not be a noticeable increase in RSS memory usage. However, since tail-based sampling writes data to disk, you might see a significant increase in OS page cache memory usage due to disk IO. If you see a drop in throughput and excessive disk activity after enabling tail-based sampling, ensure that there is enough memory headroom in the system for the OS page cache to perform disk IO efficiently.