Common problems
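This section describes common problems you might encounter when using APM Server and the Applications UI in Kibana.
APM Server:
- No data is indexed
- Common SSL-related problems
- I/O Timeout
- Field limit exceeded
- Tail-based sampling causing high system memory usage and high disk IO
Applications UI:
- Too many unique transaction names
- Unknown route
- Fields are not searchable
- Service Maps: no connection between client and server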
No data is indexed
Fleet-managed APM Server
If no data shows up in Elasticsearch, first make sure that your APM components are properly connected.
Is Elastic Agent healthy?
In Kibana, open Fleet and find the host that is running the APM integration; confirm that its status is Healthy. If it isn’t, check the Elastic Agent logs to diagnose potential causes. See Monitor Elastic Agents to learn more.
Is APM Server happy?
In Kibana, open Fleet and select the host that is running the APM integration.
Open the Logs tab and select the elastic_agent.apm_server dataset.
Look for any APM Server errors that could help diagnose the problem.
Can the APM agent connect to APM Server?
To determine if the APM agent can connect to the APM Server, send requests to the instrumented service and look for lines containing [request] in the APM Server logs.
If no requests are logged, confirm that:
- SSL is configured correctly.
- The host is correct. If you’re using Docker, for instance, make sure APM Server binds to the right interface (for example, set apm-server.host to 0.0.0.0:8200 to listen on all interfaces) and set the SERVER_URL setting in the APM agent accordingly.
If you see requests coming through the APM Server but they are not accepted (a response code other than 202), see APM Server response codes to narrow down the possible causes.
Instrumentation gaps
APM agents provide auto-instrumentation for many popular frameworks and libraries. If the APM agent is not auto-instrumenting something that you were expecting, data won’t be sent to the Elastic Stack. Reference the relevant APM agent documentation for details on what is automatically instrumented.
APM Server binary
If no data shows up in Elasticsearch, first check that the APM components are properly connected.
To ensure that the APM Server configuration is valid and that APM Server can connect to the configured output (Elasticsearch by default), run the following commands:
apm-server test config
apm-server test output
To see if the agent can connect to the APM Server, send requests to the instrumented service and look for lines containing [request] in the APM Server logs.
If no requests are logged, SSL might be misconfigured or the host might be wrong. In particular, if you are using Docker, make sure APM Server binds to the right interface (for example, set apm-server.host to 0.0.0.0:8200 to listen on all interfaces) and set the SERVER_URL setting in the agent accordingly.
If you see requests coming through the APM Server but they are not accepted (a response code other than 202), use the response code to narrow down the possible causes (see APM Server response codes).
Another reason data might not show up is that the agent is not auto-instrumenting something you were expecting. Check the agent documentation for details on what is automatically instrumented.
APM Server currently relies on Elasticsearch to create indices that do not exist. As a result, Elasticsearch must be configured to allow automatic index creation for APM indices.
Common SSL-related problems
SSL client fails to connect
The target host might be unreachable or the certificate may not be valid. To fix this problem:
- Make sure that the APM Server process on the target host is running and that you can connect to it. Try to ping the target host to verify that you can reach it from the host running APM Server. Then use either nc or telnet to make sure that the port is available. For example:
  ping <hostname or IP>
  telnet <hostname or IP> 5044
- Verify that the certificate is valid and that the hostname and IP match.
- Use OpenSSL to test connectivity to the target server and diagnose problems. See the OpenSSL documentation for more info.
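If nc and telnet are not installed on the host, a short Python script can perform the same TCP port check. This is a minimal sketch using only the standard library; the can_reach function is a hypothetical helper, and a successful connection only shows that the port is open, not that TLS is configured correctly.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This mirrors `telnet <host> <port>`: it opens a plain TCP connection and
    closes it again. It does not perform a TLS handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, no route to host, DNS failures, and timeouts.
        return False
```

For example, can_reach("<hostname or IP>", 5044) returns False for the same reasons telnet would fail to connect, so the networking sections below apply when it does.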
x509: cannot validate certificate for <IP address> because it doesn’t contain any IP SANs
This happens because your certificate is only valid for the hostname present in the Subject field. To resolve this problem, try one of these solutions:
- Create a DNS entry for the hostname, mapping it to the server’s IP.
- Create an entry in /etc/hosts for the hostname. Or, on Windows, add an entry to C:\Windows\System32\drivers\etc\hosts.
- Re-create the server certificate and add a Subject Alternative Name (SAN) for the IP address of the server. This makes the server’s certificate valid for both the hostname and the IP address.
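To see which case applies, check whether the server certificate lists the IP address as a SAN. The sketch below assumes you already have the certificate as a dictionary in the layout returned by Python's ssl.SSLSocket.getpeercert(), where SANs appear as ("DNS", name) or ("IP Address", value) pairs; the cert_covers_ip helper and the sample certificates in the usage note are hypothetical.

```python
import ipaddress


def cert_covers_ip(cert: dict, ip: str) -> bool:
    """Return True if `cert` lists `ip` as an "IP Address" Subject Alternative Name."""
    target = ipaddress.ip_address(ip)
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "IP Address":
            continue  # DNS SANs do not make the certificate valid for a bare IP
        try:
            # Parse both sides so different spellings of the same address compare equal.
            if ipaddress.ip_address(value.strip()) == target:
                return True
        except ValueError:
            continue
    return False
```

A certificate whose only SAN is ("DNS", "apm.example.internal") returns False for "10.0.0.5", which is the situation behind this error; re-creating it with an ("IP Address", "10.0.0.5") SAN makes the check return True.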
getsockopt: no route to host
This is not an SSL problem. It’s a networking problem. Make sure the two hosts can communicate.
getsockopt: connection refused
This is not an SSL problem. Make sure that Logstash is running and that there is no firewall blocking the traffic.
No connection could be made because the target machine actively refused it
A firewall is refusing the connection. Check if a firewall is blocking the traffic on the client, the network, or the destination host.
I/O Timeout
I/O Timeouts can occur when your timeout settings across the stack are not configured correctly, especially when using a load balancer.
You may see an error like the one below in the APM agent logs, and/or a similar error on the APM Server side:
[ElasticAPM] APM Server responded with an error: "read tcp 123.34.22.313:8200->123.34.22.40:41602: i/o timeout"
To fix this, make sure timeouts increase at each hop from the APM agent, through your load balancer, to the APM Server.
By default, APM agent timeouts are set to 10 seconds and the APM Server timeout is set to 3600 seconds. Set your load balancer timeout somewhere between these two values.
For example:
APM agent --> Load Balancer --> APM Server
   10s            15s             3600s
The APM Server timeout can be configured by updating the maximum duration for reading an entire request.
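When different teams own different hops, it can help to check the ordering in code. The sketch below encodes the rule above (each timeout should be larger than the one in front of it) using the example values; the Hop type and timeout_problems function are hypothetical helpers, not part of any Elastic tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    name: str
    timeout_s: float


def timeout_problems(hops: List[Hop]) -> List[str]:
    """Return a message for every hop whose timeout does not exceed the hop before it.

    If a downstream component gives up sooner than the one in front of it, it may
    close the connection while the upstream side is still using it, which can
    surface as an i/o timeout.
    """
    problems = []
    for upstream, downstream in zip(hops, hops[1:]):
        if downstream.timeout_s <= upstream.timeout_s:
            problems.append(
                f"{downstream.name} timeout ({downstream.timeout_s}s) should be greater "
                f"than {upstream.name} timeout ({upstream.timeout_s}s)"
            )
    return problems


chain = [Hop("APM agent", 10), Hop("Load balancer", 15), Hop("APM Server", 3600)]
print(timeout_problems(chain))  # []
```

Lowering the load balancer to 5 seconds in the same chain produces one message pointing at the load balancer.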
Field limit exceeded
When adding too many distinct tag keys on a transaction or span, you risk creating a mapping explosion.
For example, avoid using user-specified data, such as URL parameters, as a tag key. Likewise, using the current timestamp or a user ID as a tag key is not a good idea. However, tag values with a high cardinality are not a problem. Just try to keep the number of distinct tag keys to a minimum.
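The difference between a high-cardinality key and a high-cardinality value is easiest to see by counting distinct keys. In this sketch, each dictionary stands in for the labels on one transaction; the user IDs are made-up sample data, and the code only counts keys rather than sending anything to Elasticsearch.

```python
from typing import Dict, Iterable, Set


def distinct_keys(label_sets: Iterable[Dict[str, object]]) -> Set[str]:
    """Collect every distinct label key seen across a batch of transactions."""
    keys: Set[str] = set()
    for labels in label_sets:
        keys.update(labels)
    return keys


user_ids = [f"user-{i}" for i in range(500)]

# Anti-pattern: the user ID is the key, so every new user would add a new mapped field.
keyed_by_user = [{uid: True} for uid in user_ids]

# Recommended: one fixed key; the high-cardinality user ID is only a value.
fixed_key = [{"user_id": uid} for uid in user_ids]

print(len(distinct_keys(keyed_by_user)))  # 500
print(len(distinct_keys(fixed_key)))      # 1
```

With the first pattern, the number of fields grows with the number of users until it hits the total field limit shown in the warning below; with the second, it stays at one.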
The symptom of a mapping explosion is that transactions and spans are not indexed anymore after a certain time. Usually, on the next day, the spans and transactions will be indexed again because a new index is created each day. But as soon as the field limit is reached, indexing stops again.
You won’t see any sign of failure in the agent logs, because APM Server sends the data it receives from the agents to Elasticsearch asynchronously. However, APM Server and Elasticsearch log a warning like this:
{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [1000] in [INDEX_NAME] has been exceeded\"}
Tail-based sampling causing high system memory usage and high disk IO
Tail-based sampling requires minimal memory to run, and there should not be a noticeable increase in RSS memory usage. However, since tail-based sampling writes data to disk, it is possible to see a significant increase in OS page cache memory usage due to disk IO. If you see a drop in throughput and excessive disk activity after enabling tail-based sampling, please ensure that there is enough memory headroom in the system for OS page cache to perform disk IO efficiently.
Too many unique transaction names
Transaction names are defined in each APM agent; when an APM agent supports a framework, it includes logic for naming the transactions that the framework creates. In some cases though, like when using an APM agent’s API to create custom transactions, it is up to the user to define a pattern for transaction naming. When transactions are named incorrectly, each unique URL can be associated with a unique transaction group, causing an explosion in the number of transaction groups per service and leading to inaccuracies in the Applications UI.
To fix a large number of unique transaction names, change how you use the APM agent API to name your transactions. Make sure you are not naming transactions based on parameters that can change: strip away user IDs, product IDs, order numbers, query parameters, and so on, and find what your unique URLs have in common.
Let’s look at an example from the RUM agent documentation. Here are a few URLs you might find on Elastic.co:
// Blog Posts
https://www.elastic.co/blog/reflections-on-three-years-in-the-elastic-public-sector
https://www.elastic.co/blog/say-heya-to-the-elastic-search-awards
https://www.elastic.co/blog/and-the-winner-of-the-elasticon-2018-training-subscription-drawing-is
// Documentation
https://www.elastic.co/guide/en/elastic-stack/current/index.html
https://www.elastic.co/guide/en/apm/get-started/current/index.html
https://www.elastic.co/guide/en/infrastructure/guide/current/index.html
These URLs, like most, include unique names. If we named transactions based on each unique URL, we’d end up with the problem described above: a very large number of different transaction names. Instead, we should strip away the unique information and group our transactions based on common information. In this case, that means naming all blog transactions /blog and all documentation transactions /guide.
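The grouping above can be expressed as a naming function. This is a minimal sketch under one assumption: naming by the first path segment, a hypothetical rule chosen because it fits these example URLs. Real applications usually need a rule per route, and the resulting name still has to be passed to your APM agent’s transaction naming API, which this code does not call.

```python
from urllib.parse import urlsplit


def grouped_transaction_name(url: str) -> str:
    """Name a transaction after the first path segment of its URL.

    /blog/<post-slug> becomes /blog and /guide/en/... becomes /guide,
    which drops the unique part of each URL.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return f"/{segments[0]}" if segments else "/"


urls = [
    "https://www.elastic.co/blog/reflections-on-three-years-in-the-elastic-public-sector",
    "https://www.elastic.co/blog/say-heya-to-the-elastic-search-awards",
    "https://www.elastic.co/blog/and-the-winner-of-the-elasticon-2018-training-subscription-drawing-is",
    "https://www.elastic.co/guide/en/elastic-stack/current/index.html",
    "https://www.elastic.co/guide/en/apm/get-started/current/index.html",
    "https://www.elastic.co/guide/en/infrastructure/guide/current/index.html",
]

names = {grouped_transaction_name(u) for u in urls}
print(sorted(names))  # ['/blog', '/guide']
```

Six unique URLs collapse into two transaction groups.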
If you feel like you’d be losing valuable information by following this naming convention, don’t fret! You can always add additional metadata to your transactions using labels (indexed) or custom context (non-indexed).
After ensuring you’ve correctly named your transactions, you might still see errors in the Applications UI related to the transaction group limit being reached:
The number of transaction groups has been reached. Current APM server capacity for handling unique transaction groups has been reached. There are at least X transactions missing in this list. Please decrease the number of transaction groups in your service or increase the memory allocated to APM server.
You will see this warning if an agent is creating too many transaction groups. This could indicate incorrect instrumentation, which you will have to fix in your application. Alternatively, you can increase the memory allocated to APM Server.
Number of transaction groups exceed the allowed maximum(1,000) that are displayed. The maximum number of transaction groups displayed in Kibana has been reached. Try narrowing down results by using the query bar..
You will see this warning if your results have more than 1,000 unique transaction groups. Use the query bar to reduce the number of unique transaction groups in your results.
More information
While this can happen with any APM agent, it typically occurs with the RUM agent.
For more information on how to correctly set transaction.name in the RUM agent, see custom initial page load transaction names.
The RUM agent can also set the transaction.name when observing for transaction events. See apm.observe() for more information.
If your problem is occurring in a different APM agent, the tips above still apply. See the relevant Agent API documentation to adjust how you’re naming your transactions.
Unknown route
The transaction overview will only display helpful information when the transactions in your services are named correctly. If you’re seeing "GET unknown route" or "unknown route" in the Applications UI, it could be a sign that something isn’t working as it should.
Elastic APM agents come with built-in support for popular frameworks. This means, among other things, that the APM agent will try to automatically name HTTP requests. For example, the Node.js agent uses the route that handled the request, while the Java agent uses the Servlet name.
"Unknown route" indicates that the APM agent can’t determine what to name the request, perhaps because the technology you’re using isn’t supported, the agent has been installed incorrectly, or because something is happening to the request that the agent doesn’t understand.
To resolve this, you’ll need to head over to the relevant APM agent documentation. Specifically, view the agent’s supported technologies page. You can also use the agent’s public API to manually set a name for the transaction.
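When you name transactions manually, the usual approach is to name each request after the route template it matched rather than its concrete path. The sketch below illustrates that idea with a hypothetical route table and falls back to an "unknown route" name when nothing matches, similar to the behavior described above. It only computes the name; passing it to the agent is left to your agent’s public API, which this code does not call.

```python
import re
from typing import List, Tuple

# Hypothetical route table for an application the agent does not auto-instrument.
ROUTES = ["/users/{id}", "/orders/{order_id}/items", "/health"]


def compile_route(template: str) -> "re.Pattern":
    """Turn "/users/{id}" into a regex where each {placeholder} matches one path segment."""
    static_parts = re.split(r"\{[^/{}]+\}", template)
    return re.compile("^" + "[^/]+".join(re.escape(p) for p in static_parts) + "$")


COMPILED: List[Tuple[str, "re.Pattern"]] = [(t, compile_route(t)) for t in ROUTES]


def transaction_name(method: str, path: str) -> str:
    """Name a request after the route template it matches."""
    for template, pattern in COMPILED:
        if pattern.match(path):
            return f"{method} {template}"
    # No template matched: the equivalent of "unknown route".
    return f"{method} unknown route"
```

With this table, GET /users/42 and GET /users/97 share the name "GET /users/{id}", while a path outside the table, such as /static/app.js, is named "GET unknown route".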
Fields are not searchable
In Elasticsearch, index templates are used to define settings and mappings that determine how fields should be analyzed. The recommended index templates for APM come from the built-in Elasticsearch apm-data plugin. These templates, by default, enable and disable indexing on certain fields.
As an example, some APM agents store cookie values in http.request.cookies. Since http.request has disabled dynamic indexing, and http.request.cookies is not declared in a custom mapping, the values in http.request.cookies are not indexed and thus not searchable.
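The rule in this example can be written as a small lookup: walk the dotted field name through the mapping, and if a segment is not declared, the field is indexed only when dynamic mapping is enabled at that level. The sketch below uses a made-up, heavily simplified mapping (not the apm-data plugin’s real template) and ignores features such as dynamic templates and runtime fields.

```python
from typing import Any, Dict


def is_indexed(mapping: Dict[str, Any], field: str) -> bool:
    """Report whether a dotted field would be indexed under a simplified mapping.

    Only `properties`, `dynamic`, `enabled`, and `index` are considered. Any
    `dynamic` value other than true ("false", "strict", "runtime") is treated
    as "not indexed".
    """
    node = mapping
    dynamic = True  # Elasticsearch's default for the root object
    for part in field.split("."):
        # An object inherits `dynamic` from its parent unless it overrides it.
        dynamic = node.get("dynamic", dynamic) in (True, "true")
        if node.get("enabled", True) is False:
            return False
        properties = node.get("properties", {})
        if part not in properties:
            # Undeclared field: indexed only if dynamic mapping can add it.
            return dynamic
        node = properties[part]
    return node.get("enabled", True) is not False and node.get("index", True) is not False


# Toy mapping loosely modeled on the example above.
TOY_MAPPING = {
    "properties": {
        "http": {
            "properties": {
                "request": {
                    "dynamic": False,
                    "properties": {"method": {"type": "keyword"}},
                }
            }
        },
        "labels": {"dynamic": True, "properties": {}},
    }
}
```

Under this toy mapping, http.request.method is indexed because it is declared, http.request.cookies is not because its parent disables dynamic mapping, and a new field under labels is indexed because labels is dynamic.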
Ensure an APM data view exists
As a first step, make sure the correct data view exists. In Kibana, go to Stack Management > Data views. You should see the APM data view; the default is traces-apm*,apm-*,logs-apm*,apm-*,metrics-apm*,apm-*. If you don’t see it, the data view doesn’t exist. To fix this, navigate to the Applications UI in Kibana and select Add data. In the APM tutorial, click Load Kibana objects to create the APM data view.
Ensure a field is searchable
There are two things you can do if you’d like to ensure a field is searchable:
- Index your additional data as labels instead. These are dynamic by default, which means they will be indexed and become searchable and aggregatable.
- Create a custom mapping for the field.
Service Maps: no connection between client and server
If the service map is not showing an expected connection between the client and server, it’s likely because you haven’t configured distributedTracingOrigins.
This setting is necessary, for example, for cross-origin requests. If you have a basic web application that provides data via an API on localhost:4000 and serves HTML from localhost:4001, you’d need to set distributedTracingOrigins: ['https://localhost:4000'] to ensure the origin is monitored as a part of distributed tracing.
In other words, the APM agent consults distributedTracingOrigins before adding the distributed tracing traceparent header to a request.
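The RUM agent itself is JavaScript; the Python sketch below only illustrates the decision described above and the W3C Trace Context traceparent format. It assumes that same-origin requests always get the header and that cross-origin requests get it only when their origin is listed. The function names are hypothetical, origins are compared as exact scheme://host:port strings (no default-port normalization), the https scheme for the page is an assumption, and the real agent’s matching rules may differ.

```python
import secrets
from typing import Iterable
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL, without path, query, or fragment."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def should_add_traceparent(request_url: str, page_origin: str,
                           distributed_tracing_origins: Iterable[str]) -> bool:
    """Same-origin requests always qualify; cross-origin ones only if listed."""
    request_origin = origin_of(request_url)
    return request_origin == page_origin or request_origin in set(distributed_tracing_origins)


def make_traceparent() -> str:
    """Build a W3C traceparent value: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"00-{trace_id}-{parent_id}-01"  # flags 01 = sampled


page = "https://localhost:4001"
api_call = "https://localhost:4000/api/data"
print(should_add_traceparent(api_call, page, []))                          # False
print(should_add_traceparent(api_call, page, ["https://localhost:4000"]))  # True
```

Without the setting, the API call on localhost:4000 carries no traceparent header, so the server’s transaction cannot be linked to the client and the service map shows no connection.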