Common problems


This section describes common problems you might encounter when using APM Server and the Applications UI in Kibana.

No data is indexed

If no data shows up in Elasticsearch, first make sure that your APM components are properly connected.

Is Elastic Agent healthy?

In Kibana, open Fleet and find the host that is running the APM integration; confirm that its status is Healthy. If it isn’t, check the Elastic Agent logs to diagnose potential causes. See Monitor Elastic Agents to learn more.

Is APM Server happy?

In Kibana, open Fleet and select the host that is running the APM integration. Open the Logs tab and select the elastic_agent.apm_server dataset. Look for any APM Server errors that could help diagnose the problem.

Can the APM agent connect to APM Server?

To determine if the APM agent can connect to the APM Server, send requests to the instrumented service and look for lines containing [request] in the APM Server logs.

If no requests are logged, confirm that:

  1. SSL isn’t misconfigured.
  2. The host is correct. For example, if you’re using Docker, make sure APM Server is bound to the right interface (for example, set apm-server.host = 0.0.0.0:8200 to match any IP) and set the SERVER_URL setting in the APM agent accordingly.

If you see requests coming through the APM Server but they are not accepted (a response code other than 202), see APM Server response codes to narrow down the possible causes.
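You can also probe APM Server directly from the host the APM agent runs on. As a quick sketch, assuming the default host and port, an HTTP 200 response from the root endpoint confirms the server is reachable:

curl --verbose http://localhost:8200/

If you’ve enabled TLS on APM Server, use https:// instead; curl’s --insecure flag can help when testing against a self-signed certificate.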

Instrumentation gaps

APM agents provide auto-instrumentation for many popular frameworks and libraries. If the APM agent is not auto-instrumenting something that you were expecting, data won’t be sent to the Elastic Stack. Reference the relevant APM agent documentation for details on what is automatically instrumented.

Common SSL-related problems

SSL client fails to connect

The target host might be unreachable or the certificate may not be valid. To fix this problem:

  1. Make sure that the APM Server process on the target host is running and you can connect to it. Try to ping the target host to verify that you can reach it from the host running the APM agent. Then use either nc or telnet to make sure that the port is available. For example:

    ping <hostname or IP>
    telnet <hostname or IP> 8200
  2. Verify that the certificate is valid and that the hostname and IP match.
  3. Use OpenSSL to test connectivity to the target server and diagnose problems. See the OpenSSL documentation for more info.
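For example, a minimal s_client probe, assuming APM Server is listening on the default port 8200, prints the certificate chain and any verification errors:

openssl s_client -connect <hostname or IP>:8200
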
x509: cannot validate certificate for <IP address> because it doesn’t contain any IP SANs

This happens because your certificate is only valid for the hostname present in the Subject field. To resolve this problem, try one of these solutions:

  • Create a DNS entry for the hostname, mapping it to the server’s IP.
  • Create an entry in /etc/hosts for the hostname. Or, on Windows, add an entry to C:\Windows\System32\drivers\etc\hosts.
  • Re-create the server certificate and add a Subject Alternative Name (SAN) for the IP address of the server. This makes the server’s certificate valid for both the hostname and the IP address.
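As a sketch of that last option, OpenSSL 1.1.1 or later can add a SAN while generating a self-signed certificate; the hostname and IP address below are placeholders:

openssl req -x509 -newkey rsa:4096 -days 365 -nodes \
  -keyout server.key -out server.crt \
  -subj "/CN=myhost.example.com" \
  -addext "subjectAltName=DNS:myhost.example.com,IP:10.0.0.5"
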
getsockopt: no route to host

This is not an SSL problem. It’s a networking problem. Make sure the two hosts can communicate.

getsockopt: connection refused

This is not an SSL problem. Make sure that APM Server is running and that there is no firewall blocking the traffic.

No connection could be made because the target machine actively refused it

A firewall is refusing the connection. Check if a firewall is blocking the traffic on the client, the network, or the destination host.

I/O Timeout

I/O Timeouts can occur when your timeout settings across the stack are not configured correctly, especially when using a load balancer.

You may see an error like the one below in the APM agent logs, and/or a similar error on the APM Server side:

[ElasticAPM] APM Server responded with an error:
"read tcp 123.34.22.313:8200->123.34.22.40:41602: i/o timeout"

To fix this, make sure the timeout increases at each hop from the APM agent, through your load balancer, to the APM Server.

By default, the agent timeouts are set at 10 seconds, and the server timeout is set at 3600 seconds. Your load balancer should be set somewhere between these numbers.

For example:

APM agent --> Load Balancer  --> APM Server
   10s            15s               3600s

The APM Server timeout can be configured by updating the maximum duration for reading an entire request.
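For a standalone APM Server, this corresponds to the read_timeout setting in apm-server.yml; with the Fleet-managed APM integration, the same limit is exposed in the integration settings. A minimal sketch for standalone mode:

apm-server:
  host: "0.0.0.0:8200"
  # Maximum duration for reading an entire request (default 3600s)
  read_timeout: 3600s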

Field limit exceeded

When adding too many distinct tag keys on a transaction or span, you risk creating a mapping explosion.

For example, you should avoid using user-specified data, like URL parameters, as a tag key. Likewise, using the current timestamp or a user ID as a tag key is not a good idea. However, tag values with a high cardinality are not a problem. Just try to keep the number of distinct tag keys to a minimum.
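As an illustration with the Node.js agent (other agents expose equivalent label APIs), keep the tag key stable and put the high-cardinality data on the value side; userId here is a placeholder:

const apm = require('elastic-apm-node')

const userId = '42' // placeholder; within an instrumented transaction

// Bad: a new tag key per user adds a new field to the index mapping
apm.setLabel(`user_${userId}`, true)

// Good: one stable tag key; a high-cardinality value is fine
apm.setLabel('userId', userId)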

The symptom of a mapping explosion is that transactions and spans are not indexed anymore after a certain time. Usually, on the next day, the spans and transactions will be indexed again because a new index is created each day. But as soon as the field limit is reached, indexing stops again.

In the agent logs, you won’t see any sign of failure, because the APM Server asynchronously sends the data it receives from the agents to Elasticsearch. However, the APM Server and Elasticsearch log a warning like this:

{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [1000] in [INDEX_NAME] has been exceeded\"}
Tail-based sampling causing high system memory usage and high disk IO

Tail-based sampling requires minimal memory to run, and there should not be a noticeable increase in RSS memory usage. However, since tail-based sampling writes data to disk, it is possible to see a significant increase in OS page cache memory usage due to disk IO. If you see a drop in throughput and excessive disk activity after enabling tail-based sampling, please ensure that there is enough memory headroom in the system for OS page cache to perform disk IO efficiently.
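On Linux, a quick sketch for checking page cache headroom and disk saturation:

free -h       # "buff/cache" and "available" indicate page cache headroom
iostat -x 5   # from the sysstat package; watch %util on the sampling disk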

Too many unique transaction names

Transaction names are defined in each APM agent; when an APM agent supports a framework, it includes logic for naming the transactions that the framework creates. In some cases though, like when using an APM agent’s API to create custom transactions, it is up to the user to define a pattern for transaction naming. When transactions are named incorrectly, each unique URL can be associated with a unique transaction group—causing an explosion in the number of transaction groups per service, and leading to inaccuracies in the Applications UI.

To fix a large number of unique transaction names, you need to change how you are using the APM agent API to name your transactions. Ensure you are not naming based on parameters that can change: user IDs, product IDs, order numbers, query parameters, and so on should be stripped away, and commonality should be found between your unique URLs.

Let’s look at an example from the RUM agent documentation. Here are a few URLs you might find on Elastic.co:

// Blog Posts
https://www.elastic.co/blog/reflections-on-three-years-in-the-elastic-public-sector
https://www.elastic.co/blog/say-heya-to-the-elastic-search-awards
https://www.elastic.co/blog/and-the-winner-of-the-elasticon-2018-training-subscription-drawing-is

// Documentation
https://www.elastic.co/guide/en/elastic-stack/current/index.html
https://www.elastic.co/guide/en/apm/get-started/current/index.html
https://www.elastic.co/guide/en/infrastructure/guide/current/index.html

These URLs, like most, include unique names. If we named transactions based on each unique URL, we’d end up with the problem described above—a very large number of different transaction names. Instead, we should strip away the unique information and group our transactions based on common information. In this case, that means naming all blog transactions, /blog, and all documentation transactions, /guide.
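With the RUM agent, one way to sketch this grouping is to rename page-load transactions from the URL path via apm.observe(); the path prefixes come from the example above:

import { apm } from '@elastic/apm-rum'

// After apm.init(...), group page-load transactions by path prefix
apm.observe('transaction:start', (transaction) => {
  if (transaction.type !== 'page-load') return
  const path = window.location.pathname
  if (path.startsWith('/blog/')) transaction.name = '/blog'
  else if (path.startsWith('/guide/')) transaction.name = '/guide'
})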

If you feel like you’d be losing valuable information by following this naming convention, don’t fret! You can always add additional metadata to your transactions using labels (indexed) or custom context (non-indexed).

After ensuring you’ve correctly named your transactions, you might still see errors in the Applications UI related to transaction group limit reached:

The number of transaction groups has been reached. Current APM server capacity for handling unique transaction groups has been reached. There are at least X transactions missing in this list. Please decrease the number of transaction groups in your service or increase the memory allocated to APM server.

You will see this warning if an agent is creating too many transaction groups. This could indicate incorrect instrumentation, which will have to be fixed in your application. Alternatively, you can increase the memory allocated to APM Server.

Number of transaction groups exceeds the allowed maximum (1,000) that are displayed. The maximum number of transaction groups displayed in Kibana has been reached. Try narrowing down results by using the query bar.

You will see this warning if your results have more than 1,000 unique transaction groups. Use the query bar to reduce the number of unique transaction groups in your results.

More information

While this can happen with any APM agent, it typically occurs with the RUM agent. For more information on how to correctly set transaction.name in the RUM agent, see custom initial page load transaction names.

The RUM agent can also set the transaction.name when observing for transaction events. See apm.observe() for more information.

If your problem is occurring in a different APM agent, the tips above still apply. See the relevant Agent API documentation to adjust how you’re naming your transactions.

Unknown route

The transaction overview will only display helpful information when the transactions in your services are named correctly. If you’re seeing "GET unknown route" or "unknown route" in the Applications UI, it could be a sign that something isn’t working as it should.

Elastic APM agents come with built-in support for popular frameworks out-of-the-box. This means, among other things, that the APM agent will try to automatically name HTTP requests. As an example, the Node.js agent uses the route that handled the request, while the Java agent uses the Servlet name.

"Unknown route" indicates that the APM agent can’t determine what to name the request, perhaps because the technology you’re using isn’t supported, the agent has been installed incorrectly, or because something is happening to the request that the agent doesn’t understand.

To resolve this, you’ll need to head over to the relevant APM agent documentation. Specifically, view the agent’s supported technologies page. You can also use the agent’s public API to manually set a name for the transaction.
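For example, with the Node.js agent, you can name a transaction explicitly through the public API; the route pattern below is a placeholder:

const apm = require('elastic-apm-node').start()

// Inside a request handler the agent can't name automatically:
// use a low-cardinality pattern, never the raw URL
apm.setTransactionName('GET /users/:id')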

Fields are not searchable

In Elasticsearch, index templates are used to define settings and mappings that determine how fields should be analyzed. The recommended index templates for APM come from the built-in Elasticsearch apm-data plugin. These templates, by default, enable and disable indexing on certain fields.

As an example, some APM agents store cookie values in http.request.cookies. Since http.request has disabled dynamic indexing, and http.request.cookies is not declared in a custom mapping, the values in http.request.cookies are not indexed and thus not searchable.

Ensure an APM data view exists

As a first step, you should ensure the correct data view exists. In Kibana, go to Stack Management > Data views. You should see the APM data view; the default is traces-apm*,apm-*,logs-apm*,apm-*,metrics-apm*,apm-*. If you don’t, the data view doesn’t exist. To fix this, navigate to the Applications UI in Kibana and select Add data. In the APM tutorial, click Load Kibana objects to create the APM data view.

Ensure a field is searchable

There are two things you can do if you’d like to ensure a field is searchable:

  1. Index your additional data as labels instead. These are dynamic by default, which means they will be indexed and become searchable and aggregatable (see the sketch after this list).
  2. Create a custom mapping for the field.
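A sketch of the first option with the Node.js agent; the label name, value, and handler are illustrative:

const apm = require('elastic-apm-node')

// Inside a request handler: labels are indexed by default,
// so the copied value becomes searchable and aggregatable.
function handler(req, res) {
  apm.setLabel('checkoutStep', 'payment')
  res.end('ok')
}
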
Service Maps: no connection between client and server

If the service map is not showing an expected connection between the client and server, it’s likely because you haven’t configured distributedTracingOrigins.

This setting is necessary, for example, for cross-origin requests. If you have a basic web application that provides data via an API on localhost:4000, and serves HTML from localhost:4001, you’d need to set distributedTracingOrigins: ['https://localhost:4000'] to ensure the origin is monitored as a part of distributed tracing. In other words, distributedTracingOrigins is consulted prior to the APM agent adding the distributed tracing traceparent header to each request.
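A sketch of that configuration in the RUM agent’s init call; serviceName is a placeholder, and the URLs come from the example above:

import { init as initApm } from '@elastic/apm-rum'

const apm = initApm({
  serviceName: 'my-frontend',          // placeholder
  serverUrl: 'http://localhost:8200',  // APM Server
  // Send the traceparent header with requests to this origin
  distributedTracingOrigins: ['https://localhost:4000'],
})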