Known issues

edit

APM has the following known issues:

Upgrading to v8.15.x may cause ingestion to fail

edit

Elastic Stack versions: 8.15.0+

The issue only occurs when upgrading the Elastic Stack from 8.12.2 or lower directly to any 8.15.x version. The issue does not occur when creating a new cluster using any 8.15.x version, or when upgrading from 8.12.2 to 8.13.x or 8.14.x and then to 8.15.x.

In APM Servers versions prior to 8.13.0, an ingestion pipeline exists to perform a check on the version. The version check would fail any APM document produced with a different version of APM server compared to the version of the installed APM’s ingest pipeline. In 8.13.0 the version check in the ingest pipeline was removed. Due to the combination of an internal change in how apm data management assets are set up from 8.15 onwards and a bug in Elasticsearch, related to lazy rollover of data streams, the ingestion pipeline conducting the version check is not removed on upgrade and prevents the ingestion of data.

If the deployment is running 8.15.0, upgrade the deployment to 8.15.1 or above. A manual rollover of all APM data streams is required to pick up the new index templates and remove the faulty ingest pipeline version check. Perform the following requests to Elasticsearch (they are assuming the default namespace is used, adjust if necessary):

POST /traces-apm-default/_rollover
POST /traces-apm.rum-default/_rollover
POST /logs-apm.error-default/_rollover
POST /logs-apm.app-default/_rollover
POST /metrics-apm.app-default/_rollover
POST /metrics-apm.internal-default/_rollover
POST /metrics-apm.service_destination.1m-default/_rollover
POST /metrics-apm.service_destination.10m-default/_rollover
POST /metrics-apm.service_destination.60m-default/_rollover
POST /metrics-apm.service_summary.1m-default/_rollover
POST /metrics-apm.service_summary.10m-default/_rollover
POST /metrics-apm.service_summary.60m-default/_rollover
POST /metrics-apm.service_transaction.1m-default/_rollover
POST /metrics-apm.service_transaction.10m-default/_rollover
POST /metrics-apm.service_transaction.60m-default/_rollover
POST /metrics-apm.transaction.1m-default/_rollover
POST /metrics-apm.transaction.10m-default/_rollover
POST /metrics-apm.transaction.60m-default/_rollover

Upgrading to v8.15.0 may cause APM indices to lose their lifecycle policy

edit

Elastic Stack versions: 8.15.0
Fixed in Elastic Stack version 8.15.1

The issue only occurs when upgrading the Elastic Stack to 8.15.0. The issue does not occur when creating a new cluster using 8.15.0. The issue also does not occur if a custom ILM policy is configured using a custom component template.

In 8.15.0, APM Server switched to use data stream lifecycle to manage data retention for APM indices for new deployments as well as for upgraded deployments with default lifecycle configurations. Unfortunately, since any data stream created before 8.15.0 does not have a data stream lifecycle configuration, such existing data streams become unmanaged for default lifecycle configurations.

Upgrading to 8.15.1 should fix any new indices created for the data stream. However, indices created in version 8.15.0 would remain unmanaged if the default ILM policy is used. One of the following approaches can be adopted to fix the unmanaged indices:

  1. Manually delete the indices when they are no longer needed.
  2. Explicitly configure APM data streams with the default data stream lifecycle config. Using this approach would migrate all data streams to use data stream lifecycles, which should be equivalent to the default ILM policies:

    PUT _data_stream/traces-apm-*/_lifecycle
    {
      "data_retention": "10d"
    }
    
    PUT _data_stream/traces-apm.rum*/_lifecycle
    {
      "data_retention": "90d"
    }
    
    PUT _data_stream/traces-apm.sampled*/_lifecycle
    {
      "data_retention": "1h"
    }
    
    PUT _data_stream/metrics-apm.*.1m-*/_lifecycle
    {
      "data_retention": "90d"
    }
    
    PUT _data_stream/metrics-apm.*.10m-*/_lifecycle
    {
      "data_retention": "180d"
    }
    
    PUT _data_stream/metrics-apm.*.60m-*/_lifecycle
    {
      "data_retention": "390d"
    }
    
    PUT _data_stream/metrics-apm.internal-*/_lifecycle
    {
      "data_retention": "90d"
    }
    
    PUT _data_stream/metrics-apm.app.*/_lifecycle
    {
      "data_retention": "90d"
    }
    
    PUT _data_stream/logs-apm.*/_lifecycle
    {
      "data_retention": "10d"
    }

This issue is fixed in 8.15.1 (elastic/elasticsearch#112432).

Upgrading to v8.13.0 to v8.13.2 breaks APM anomaly rules

edit

Elastic Stack versions: 8.13.0, 8.13.1, 8.13.2
Fixed in Elastic Stack version 8.13.3

This issue occurs when upgrading the Elastic Stack to version 8.13.0, 8.13.1, or 8.13.2. This issue may go unnoticed unless you actively monitor your Kibana logs. The following log indicates the presence of this issue:

"params invalid: [anomalyDetectorTypes]: expected value of type [array] but got [undefined]"

This issue occurs because a non-optional parameter, anomalyDetectorTypes was added in 8.13.0 without the presence of an automation migration script. This breaks pre-existing rules as they do not have this parameter and will fail validation. This issue is fixed in v8.13.3.

There are three ways to fix this error:

  • Upgrade to version 8.13.3
  • Fix broken anomaly rules in the APM UI (no upgrade required)
  • Fix broken anomaly rules with Kibana APIs (no upgrade required)

Fix broken anomaly rules in the APM UI

  1. From any APM page in Kibana, select Alerts and rulesManage rules.
  2. Filter your rules by setting Type to APM Anomaly.
  3. For each anomaly rule in the list, select the pencil icon to edit the rule.
  4. Add one or more DETECTOR TYPES to the rule.

    The detector type determines when the anomaly rule triggers. For example, a latency anomaly rule will trigger when the latency of the service being monitored is abnormal. Supported detector types are latency, throughput, and failed transaction rate.

  5. Click Save.

Fix broken anomaly rules with Kibana APIs

  1. Find broken rules

    To identify rules in this exact state, you can use the find rules endpoint and search for the APM anomaly rule type as well as this exact error message indicating that the rule is in the broken state. We will also use the fields parameter to specify only the fields required when making the update request later.

    • search_fields=alertTypeId
    • search=apm.anomaly
    • filter=alert.attributes.executionStatus.error.message:"params invalid: [anomalyDetectorTypes]: expected value of type [array] but got [undefined]"
    • fields=[id, name, actions, tags, schedule, notify_when, throttle, params]

    The encoded request might look something like this:

    curl -u "$KIBANA_USER":"$KIBANA_PASSWORD" "$KIBANA_URL/api/alerting/rules/_find?search_fields=alertTypeId&search=apm.anomaly&filter=alert.attributes.executionStatus.error.message%3A%22params%20invalid%3A%20%5BanomalyDetectorTypes%5D%3A%20expected%20value%20of%20type%20%5Barray%5D%20but%20got%20%5Bundefined%5D%22&fields=id&fields=name&fields=actions&fields=tags&fields=schedule&fields=notify_when&fields=throttle&fields=params"
    Example result:
    {
      "page": 1,
      "total": 1,
      "per_page": 10,
      "data": [
        {
          "id": "d85e54de-f96a-49b5-99d4-63956f90a6eb",
          "name": "APM Anomaly Jason Test FAILING [2]",
          "tags": [
            "test",
            "jasonrhodes"
          ],
          "throttle": null,
          "schedule": {
            "interval": "1m"
          },
          "params": {
            "windowSize": 30,
            "windowUnit": "m",
            "anomalySeverityType": "warning",
            "environment": "ENVIRONMENT_ALL"
          },
          "notify_when": null,
          "actions": []
        }
      ]
    }
  2. Prepare the update JSON doc(s)

    For each broken rule found, create a JSON rule document with what was returned from the API in the previous step. You will need to make two changes to each document:

    1. Remove the id key but keep the value connected to this document (e.g. rename the file to {id}.json). The id cannot be sent as part of the request body for the PUT request, but you will need it for the URL path.
    2. Add the "anomalyDetectorTypes" to the "params" block, using the default value as seen below to mimic the pre-8.13 behavior:

      {
        "params": {
          // ... other existing params should stay here,
          // with the required one added to this object
          "anomalyDetectorTypes": [
            "txLatency",
            "txThroughput",
            "txFailureRate"
          ]
        }
      }
  3. Update each rule using the PUT /api/alerting/rule/{id} API

    For each rule, submit a PUT request to the update rule endpoint using that rule’s ID and its stored update document from the previous step. For example, assuming the first broken rule’s ID is 046c0d4f:

    curl -u "$KIBANA_USER":"$KIBANA_PASSWORD" -XPUT "$KIBANA_URL/api/alerting/rule/046c0d4f" -H 'Content-Type: application/json' -H 'kbn-xsrf: rule-update' -d @046c0d4f.json

    Once the PUT request executes successfully, the rule will no longer be broken.

Upgrading APM Server to 8.11+ might break event intake from older APM Java agents

edit

APM Server versions: >=8.11.0
Elastic APM Java agent versions: < 1.43.0

If you are using APM Server (> v8.11.0) and the Elastic APM Java agent (< v1.43.0), the agent may be sending empty histogram metricsets.

In previous APM Server versions some data validation was not properly applied, leading the APM Server to accept empty histogram metricsets where it shouldn’t. This bug was fixed in the APM Server in 8.11.0.

The APM Java agent (< v1.43.0) was sending this kind of invalid data under certain circumstances. If you upgrade the APM Server to v8.11.0+ without upgrading the APM Java agent version, metricsets can be rejected by the APM Server and can result in additional error logs in the Java agent.

The fix is to upgrade the Elastic APM Java agent to a version >= 1.43.0. Find details in elastic/apm-data#157.

traces-apm@custom ingest pipeline applied to certain data streams unintentionally

edit

APM Server versions: 8.12.0

If you’re using the Elastic APM Server v8.12.0, the traces-apm@custom ingest pipeline is now additionally applied to data streams traces-apm.sampled-* and traces-apm.rum-*, and applied twice for traces-apm-*. This bug impacts users with a non-empty traces-apm@custom ingest pipeline.

If you rely on this unintended behavior in 8.12.0, please rename your pipeline to traces-apm.integration@custom to preserve this behavior in later versions.

A fix was released in 8.12.1: elastic/kibana#175448.

Ingesting new JVM metrics in 8.9 and 8.10 breaks upgrade to 8.11 and stops ingestion

edit

APM Server versions: 8.11.0, 8.11.1
Elastic APM Java agent versions: 1.39.0+

If you’re using the Elastic APM Java agent v1.39.0+ to send new JVM metrics to APM Server v8.9.x and v8.10.x, upgrading to 8.11.0 or 8.11.1 will silently fail and stop ingesting APM metrics.

After upgrading, you will see the following errors:

  • APM Server error logs:

    failed to index document in 'metrics-apm.internal-default' (fail_processor_exception): Document produced by APM Server v8.11.1, which is newer than the installed APM integration (v8.10.3-preview-1695284222). The APM integration must be upgraded.
  • Fleet error on integration package upgrade:

    Failed installing package [apm] due to error: [ResponseError: mapper_parsing_exception
    	Root causes:
    		mapper_parsing_exception: Field [jvm.memory.non_heap.pool.committed] attempted to shadow a time_series_metric]

A fix was released in 8.11.2: elastic/kibana#171712.

APM integration package upgrade through Fleet causes excessive data stream rollovers

edit

APM Server versions: <= 8.12.1 +

If you’re upgrading APM integration package to any versions <= 8.12.1, in some rare cases, the upgrade fails with a mapping conflict error. The upgrade process keeps rolling over the data stream in an unsuccessful attempt to work around the error. As a result, many empty backing indices for APM data streams are created.

During upgrade, you will see errors similar to the one below:

  • Fleet error on integration package upgrade:

    Mappings update for metrics-apm.service_destination.10m-default failed due to ResponseError: illegal_argument_exception
    	Root causes:
    		illegal_argument_exception: Mapper for [metricset.interval] conflicts with existing mapper:
    	Cannot update parameter [value] from [10m] to [null]

A fix was released in 8.12.2: elastic/apm-server#12219.

Performance regression: APM issues too many small bulk requests for Elasticsearch output

edit

APM Server versions: >=8.13.0, <= 8.14.2

If you’re on APM server version >=8.13.0, <= 8.14.2_, using Elasticsearch output, do not specify any output.elasticsearch.flush_bytes, and do not disable compression explicitly by setting output.elasticsearch.compression_level to 0, APM server will issue smaller bulk requests of 24KB size, and more bulk requests will need to be made to maintain the original throughput. This causes Elasticsearch to experience higher load, and APM server may exhibit Elasticsearch backpressure symptoms.

This happens because a performance regression was introduced, such that the default value of bulk indexer flush bytes was reduced from 1MB to 24KB.

Affected APM servers will emit the following log:

flush_bytes config value is too small (0) and might be ignored by the indexer, increasing value to 24576

To workaround the issue, modify the Elasticsearch output configuration in APM.

  • For APM Server binary

    • In apm-server.yml, set output.elasticsearch.flush_bytes: 1mib
  • For Fleet-managed APM (non-Elastic Cloud)

    • In Fleet, open the Settings tab.
    • Under Outputs, identify the Elasticsearch output that receives from APM, select the edit icon.
    • In the Edit output flyout, in "Advanced YAML configuration" field, add line flush_bytes: 1mib.
  • For Elastic Cloud

    • It is not possible to edit the Fleet "Elastic Cloud internal output".

A fix will be released in 8.14.3: elastic/apm-server#13576.