Resolve migration failures

Migrating Kibana primarily involves migrating saved object documents to be compatible with the new version.

Saved object migration failures

If Kibana unexpectedly terminates while migrating a saved object index, Kibana automatically attempts to perform the migration again when the process restarts. Do not delete any saved objects indices to fix a failed migration. Unlike previous versions, Kibana 7.12.0 and later does not require deleting indices to release a failed migration lock.

If upgrade migrations fail repeatedly, refer to preparing for migration. When you address the root cause for the migration failure, Kibana automatically retries the migration. If you’re unable to resolve a failed migration, contact Support.

Old .kibana_N indices

After the migrations complete, multiple Kibana indices are created in Elasticsearch (.kibana_1, .kibana_2, .kibana_7.12.0, and so on). Kibana only uses the index that the .kibana and .kibana_task_manager aliases point to. The other Kibana indices can be safely deleted, but are left around as a matter of historical record, and to facilitate rolling Kibana back to a previous version.
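
Before deleting anything, you can confirm which concrete index each alias points to. This is a quick check with the standard get-alias API; the index names in the response depend on your deployment and upgrade history:

GET _alias/.kibana,.kibana_task_manager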

Known issues with Fleet beta

If you see a `timeout_exception` or `receive_timeout_transport_exception` error, it might be caused by a known issue in 7.12.0 that can affect deployments that tried the Fleet beta. Upgrade migrations fail because of a large number of documents in the .kibana index, which causes Kibana to log errors such as:

Error: Unable to complete saved object migrations for the [.kibana] index. Please check the health of your Elasticsearch cluster and try again. Error: [receive_timeout_transport_exception]: [instance-0000000002][10.32.1.112:19541][cluster:monitor/task/get] request_id [2648] timed out after [59940ms]

Error: Unable to complete saved object migrations for the [.kibana] index. Please check the health of your Elasticsearch cluster and try again. Error: [timeout_exception]: Timed out waiting for completion of [org.elasticsearch.index.reindex.BulkByScrollTask@6a74c54]

For instructions on how to mitigate the known issue, refer to the GitHub issue.

Corrupt saved objects

To find and remedy problems caused by corrupt documents, we highly recommend testing your Kibana upgrade in a development cluster, especially when there are custom integrations that create saved objects in your environment.

Saved objects that are corrupted through manual editing or integrations cause migration failures with a log message, such as Unable to migrate the corrupt Saved Object document .... For a successful upgrade migration, you must fix or delete corrupt documents.

For example, you receive the following error message:

Unable to migrate the corrupt saved object document with _id: 'marketing_space:dashboard:e3c5fc71-ac71-4805-bcab-2bcc9cc93275'. To allow migrations to proceed, please delete this document from the [.kibana_7.12.0_001] index.

To delete the documents that cause migrations to fail, take the following steps:

  1. Create a role as follows:

    PUT _security/role/grant_kibana_system_indices
    {
      "indices": [
        {
          "names": [
            ".kibana*"
          ],
          "privileges": [
            "all"
          ],
          "allow_restricted_indices": true
        }
      ]
    }
  2. Create a user with the role above and the built-in superuser role:

    POST /_security/user/temporarykibanasuperuser
    {
      "password" : "l0ng-r4nd0m-p@ssw0rd",
      "roles" : [ "superuser", "grant_kibana_system_indices" ]
    }
  3. Remove the write block which the migration system has placed on the previous index:

    PUT .kibana_7.12.0_001/_settings
    {
      "index": {
        "blocks.write": false
      }
    }
  4. Delete the corrupt document (to inspect a document before deleting it, see the sketch after these steps):

    DELETE .kibana_7.12.0_001/_doc/marketing_space:dashboard:e3c5fc71-ac71-4805-bcab-2bcc9cc93275
  5. Restart Kibana.

    The dashboard with the e3c5fc71-ac71-4805-bcab-2bcc9cc93275 ID that belongs to the marketing_space space is no longer available.
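
If you want to inspect a corrupt document before deleting it in step 4, you can fetch it by ID first. This sketch reuses the example _id and index from the error message above; substitute the values from your own logs:

GET .kibana_7.12.0_001/_doc/marketing_space:dashboard:e3c5fc71-ac71-4805-bcab-2bcc9cc93275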

You can configure Kibana to automatically discard all corrupt objects and transform errors that occur during a migration. When the migrations.discardCorruptObjects configuration option is set, Kibana deletes the conflicting objects and proceeds with the migration. For instance, for an upgrade to 8.4.0, you can define the following setting in kibana.yml:

migrations.discardCorruptObjects: "8.4.0"

WARNING: Enabling the flag above will cause the system to discard all corrupt objects, as well as those causing transform errors. Thus, it is HIGHLY recommended that you carefully review the list of conflicting objects in the logs.

Documents for unknown saved objects

Migrations will fail if saved objects belong to an unknown saved object type. Unknown saved objects are typically caused by performing manual modifications to the Elasticsearch index (no longer allowed in 8.x), or by disabling a plugin that had previously created a saved object.

We recommend using the Upgrade Assistant to discover and remedy any unknown saved object types. Kibana version 7.17.0 deployments containing unknown saved object types will also log the following warning message:

CHECK_UNKNOWN_DOCUMENTS Upgrades will fail for 8.0+ because documents were found for unknown saved object types. To ensure that future upgrades will succeed, either re-enable plugins or delete these documents from the ".kibana_7.17.0_001" index after the current upgrade completes.
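
To see which saved object types are present in an index, you can run a terms aggregation on the type field and compare the results against the types registered by your Kibana version and plugins. This is a sketch that assumes the index name from the warning message above; adjust it to your deployment:

GET .kibana_7.17.0_001/_search
{
  "size": 0,
  "aggs": {
    "saved_object_types": {
      "terms": {
        "field": "type",
        "size": 100
      }
    }
  }
}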

If you fail to remedy this, your upgrade to 8.0+ will fail with a message like:

Unable to complete saved object migrations for the [.kibana] index: Migration failed because some documents were found which use unknown saved object types:
- "firstDocId" (type "someType")
- "secondtDocId" (type "someType")
- "thirdDocId" (type "someOtherType")

To proceed with the migration, re-enable any plugins that previously created these saved objects. Alternatively, carefully review the list of unknown saved objects in the Kibana log entry. If the corresponding disabled plugins and their associated saved objects will no longer be used, you can configure Kibana to discard them by setting the configuration option migrations.discardUnknownObjects to the version you are upgrading to. For instance, for an upgrade to 8.4.0, you can define the following setting in kibana.yml:

migrations.discardUnknownObjects: "8.4.0"

Incompatible settings or mappings

Index templates that match new .kibana* indices and specify settings.refresh_interval or mappings are known to interfere with Kibana upgrades. This can happen when index templates are defined manually.

To make sure no index templates apply to new .kibana* indices, narrow down the index patterns of any user-defined index templates.
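
To find templates that could interfere, list the composable index templates and review their index patterns. The template name and pattern below are hypothetical; the goal is to replace any overly broad pattern (such as a bare *) with one that cannot match .kibana* indices:

GET _index_template

PUT _index_template/my_logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "refresh_interval": "5s"
    }
  }
}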

Incompatible xpack.tasks.index configuration setting

In Kibana 7.5.0 and earlier, when the task manager index is set to .tasks with the configuration setting xpack.tasks.index: ".tasks", upgrade migrations fail. In Kibana 7.5.1 and later, the incompatible configuration setting prevents upgrade migrations from starting.

Repeated time-out requests that eventually fail

Migrations can get stuck in a loop of retry attempts while waiting for an index yellow status that is never reached. In the CLONE_TEMP_TO_TARGET or CREATE_REINDEX_TEMP steps, you might see a log entry similar to:

"Action failed with [index_not_yellow_timeout] Timeout waiting for the status of the [.kibana_8.1.0_001] index to become "yellow". Retrying attempt 1 in 2 seconds."

The process is waiting for a yellow index status. There are two known causes: the cluster exceeded the low disk watermark, or routing allocation is disabled or restricted.

Before retrying the migration, inspect the output of the _cluster/allocation/explain API for the target index to identify why the index isn’t yellow:

GET _cluster/allocation/explain
{
  "index": ".kibana_8.1.0_001",
  "shard": 0,
  "primary": true
}

If the cluster exceeded the low watermark for disk usage, the output should contain a message similar to this:

"The node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [11.692661332965082%]"

Refer to the Elasticsearch guide for how to fix common cluster issues.

If routing allocation is the issue, the _cluster/allocation/explain API will return an entry similar to this:

"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes"

Routing allocation disabled or restricted

Upgrade migrations fail because routing allocation is disabled or restricted (cluster.routing.allocation.enable: none/primaries/new_primaries), which causes Kibana to log errors such as:

Unable to complete saved object migrations for the [.kibana] index: [incompatible_cluster_routing_allocation] The elasticsearch cluster has cluster routing allocation incorrectly set for migrations to continue. To proceed, please remove the cluster routing allocation settings with PUT /_cluster/settings {"transient": {"cluster.routing.allocation.enable": null}, "persistent": {"cluster.routing.allocation.enable": null}}

To get around the issue, remove the transient and persistent routing allocation settings:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": null
  },
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}

Elasticsearch cluster shard limit exceeded

When upgrading, Kibana creates new indices that require a small number of new shards. If the number of open Elasticsearch shards approaches or exceeds the Elasticsearch cluster.max_shards_per_node setting, Kibana is unable to complete the upgrade. Ensure that Kibana is able to add at least 10 more shards by removing indices to free up resources, or by increasing the cluster.max_shards_per_node setting.
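
To compare the current shard count against the limit, you can check cluster health and the effective setting, and raise the limit as a last resort. The value below is only an example; prefer removing unneeded indices first:

GET _cluster/health?filter_path=status,active_shards

GET _cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node

PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 1200
  }
}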

For more information, refer to the documentation on total shards per node.