Resolve lifecycle policy execution errors
When ILM executes a lifecycle policy, errors can occur while it performs the index operations required for a step. When this happens, ILM moves the index to an ERROR step. If ILM cannot resolve the error automatically, execution halts until you resolve the underlying issue with the policy, index, or cluster.
For example, you might have a shrink-index policy that shrinks an index to four shards once it is at least five days old:
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
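If you manage policies from code, you can build the same request body as a plain data structure. The sketch below assumes Python and uses a hypothetical helper, `shrink_policy_body`. It only produces the JSON sent by the `PUT _ilm/policy/shrink-index` request above. It does not contact a cluster and does not validate the policy.

```python
import json


def shrink_policy_body(min_age: str, number_of_shards: int) -> dict:
    """Build the body for PUT _ilm/policy/<name> with a single warm-phase shrink.

    Hypothetical helper: it mirrors the JSON shown above and performs no
    validation against a real cluster.
    """
    return {
        "policy": {
            "phases": {
                "warm": {
                    "min_age": min_age,
                    "actions": {
                        "shrink": {"number_of_shards": number_of_shards}
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # Same policy as the shrink-index example: shrink to four shards after five days.
    print(json.dumps(shrink_policy_body("5d", 4), indent=2))
```

You would pass the serialized dictionary as the request body with whatever HTTP client you already use.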
There is nothing that prevents you from applying the shrink-index policy to a new index that has only two shards:
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
After five days, ILM attempts to shrink my-index-000001 from two shards to four shards. Because the shrink action cannot increase the number of shards, this operation fails and ILM moves my-index-000001 to the ERROR step.
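You can catch this kind of mismatch before attaching a policy to an index. A shrink cannot increase the shard count. The shrink index API also requires the source shard count to be a multiple of the target count. The function below is a hypothetical local pre-flight check that approximates these two rules. It is not part of ILM, and Elasticsearch remains the authority on which shrinks it accepts.

```python
from typing import Optional


def shrink_error(source_shards: int, target_shards: int) -> Optional[str]:
    """Explain why a shrink from source_shards to target_shards would be rejected.

    Hypothetical pre-flight check, not an Elasticsearch API. It approximates
    two shrink rules: the target cannot have more shards than the source,
    and the source shard count must be a multiple of the target count.
    Returns None when neither rule is violated.
    """
    if target_shards < 1:
        return f"target shard count [{target_shards}] must be at least 1"
    if target_shards > source_shards:
        return (f"cannot shrink from [{source_shards}] to [{target_shards}] shards: "
                "shrinking cannot increase the shard count")
    if source_shards % target_shards != 0:
        return (f"source shard count [{source_shards}] is not a multiple of "
                f"target shard count [{target_shards}]")
    return None


# The scenario above: a two-shard index under a policy that asks for four shards.
print(shrink_error(2, 4))
print(shrink_error(2, 1))  # None: one shard divides two evenly
```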
You can use the ILM Explain API to get information about what went wrong:
GET /my-index-000001/_ilm/explain
This returns the following information:
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,
      "policy" : "shrink-index",
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d",
      "phase" : "warm",
      "phase_time_millis" : 1541717272601,
      "action" : "shrink",
      "action_time_millis" : 1541717272601,
      "step" : "ERROR",
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink",
      "step_info" : {
        "type" : "illegal_argument_exception",
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : {
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
In this response:
policy: the policy being used to manage the index (shrink-index)
age: the index age, 5.1 days
phase: the phase the index is currently in (warm)
action: the current action (shrink)
step: the step the index is currently in (ERROR)
failed_step: the step that failed to execute (shrink)
step_info: the type of error and a description of that error
phase_execution: the definition of the current phase from the shrink-index policy
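When many indices are managed, you may want to pull the failure fields out of the explain response in code. The sketch below assumes Python and a hypothetical helper, `failed_indices`. It reads only fields that appear in the example response above (step, failed_step, phase, action, and step_info) and ignores everything else.

```python
import json
from typing import Dict, List


def failed_indices(explain_response: Dict) -> List[Dict[str, str]]:
    """Summarize indices that an ILM explain response reports in the ERROR step.

    Hypothetical helper: it reads only fields shown in the example response
    (step, failed_step, phase, action, step_info.type, step_info.reason).
    """
    summaries = []
    for name, info in explain_response.get("indices", {}).items():
        if info.get("step") != "ERROR":
            continue  # skip indices that are not currently in the ERROR step
        step_info = info.get("step_info", {})
        summaries.append({
            "index": name,
            "phase": info.get("phase", ""),
            "action": info.get("action", ""),
            "failed_step": info.get("failed_step", ""),
            "error_type": step_info.get("type", ""),
            "reason": step_info.get("reason", ""),
        })
    return summaries


# A trimmed copy of the example response above.
response = json.loads("""
{"indices": {"my-index-000001": {
  "index": "my-index-000001", "managed": true, "policy": "shrink-index",
  "phase": "warm", "action": "shrink", "step": "ERROR", "failed_step": "shrink",
  "step_info": {"type": "illegal_argument_exception",
                "reason": "the number of target shards [4] must be less that the number of source shards [2]"}}}}
""")

for entry in failed_indices(response):
    print(f"{entry['index']}: {entry['failed_step']} failed "
          f"({entry['error_type']}): {entry['reason']}")
```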
To resolve this, you could update the policy to shrink the index to a single shard after five days:
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
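Before choosing a new `number_of_shards`, it can help to list the values a shrink could accept for a given index. The sketch below uses a hypothetical helper, `valid_shrink_targets`. It assumes the shrink rules described earlier: the target must be smaller than the source and must divide the source shard count evenly. For the two-shard index in this example, the only option is one shard.

```python
from typing import List


def valid_shrink_targets(source_shards: int) -> List[int]:
    """List shard counts smaller than source_shards that divide it evenly.

    Hypothetical helper based on the shrink rules: each returned value is a
    proper divisor of the source shard count.
    """
    return [n for n in range(1, source_shards) if source_shards % n == 0]


print(valid_shrink_targets(2))   # [1]
print(valid_shrink_targets(12))  # [1, 2, 3, 4, 6]
```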
Retrying failed lifecycle policy steps
Once you fix the problem that put an index in the ERROR step, you might need to explicitly tell ILM to retry the step:
POST /my-index-000001/_ilm/retry
ILM subsequently attempts to re-run the step that failed. You can use the ILM Explain API to monitor the progress.
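The retry-then-monitor workflow can be scripted. In the sketch below, only the two endpoints come from this page: `POST /<index>/_ilm/retry` and `GET /<index>/_ilm/explain`. The function name `retry_and_watch`, the `request` callable, and the stopping rule are assumptions. The `request` callable stands in for whatever HTTP client you use. The function stops when the index is back in ERROR or has moved past the action that failed.

```python
import time
from typing import Callable, Dict

# Hypothetical transport: takes (method, path) and returns the decoded JSON body.
RequestFn = Callable[[str, str], Dict]


def retry_and_watch(index: str, request: RequestFn,
                    timeout_s: float = 300.0, interval_s: float = 5.0) -> Dict:
    """Retry a failed ILM step, then poll the explain API until it settles.

    Sketch only. Assumes the index is currently in the ERROR step. Returns the
    index's explain entry when it either returns to ERROR (the retry failed;
    inspect step_info) or reports a different action than the one that
    failed. Raises TimeoutError if neither happens within timeout_s seconds.
    """
    explain_path = f"/{index}/_ilm/explain"
    before = request("GET", explain_path)["indices"][index]
    failed_action = before.get("action")  # e.g. "shrink" in the example above

    request("POST", f"/{index}/_ilm/retry")
    deadline = time.monotonic() + timeout_s
    while True:
        time.sleep(interval_s)  # give ILM time to re-run the step
        info = request("GET", explain_path)["indices"][index]
        if info.get("step") == "ERROR":
            return info  # the retried step failed again
        if info.get("action") != failed_action:
            return info  # ILM has moved past the failing action
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{index} is still running action {failed_action!r}")
```

Returning on ERROR rather than raising keeps the `step_info` details available to the caller. This is a design choice of the sketch, not ILM behavior.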