Resolve lifecycle policy execution errors
When ILM executes a lifecycle policy, errors can occur
while performing the necessary index operations for a step.
When this happens, ILM moves the index to an ERROR step.
If ILM cannot resolve the error automatically, execution is halted
until you resolve the underlying issues with the policy, index, or cluster.
For example, you might have a shrink-index policy that shrinks an index to four shards once it
is at least five days old:
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
Nothing prevents you from applying the shrink-index policy to a new
index that has only two shards:
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
After five days, ILM attempts to shrink my-index-000001 from two shards to four shards.
Because the shrink action cannot increase the number of shards, this operation fails
and ILM moves my-index-000001 to the ERROR step.
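A mismatch like this can be caught before the policy ever runs. The following is a minimal Python sketch, not part of ILM itself: it assumes you have the policy body and the index creation body available as dictionaries shaped like the two requests above, and the function name `find_invalid_shrink_targets` is hypothetical. It only checks the condition described here, a shrink target larger than the index's current shard count.

```python
def find_invalid_shrink_targets(policy, index_body):
    """Return (phase, target_shards, source_shards) for each shrink action
    whose target shard count is larger than the index's shard count.

    `policy` and `index_body` are assumed to have the same shape as the
    request bodies shown above; this is an illustrative helper only.
    """
    source = int(index_body["settings"]["index.number_of_shards"])
    problems = []
    for phase_name, phase in policy["policy"]["phases"].items():
        shrink = phase.get("actions", {}).get("shrink")
        # Skip phases with no shrink action or no explicit shard count.
        if shrink is None or "number_of_shards" not in shrink:
            continue
        target = int(shrink["number_of_shards"])
        if target > source:
            problems.append((phase_name, target, source))
    return problems


shrink_policy = {
    "policy": {
        "phases": {
            "warm": {
                "min_age": "5d",
                "actions": {"shrink": {"number_of_shards": 4}},
            }
        }
    }
}
index_body = {
    "settings": {
        "index.number_of_shards": 2,
        "index.lifecycle.name": "shrink-index",
    }
}

print(find_invalid_shrink_targets(shrink_policy, index_body))
```

For the example policy and index, the helper reports the warm phase with a target of 4 shards against a source of 2, which is the combination that later sends the index to the ERROR step.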
You can use the ILM Explain API to get information about what went wrong:
GET /my-index-000001/_ilm/explain
The request returns the following information:
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,
      "policy" : "shrink-index",
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d",
      "phase" : "warm",
      "phase_time_millis" : 1541717272601,
      "action" : "shrink",
      "action_time_millis" : 1541717272601,
      "step" : "ERROR",
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink",
      "step_info" : {
        "type" : "illegal_argument_exception",
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : {
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
The policy being used to manage the index: shrink-index
The index age: 5.1 days
The phase the index is currently in: warm
The current action: shrink
The step the index is currently in: ERROR
The step that failed to execute: shrink
The type of error and a description of that error.
The definition of the current phase from the shrink-index policy
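When many indices are managed, it can help to pull only the failing ones out of an explain response. The sketch below is one possible approach in Python and assumes the response has already been parsed into a dictionary with the fields shown above; the function name `summarize_ilm_errors` is hypothetical and not part of any Elasticsearch client.

```python
def summarize_ilm_errors(explain_response):
    """Collect the indices that are in the ERROR step, with the fields
    most useful for diagnosis.

    `explain_response` is assumed to be the parsed JSON body of an
    ILM explain request, shaped like the response shown above.
    """
    failures = []
    for name, info in explain_response.get("indices", {}).items():
        if info.get("step") != "ERROR":
            continue
        step_info = info.get("step_info", {})
        failures.append({
            "index": name,
            "policy": info.get("policy"),
            "phase": info.get("phase"),
            "action": info.get("action"),
            "failed_step": info.get("failed_step"),
            "error_type": step_info.get("type"),
            "reason": step_info.get("reason"),
        })
    return failures


# A trimmed copy of the response above, used as sample input.
sample_response = {
    "indices": {
        "my-index-000001": {
            "index": "my-index-000001",
            "managed": True,
            "policy": "shrink-index",
            "phase": "warm",
            "action": "shrink",
            "step": "ERROR",
            "failed_step": "shrink",
            "step_info": {
                "type": "illegal_argument_exception",
                "reason": "the number of target shards [4] must be less "
                          "that the number of source shards [2]",
            },
        }
    }
}

for failure in summarize_ilm_errors(sample_response):
    print(failure["index"], failure["failed_step"], failure["reason"])
```

Indices whose step is anything other than ERROR are skipped, so the result is empty when nothing needs attention.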
To resolve this, you could update the policy to shrink the index to a single shard after 5 days:
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
Retrying failed lifecycle policy steps
Once you fix the problem that put an index in the ERROR step,
you might need to explicitly tell ILM to retry the step:
POST /my-index-000001/_ilm/retry
ILM subsequently attempts to re-run the step that failed. You can use the ILM Explain API to monitor the progress.
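The retry request and the follow-up explain requests can be combined into a small polling loop. This is a hedged Python sketch rather than a definitive implementation: `send` is a hypothetical callable that you supply, assumed to perform an HTTP request against your cluster and return the parsed JSON body, and `retry_and_wait`, `attempts`, and `interval_seconds` are illustrative names. Only the two request paths come from the examples above.

```python
import time


def retry_and_wait(send, index, attempts=10, interval_seconds=5.0):
    """Ask ILM to retry the failed step, then poll the explain output
    until the index leaves the ERROR step or the attempts run out.

    `send(method, path)` is a caller-supplied (hypothetical) function
    that issues the request and returns the parsed JSON response.
    Returns the last explain entry seen for the index, or None if
    `attempts` is zero.
    """
    send("POST", f"/{index}/_ilm/retry")
    info = None
    for _ in range(attempts):
        response = send("GET", f"/{index}/_ilm/explain")
        info = response["indices"][index]
        if info.get("step") != "ERROR":
            return info
        time.sleep(interval_seconds)
    return info
```

If the returned entry still shows the ERROR step, the underlying problem is probably not fixed yet, and its step_info is the place to look. An index that leaves the ERROR step can also fail again later, so a single successful poll is not proof that the step completed.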