Restart failed anomaly detection jobs

If an anomaly detection job fails, try to restart it by following the procedure described below. If the restarted job runs as expected, the problem that caused the failure was transient and no further investigation is needed. If the job fails again shortly after the restart, the problem is persistent and needs further investigation. In that case, find out which node the failed job was running on by checking the job stats on the Job management pane in Kibana. Then get the logs for that node and look for exceptions and errors whose messages contain the ID of the anomaly detection job to better understand the issue.
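
As an alternative to Kibana, a minimal sketch for checking the assignment from the APIs, assuming the failed job is named my_job, is the Get anomaly detection job statistics API; if the job is still assigned to a node, the node object in the response identifies where it was running.

    GET _ml/anomaly_detectors/my_job/_stats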

If an anomaly detection job has failed, do the following to recover from the failed state:

  1. Force stop the corresponding datafeed by using the Stop datafeed API with the force parameter set to true. For example, the following request force stops the my_datafeed datafeed:

    POST _ml/datafeeds/my_datafeed/_stop
    {
      "force": "true"
    }
  2. Force close the anomaly detection job by using the Close anomaly detection job API with the force parameter set to true. For example, the following request force closes the my_job anomaly detection job:

    POST _ml/anomaly_detectors/my_job/_close?force=true
  3. Restart the anomaly detection job on the Job management pane in Kibana, or reopen the job and restart its datafeed through the APIs, as sketched below.
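
If you prefer the APIs to Kibana, the following is a minimal sketch that assumes the same my_job and my_datafeed names as in the previous steps: the Open anomaly detection job API reopens the job and the Start datafeed API restarts its datafeed.

    POST _ml/anomaly_detectors/my_job/_open

    POST _ml/datafeeds/my_datafeed/_start

Run the requests in this order so that the job is open before its datafeed starts.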