Model snapshots

edit

As described in Analyzing the past and present, Elastic Stack machine learning features can calculate baselines of normal behavior then extrapolate anomalous events. These baselines are accomplished by generating models of your data.

To ensure resilience in the event of a system failure, snapshots of the machine learning model for each anomaly detection job are saved to an internal index within the Elasticsearch cluster. The amount of time necessary to save these snapshots is proportional to the size of the model in memory. By default, snapshots are captured approximately every 3 to 4 hours. You can change this interval (background_persist_interval) when you create or update a job.

To reduce the number of snapshots consuming space on your cluster, at the end of each day, old snapshots are automatically deleted. The age of each snapshot is calculated relative to the timestamp of the most recent snapshot. By default, if there are snapshots over one day older than the newest snapshot, they are deleted except for the first snapshot each day. As well, all snapshots over ten days older than the newest snapshot are deleted. You can change these retention settings (daily_model_snapshot_retention_after_days and model_snapshot_retention_days) when you create or update a job. If you want to exempt a specific snapshot from this clean up, use Kibana or the update model snapshots API to set retain to true.

You can see the list of model snapshots for each job with the get model snapshots API or in the Model snapshots tab on the Job Management page in Kibana:

Example screenshot with a list of model snapshots

There are situations other than system failures where you might want to revert to using a specific model snapshot. The machine learning features react quickly to anomalous input and new behaviors in data. Highly anomalous input increases the variance in the models and machine learning analytics must determine whether it is a new step-change in behavior or a one-off event. In the case where you know this anomalous input is a one-off, it might be appropriate to reset the model state to a time before this event. For example, after a Black Friday sales day you might consider reverting to a saved snapshot. If you know about such events in advance, however, you can use calendars and scheduled events to avoid impacting your model.