WARNING: Version 6.2 of the Elastic Stack has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Datafeeds

Machine learning jobs can analyze data that is stored in Elasticsearch or data that is sent from some other source via an API. Datafeeds retrieve data from Elasticsearch for analysis, which is the simpler and more common scenario.
If you create jobs in Kibana, you must use datafeeds. When you create a job, you select an index pattern and Kibana configures the datafeed for you under the covers. If you use machine learning APIs instead, you can create a datafeed by using the create datafeeds API after you create a job. You can associate only one datafeed with each job.
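For example, a datafeed created through the machine learning APIs might look like the following sketch. The datafeed ID `datafeed-total-requests`, the job ID `total-requests`, and the index name `server-metrics` are illustrative placeholders, not names that exist in your cluster:

```console
PUT _xpack/ml/datafeeds/datafeed-total-requests
{
  "job_id": "total-requests",
  "indices": ["server-metrics"]
}
```

The `job_id` must refer to a job that already exists; the datafeed then retrieves documents from the listed indices and forwards them to that job.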
For a description of all the datafeed properties, see Datafeed Resources.
To start retrieving data from Elasticsearch, you must start the datafeed. When you start it, you can optionally specify start and end times. If you do not specify an end time, the datafeed runs continuously. You can start and stop datafeeds in Kibana or use the start datafeeds and stop datafeeds APIs. A datafeed can be started and stopped multiple times throughout its lifecycle.
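As a sketch of the start and stop APIs (again using the hypothetical `datafeed-total-requests` ID), the optional `start` and `end` parameters bound the time range of data that is retrieved; omitting `end` makes the datafeed run continuously:

```console
POST _xpack/ml/datafeeds/datafeed-total-requests/_start
{
  "start": "2018-01-01T00:00:00Z"
}

POST _xpack/ml/datafeeds/datafeed-total-requests/_stop
```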
When X-Pack security is enabled, a datafeed stores the roles of the user who most recently created or updated it. If the definitions of those roles are later changed, the datafeed subsequently runs with the new permissions that are associated with them. However, if the set of roles assigned to that user changes after the datafeed was created or updated, the datafeed continues to run with the permissions of the roles that were stored at that time.
One way to update the roles that are stored within the datafeed without changing any other settings is to submit an empty JSON document ({}) to the update datafeed API.
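That is, an update request with an empty body refreshes the stored roles without altering the datafeed configuration (the datafeed ID here is a placeholder):

```console
POST _xpack/ml/datafeeds/datafeed-total-requests/_update
{}
```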
If the data that you want to analyze is not stored in Elasticsearch, you cannot use datafeeds. You can, however, send batches of data directly to the job by using the post data to jobs API.
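As a sketch of that alternative, data is posted to the job's `_data` endpoint as one JSON document per line; the job ID `total-requests` and the field names are illustrative:

```console
POST _xpack/ml/anomaly_detectors/total-requests/_data
{"timestamp": 1514764800000, "total": 217}
{"timestamp": 1514764860000, "total": 195}
```

With this approach your client is responsible for batching, ordering, and retrying data delivery, which the datafeed would otherwise handle for you.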