Create data frame analytics jobs API
Instantiates a data frame analytics job.
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
Request
PUT _ml/data_frame/analytics/<data_frame_analytics_id>
Prerequisites
- You must have the machine_learning_admin built-in role to use this API. You must also have read and view_index_metadata privileges on the source index and read, create_index, and index privileges on the destination index. For more information, see Security privileges and Built-in roles.
Description
This API creates a data frame analytics job that performs an analysis on the source index and stores the outcome in a destination index.
The destination index will be automatically created if it does not exist. The index.number_of_shards and index.number_of_replicas settings of the source index will be copied to the destination index. When the source index matches multiple indices, these settings will be set to the maximum values found in the source indices.
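The "maximum values" rule can be illustrated with a short Python sketch. This is not Elasticsearch code; the function name and the shape of the input dictionary are assumptions made for the example.

```python
def destination_settings(source_settings):
    """Pick shard and replica counts for an auto-created destination index.

    source_settings is a hypothetical mapping of source index name to a
    settings dict. Each setting takes the maximum value found across all
    source indices, as described above.
    """
    keys = ("index.number_of_shards", "index.number_of_replicas")
    return {
        key: max(int(settings[key]) for settings in source_settings.values())
        for key in keys
    }


settings = destination_settings({
    "logs-2019-01": {"index.number_of_shards": 1, "index.number_of_replicas": 2},
    "logs-2019-02": {"index.number_of_shards": 3, "index.number_of_replicas": 1},
})
print(settings)
```

Here the destination index would get three shards (from the second index) and two replicas (from the first).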
An attempt is also made to copy the mappings of the source indices to the destination index. However, if the mappings of any field don’t match among the source indices, the attempt fails with an error message.
If the destination index already exists, then it will be used as is. This makes it possible to set up the destination index in advance with custom settings and mappings.
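The mapping-copy behavior for an auto-created destination index can be pictured as a merge that refuses conflicting definitions. The following Python sketch only illustrates that idea; the function name, the input shape, and the error text are hypothetical and do not reflect how Elasticsearch implements the check.

```python
def merge_source_mappings(mappings_by_index):
    """Merge per-index field mappings, failing on the first mismatch.

    mappings_by_index is a hypothetical dict of
    index name -> {field name -> mapping dict}.
    """
    merged = {}
    for index_name, fields in mappings_by_index.items():
        for field, mapping in fields.items():
            if field in merged and merged[field] != mapping:
                # Mirrors the rule above: mismatched mappings abort the copy.
                raise ValueError(
                    f"conflicting mappings for field [{field}] in index [{index_name}]"
                )
            merged[field] = mapping
    return merged


compatible = merge_source_mappings({
    "houses_2018": {"price": {"type": "double"}},
    "houses_2019": {"price": {"type": "double"}, "rooms": {"type": "integer"}},
})
```

When the destination index already exists, none of this applies, because the index is used as is.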
Supported fields
Outlier detection
Outlier detection requires numeric or boolean data to analyze. The algorithms don’t support missing values, so fields that have data types other than numeric or boolean are ignored. Documents where included fields contain missing values, null values, or an array are also ignored. Therefore, the dest index may contain documents that don’t have an outlier score.
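The outlier detection rules above can be expressed as two small filters, one for fields and one for documents. This Python sketch is an illustration only; the helper names are hypothetical, and the set of numeric type names is an assumption about which mapping types count as "numeric".

```python
# Assumption: these mapping types are treated as numeric.
NUMERIC_TYPES = {
    "byte", "short", "integer", "long",
    "float", "double", "half_float", "scaled_float",
}
OUTLIER_TYPES = NUMERIC_TYPES | {"boolean"}


def outlier_fields(field_types):
    """Keep only fields whose type is numeric or boolean."""
    return sorted(name for name, kind in field_types.items() if kind in OUTLIER_TYPES)


def gets_outlier_score(doc, fields):
    """A document is analyzed only if every included field holds a
    single, non-null value (no missing values, nulls, or arrays)."""
    for field in fields:
        value = doc.get(field)
        if value is None or isinstance(value, list):
            return False
    return True


fields = outlier_fields({"bytes": "long", "is_error": "boolean", "message": "text"})
```

With these inputs, `fields` contains `bytes` and `is_error`, and a document that lacks `bytes` would be copied to the destination without an outlier score.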
Regression
Regression supports fields that are numeric, boolean, text, keyword, and ip. It is also tolerant of missing values. Supported fields are included in the analysis; other fields are ignored. Documents where included fields contain an array with two or more values are also ignored. Documents in the dest index that don’t contain a results field are not included in the regression analysis.
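Regression applies looser rules than outlier detection: more field types qualify, missing values are tolerated, and only multi-valued arrays disqualify a document. A minimal Python sketch of those rules follows; the helper names are hypothetical and the numeric type list is an assumption.

```python
# Assumption: these mapping types are treated as numeric.
NUMERIC_TYPES = {
    "byte", "short", "integer", "long",
    "float", "double", "half_float", "scaled_float",
}
REGRESSION_TYPES = NUMERIC_TYPES | {"boolean", "text", "keyword", "ip"}


def regression_fields(field_types):
    """Keep only fields whose type regression supports."""
    return sorted(name for name, kind in field_types.items() if kind in REGRESSION_TYPES)


def usable_for_regression(doc, fields):
    """Missing values are fine; an array with two or more values is not."""
    for field in fields:
        value = doc.get(field)
        if isinstance(value, list) and len(value) >= 2:
            return False
    return True


reg_fields = regression_fields({"price": "double", "city": "keyword", "location": "geo_point"})
```

In this example `location` is dropped because `geo_point` is not among the supported types.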
Path parameters
<data_frame_analytics_id>
- (Required, string) A string that uniquely identifies the data frame analytics job. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.
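A client could check an identifier against these rules before sending the request. The sketch below encodes only the rules stated above in a regular expression; the function name is hypothetical, and any additional server-side constraints are not covered.

```python
import re

# First and last characters must be lowercase alphanumeric; hyphens and
# underscores are allowed only in between.
_JOB_ID = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id):
    """Return True if job_id satisfies the documented character rules."""
    return _JOB_ID.fullmatch(job_id) is not None


print(is_valid_job_id("house_price_regression_analysis"))  # True
print(is_valid_job_id("_loganalytics"))                    # False
```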
Request body
analysis
- (Required, object) Defines the type of data frame analytics you want to perform on your source index. For example: outlier_detection. See Analysis objects.
analyzed_fields
- (Optional, object) You can specify includes patterns, excludes patterns, or both. If analyzed_fields is not set, only the relevant fields will be included. For example, all the numeric fields for outlier detection. For the supported field types, see Supported fields. If you specify fields, either in includes or in excludes, that have a data type that is not supported, an error occurs.
  includes
  - (Optional, array) An array of strings that defines the fields that will be included in the analysis.
  excludes
  - (Optional, array) An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.
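To make the interaction of includes and excludes concrete, here is a rough Python sketch of pattern-based field selection. It assumes shell-style wildcards and that excludes win over includes; both are assumptions for illustration, and the exact matching semantics of the API are not specified in this section. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def select_fields(candidates, includes=None, excludes=None):
    """Filter candidate field names by include and exclude patterns.

    Assumptions: patterns are shell-style wildcards, an absent includes
    list means "everything", and excludes take precedence.
    """
    includes = includes or ["*"]
    excludes = excludes or []
    return [
        name for name in candidates
        if any(fnmatchcase(name, pattern) for pattern in includes)
        and not any(fnmatchcase(name, pattern) for pattern in excludes)
    ]


selected = select_fields(
    ["price", "rooms", "geo.lat", "geo.lon"],
    includes=["price", "geo.*"],
    excludes=["geo.lon"],
)
print(selected)  # ['price', 'geo.lat']
```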
description
- (Optional, string) A description of the job.
dest
- (Required, object) The destination configuration, consisting of index and optionally results_field (ml by default).
  index
  - (Required, string) Defines the destination index to store the results of the data frame analytics job.
  results_field
  - (Optional, string) Defines the name of the field in which to store the results of the analysis. Defaults to ml.
model_memory_limit
- (Optional, string) The approximate maximum amount of memory resources that are permitted for analytical processing. The default value for data frame analytics jobs is 1gb. If your elasticsearch.yml file contains an xpack.ml.max_model_memory_limit setting, an error occurs when you try to create data frame analytics jobs that have model_memory_limit values greater than that setting. For more information, see Machine learning settings.
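The comparison between model_memory_limit and xpack.ml.max_model_memory_limit can be sketched in Python as follows. The helper names are hypothetical, the parser handles only whole-number values with the common byte-size suffixes, and the error message is invented for the example.

```python
import re

# Byte-size suffixes, using powers of 1024.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def to_bytes(size):
    """Convert a value such as '1gb' or '512mb' to a number of bytes."""
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb|tb|pb)", size.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def check_model_memory_limit(requested="1gb", cluster_max=None):
    """Reject a job whose limit exceeds the cluster-wide maximum, if one is set."""
    if cluster_max is not None and to_bytes(requested) > to_bytes(cluster_max):
        raise ValueError(
            f"model_memory_limit [{requested}] exceeds the configured maximum [{cluster_max}]"
        )
    return requested
```

With no cluster-wide maximum configured, any value passes this check, and the default of 1gb applies when the job does not specify a limit.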
source
- (Required, object) The source configuration, consisting of index and optionally a query.
  index
  - (Required, string or array) Index or indices on which to perform the analysis. It can be a single index or index pattern as well as an array of indices or patterns.
  query
  - (Optional, object) The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {}}.
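Putting the request body properties together, a client-side helper might assemble the JSON document with the documented defaults made explicit. This Python sketch is one possible shape for such a helper; the function name and its signature are hypothetical, and the server fills in the same defaults when the fields are omitted.

```python
import json


def build_job_config(source_index, dest_index, analysis, *, query=None,
                     results_field="ml", model_memory_limit="1gb",
                     description=None, analyzed_fields=None):
    """Assemble a request body, spelling out the documented defaults."""
    body = {
        "source": {
            "index": source_index,
            "query": query if query is not None else {"match_all": {}},
        },
        "dest": {"index": dest_index, "results_field": results_field},
        "analysis": analysis,
        "model_memory_limit": model_memory_limit,
    }
    # Optional properties are added only when the caller supplies them.
    if description is not None:
        body["description"] = description
    if analyzed_fields is not None:
        body["analyzed_fields"] = analyzed_fields
    return body


config = build_job_config(
    "logdata", "logdata_out", {"outlier_detection": {}},
    description="Outlier detection on log data",
)
print(json.dumps(config, indent=2))
```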
Examples
Outlier detection example
The following example creates the loganalytics data frame analytics job; the analysis type is outlier_detection:
PUT _ml/data_frame/analytics/loganalytics
{
  "description": "Outlier detection on log data",
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out"
  },
  "analysis": {
    "outlier_detection": {
    }
  }
}
The API returns the following result:
{
  "id" : "loganalytics",
  "description": "Outlier detection on log data",
  "source" : {
    "index" : [
      "logdata"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : {
    "index" : "logdata_out",
    "results_field" : "ml"
  },
  "analysis" : {
    "outlier_detection" : { }
  },
  "model_memory_limit" : "1gb",
  "create_time" : 1562351429434,
  "version" : "7.3.0"
}
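Outside the console, the same request could be prepared from Python with the standard library. The sketch below only builds the request object and does not send it; the base URL http://localhost:9200 and the function name are assumptions, and authentication is omitted.

```python
import json
import urllib.request


def build_put_request(job_id, body, base_url="http://localhost:9200"):
    """Prepare (but do not send) the PUT request for a new job.

    base_url is a placeholder for wherever the cluster is reachable.
    """
    url = f"{base_url}/_ml/data_frame/analytics/{job_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_put_request("loganalytics", {
    "description": "Outlier detection on log data",
    "source": {"index": "logdata"},
    "dest": {"index": "logdata_out"},
    "analysis": {"outlier_detection": {}},
})
```

Passing `request` to `urllib.request.urlopen` would submit it to a running cluster; a secured cluster would also need credentials added to the headers.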
Regression examples
The following example creates the house_price_regression_analysis data frame analytics job; the analysis type is regression:
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price"
    }
  }
}
The API returns the following result:
{
  "id" : "house_price_regression_analysis",
  "source" : {
    "index" : [
      "houses_sold_last_10_yrs"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : {
    "index" : "house_price_predictions",
    "results_field" : "ml"
  },
  "analysis" : {
    "regression" : {
      "dependent_variable" : "price",
      "training_percent" : 100
    }
  },
  "model_memory_limit" : "1gb",
  "create_time" : 1567168659127,
  "version" : "8.0.0"
}
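The create_time value in the responses appears to be a Unix timestamp in milliseconds. Under that assumption, a client could convert it to a readable UTC time as in this sketch; the function name is hypothetical.

```python
from datetime import datetime, timezone


def created_at(response):
    """Interpret create_time as milliseconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(response["create_time"] / 1000, tz=timezone.utc)


when = created_at({"id": "house_price_regression_analysis", "create_time": 1567168659127})
print(when.isoformat())
```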
The following example creates a job and specifies a training percent: