Create inference trained model API
Creates an inference trained model.
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
Request
PUT _ml/inference/<model_id>
Prerequisites
If the Elasticsearch security features are enabled, you must have the following built-in roles and privileges:
- machine_learning_admin
For more information, see Security privileges and Built-in roles.
Description
The create inference trained model API enables you to supply a trained model that is not created by data frame analytics.
Path parameters
<model_id>
- (Required, string) The unique identifier of the trained inference model.
Request body
compressed_definition
- (Required, string) The compressed (GZipped and Base64 encoded) inference definition of the model. If compressed_definition is specified, then definition cannot be specified.
definition
- (Required, object) The inference definition for the model. If definition is specified, then compressed_definition cannot be specified.
definition.preprocessors
- (Optional, object) Collection of preprocessors. See Inference preprocessor definitions for the full list of available preprocessors.
definition.trained_model
- (Required, object) The definition of the trained model. See Inference trained model definitions for details.
description
- (Optional, string) A human-readable description of the inference trained model.
input
- (Required, object) The input field names for the model definition.
input.field_names
- (Required, string) An array of input field names for the model.
metadata
- (Optional, object) An object map that contains metadata about the model.
tags
- (Optional, string) An array of tags to organize the model.
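The following Python sketch illustrates how a compressed_definition value could be produced and how a request body holds only one of the two definition fields. It is an illustration under stated assumptions, not the official client: the helper names compress_definition and build_request_body are hypothetical, and serializing the definition as UTF-8 JSON before GZipping is an assumption (the text above only states that the value is GZipped and Base64 encoded).

```python
import base64
import gzip
import json


def compress_definition(definition: dict) -> str:
    """Serialize (assumed: UTF-8 JSON), GZip, then Base64 encode a definition."""
    raw = json.dumps(definition).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def build_request_body(definition: dict, field_names: list, compressed: bool = True) -> dict:
    """Build a request body with exactly one of compressed_definition or definition."""
    body = {"input": {"field_names": field_names}}
    if compressed:
        body["compressed_definition"] = compress_definition(definition)
    else:
        body["definition"] = definition
    return body


# Hypothetical single-leaf regression tree, used only for illustration.
definition = {
    "trained_model": {
        "tree": {
            "feature_names": ["DistanceKilometers"],
            "tree_structure": [{"node_index": 0, "leaf_value": 1.0}],
            "target_type": "regression",
        }
    }
}

body = build_request_body(definition, ["DistanceKilometers"])
```

The resulting body would be sent with PUT _ml/inference/<model_id>. Decoding the string in reverse order (Base64 decode, then GZip decompress) should recover the original JSON.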
Inference preprocessor definitions
frequency_encoding
- (Required, object) Defines a frequency encoding for a field.
frequency_encoding.field
- (Required, string) The field name to encode.
frequency_encoding.feature_name
- (Required, string) The name of the resulting feature.
frequency_encoding.frequency_map
- (Required, object map of string:double) Object that maps the field value to the frequency encoded value.
one_hot_encoding
- (Required, object) Defines a one hot encoding map for a field.
one_hot_encoding.field
- (Required, string) The field name to encode.
one_hot_encoding.hot_map
- (Required, object map of strings) String map of "field_value: one_hot_column_name".
target_mean_encoding
- (Required, object) Defines a target mean encoding for a field.
target_mean_encoding.field
- (Required, string) The field name to encode.
target_mean_encoding.feature_name
- (Required, string) The name of the resulting feature.
target_mean_encoding.target_map
- (Required, object map of string:double) Object that maps the field value to the target mean value.
target_mean_encoding.default_value
- (Required, double) The feature value if the field value is not in the target_map.
See Preprocessor examples for more details.
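As a rough illustration of what the three preprocessors compute, the Python sketch below applies each one to a single document. The function names are hypothetical, and two behaviors are assumptions that the definitions above do not state: a value missing from frequency_map encodes to 0.0, and a one hot column is 1 for the matching value and 0 otherwise. Only the default_value fallback for target mean encoding comes from the parameter descriptions.

```python
def frequency_encode(doc: dict, cfg: dict) -> dict:
    """Map the field value to its frequency from frequency_map."""
    value = doc.get(cfg["field"])
    # Assumption: values missing from frequency_map encode to 0.0.
    return {cfg["feature_name"]: cfg["frequency_map"].get(value, 0.0)}


def one_hot_encode(doc: dict, cfg: dict) -> dict:
    """Emit one column per hot_map entry."""
    value = doc.get(cfg["field"])
    # Assumption: the matching column is 1 and every other column is 0.
    return {column: (1 if value == key else 0) for key, column in cfg["hot_map"].items()}


def target_mean_encode(doc: dict, cfg: dict) -> dict:
    """Map the field value to its target mean, or to default_value if unmapped."""
    value = doc.get(cfg["field"])
    return {cfg["feature_name"]: cfg["target_map"].get(value, cfg["default_value"])}


# Trimmed configurations; the values are copied from the preprocessor examples.
freq_cfg = {
    "field": "FlightDelayType",
    "feature_name": "FlightDelayType_frequency",
    "frequency_map": {"NAS Delay": 0.6007414737092798, "Weather Delay": 0.024573576178086153},
}
hot_cfg = {
    "field": "FlightDelayType",
    "hot_map": {
        "Carrier Delay": "FlightDelayType_Carrier Delay",
        "NAS Delay": "FlightDelayType_NAS Delay",
    },
}
mean_cfg = {
    "field": "FlightDelayType",
    "feature_name": "FlightDelayType_targetmean",
    "target_map": {"NAS Delay": 39.97465788139886},
    "default_value": 158.17995752420433,
}

doc = {"FlightDelayType": "NAS Delay"}
features = {}
for encode, cfg in ((frequency_encode, freq_cfg), (one_hot_encode, hot_cfg), (target_mean_encode, mean_cfg)):
    features.update(encode(doc, cfg))
```

Each function returns only the new feature columns, so the caller decides how to merge them with the original document fields.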
Inference trained model definitions
tree
- (Required, object) The definition for a binary decision tree.
tree.feature_names
- (Required, string) Features expected by the tree, in their expected order.
tree.tree_structure
- (Required, object) An array of tree_node objects. The nodes must be in ordinal order by their tree_node.node_index value.
tree.classification_labels
- (Optional, string) An array of classification labels (used for classification).
tree.target_type
- (Required, string) String indicating the model target type; regression or classification.
There are two major types of nodes: leaf nodes and non-leaf nodes.
- Leaf nodes only need node_index and leaf_value defined.
- All other nodes need split_feature, left_child, right_child, threshold, decision_type, and default_left defined.
tree_node
- (Required, object) The definition of a node in a tree.
tree_node.decision_type
- (Optional, string) The decision type, which indicates when to choose the left node (in other words, the positive outcome of the comparison). Supported values are lt, lte, gt, and gte. Defaults to lte.
tree_node.threshold
- (Optional, double) The decision threshold with which to compare the feature value.
tree_node.left_child
- (Optional, integer) The index of the left child.
tree_node.right_child
- (Optional, integer) The index of the right child.
tree_node.default_left
- (Optional, boolean) Indicates whether to default to the left when the feature is missing. Defaults to true.
tree_node.split_feature
- (Optional, integer) The index of the feature value in the feature array.
tree_node.node_index
- (Integer) The index of the current node.
tree_node.split_gain
- (Optional, double) The information gain from the split.
tree_node.leaf_value
- (Optional, double) The leaf value of the node, if the node is a leaf (in other words, it has no children).
ensemble
- (Optional, object) The definition for an ensemble model.
ensemble.feature_names
- (Optional, string) Features expected by the ensemble, in their expected order.
ensemble.trained_models
- (Required, object) An array of trained_model objects. Supported trained models are tree and ensemble.
ensemble.classification_labels
- (Optional, string) An array of classification labels.
ensemble.target_type
- (Required, string) String indicating the model target type; regression or classification.
ensemble.aggregate_output
- (Required, object) An aggregated output object that defines how to aggregate the outputs of the trained_models. Supported objects are weighted_mode, weighted_sum, and logistic_regression.
See Model examples for more details.
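To make the roles of the tree_node fields concrete, here is a minimal Python sketch of how a tree definition could be evaluated for one feature vector. It is not the Elasticsearch implementation. The function name infer_tree is hypothetical, and the sketch assumes that node_index values are contiguous and start at 0 (so a node's position in tree_structure equals its index), that a node without left_child is a leaf, and that a missing feature is represented as None.

```python
import operator

# Comparison applied for each supported decision_type; a true result selects the left child.
DECISIONS = {
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def infer_tree(tree: dict, features: list):
    """Walk tree_structure from node 0 and return the leaf_value that is reached.

    `features` is ordered like tree["feature_names"]; None marks a missing value.
    """
    nodes = tree["tree_structure"]  # assumed: nodes[i]["node_index"] == i
    node = nodes[0]
    while "left_child" in node:  # assumed: nodes without children are leaves
        value = features[node["split_feature"]]
        if value is None:
            # Missing feature: follow default_left, which defaults to true.
            go_left = node.get("default_left", True)
        else:
            compare = DECISIONS[node.get("decision_type", "lte")]
            go_left = compare(value, node["threshold"])
        node = nodes[node["left_child"] if go_left else node["right_child"]]
    return node["leaf_value"]


# Hypothetical three-node regression tree, used only for illustration.
tree = {
    "feature_names": ["DistanceKilometers"],
    "tree_structure": [
        {
            "node_index": 0,
            "split_feature": 0,
            "decision_type": "lt",
            "threshold": 10.0,
            "default_left": True,
            "left_child": 1,
            "right_child": 2,
        },
        {"node_index": 1, "leaf_value": 1.0},
        {"node_index": 2, "leaf_value": 2.0},
    ],
    "target_type": "regression",
}

below = infer_tree(tree, [5.0])     # 5.0 < 10.0, so the left leaf is chosen
above = infer_tree(tree, [15.0])    # comparison fails, so the right leaf is chosen
missing = infer_tree(tree, [None])  # missing value follows default_left
```

An ensemble would run a traversal like this for each entry in trained_models and then combine the results with its aggregate_output.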
Aggregated output types
logistic_regression
- (Optional, object) This aggregate_output type works with binary classification (classification for values [0, 1]). It multiplies the outputs (in the case of the ensemble model, the inference model values) by the supplied weights. The resulting vector is summed and passed to a sigmoid function. The result of the sigmoid function is considered the probability of class 1 (P_1); consequently, the probability of class 0 is 1 - P_1. The class with the highest probability (either 0 or 1) is then returned. For more information about logistic regression, see this wiki article.
logistic_regression.weights
- (Required, double) The weights to multiply by the input values (the inference values of the trained models).
weighted_sum
- (Optional, object) This aggregate_output type works with regression. It returns the weighted sum of the input values.
weighted_sum.weights
- (Required, double) The weights to multiply by the input values (the inference values of the trained models).
weighted_mode
- (Optional, object) This aggregate_output type works with regression or classification. It takes a weighted vote of the input values. The most common input value (taking the weights into account) is returned.
weighted_mode.weights
- (Required, double) The weights to multiply by the input values (the inference values of the trained models).
See Aggregated output example for more details.
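The three aggregation types described above can be sketched in Python as follows. This is an illustration of the arithmetic only, with hypothetical function names. Tie handling is not specified in the descriptions, so the sketch assumes that a probability of exactly 0.5 resolves to class 1 and that a tied weighted vote goes to the value seen first.

```python
import math
from collections import defaultdict


def weighted_sum(values: list, weights: list) -> float:
    """Multiply each model output by its weight and sum the products."""
    return sum(v * w for v, w in zip(values, weights))


def logistic_regression(values: list, weights: list) -> tuple:
    """Return (predicted class, P_1) for binary classification."""
    p1 = 1.0 / (1.0 + math.exp(-weighted_sum(values, weights)))
    # Assumption: a tie (P_1 == 0.5) resolves to class 1.
    return (1 if p1 >= 0.5 else 0), p1


def weighted_mode(values: list, weights: list):
    """Return the value with the largest total weight."""
    votes = defaultdict(float)
    for v, w in zip(values, weights):
        votes[v] += w
    # Assumption: on a tie, the value that was seen first wins.
    return max(votes, key=votes.get)


# Hypothetical model outputs, used only for illustration.
regression_result = weighted_sum([1.0, 2.0, 3.0], [1.0, -1.0, 0.5])
predicted_class, p1 = logistic_regression([0.0, 0.0], [1.0, 1.0])
vote_result = weighted_mode([1, 0, 0], [3.0, 1.0, 1.0])
```

In the last call, class 1 wins with a total weight of 3.0 against 2.0 for class 0, even though class 0 appears more often.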
Examples
Preprocessor examples
The example below shows a frequency_encoding preprocessor object:
{
  "frequency_encoding":{
    "field":"FlightDelayType",
    "feature_name":"FlightDelayType_frequency",
    "frequency_map":{
      "Carrier Delay":0.6007414737092798,
      "NAS Delay":0.6007414737092798,
      "Weather Delay":0.024573576178086153,
      "Security Delay":0.02476631010889467,
      "No Delay":0.6007414737092798,
      "Late Aircraft Delay":0.6007414737092798
    }
  }
}
The next example shows a one_hot_encoding preprocessor object:
{
  "one_hot_encoding":{
    "field":"FlightDelayType",
    "hot_map":{
      "Carrier Delay":"FlightDelayType_Carrier Delay",
      "NAS Delay":"FlightDelayType_NAS Delay",
      "No Delay":"FlightDelayType_No Delay",
      "Late Aircraft Delay":"FlightDelayType_Late Aircraft Delay"
    }
  }
}
This example shows a target_mean_encoding preprocessor object:
{
  "target_mean_encoding":{
    "field":"FlightDelayType",
    "feature_name":"FlightDelayType_targetmean",
    "target_map":{
      "Carrier Delay":39.97465788139886,
      "NAS Delay":39.97465788139886,
      "Security Delay":203.171206225681,
      "Weather Delay":187.64705882352948,
      "No Delay":39.97465788139886,
      "Late Aircraft Delay":39.97465788139886
    },
    "default_value":158.17995752420433
  }
}
Model examples
The first example shows a trained_model object:
{
  "tree":{
    "feature_names":[
      "DistanceKilometers",
      "FlightTimeMin",
      "FlightDelayType_NAS Delay",
      "Origin_targetmean",
      "DestRegion_targetmean",
      "DestCityName_targetmean",
      "OriginAirportID_targetmean",
      "OriginCityName_frequency",
      "DistanceMiles",
      "FlightDelayType_Late Aircraft Delay"
    ],
    "tree_structure":[
      {
        "decision_type":"lt",
        "threshold":9069.33437193022,
        "split_feature":0,
        "split_gain":4112.094574306927,
        "node_index":0,
        "default_left":true,
        "left_child":1,
        "right_child":2
      },
      ...
      {
        "node_index":9,
        "leaf_value":-27.68987349695448
      },
      ...
    ],
    "target_type":"regression"
  }
}
The following example shows an ensemble model object:
"ensemble":{
  "feature_names":[
    ...
  ],
  "trained_models":[
    {
      "tree":{
        "feature_names":[],
        "tree_structure":[
          {
            "decision_type":"lte",
            "node_index":0,
            "leaf_value":47.64069875778043,
            "default_left":false
          }
        ],
        "target_type":"regression"
      }
    },
    ...
  ],
  "aggregate_output":{
    "weighted_sum":{
      "weights":[
        ...
      ]
    }
  },
  "target_type":"regression"
}
Aggregated output example
Example of a logistic_regression object:
"aggregate_output" : {
  "logistic_regression" : {
    "weights" : [2.0, 1.0, 0.5, -1.0, 5.0, 1.0, 1.0]
  }
}
Example of a weighted_sum object:
"aggregate_output" : {
  "weighted_sum" : {
    "weights" : [1.0, -1.0, 0.5, 1.0, 5.0]
  }
}
Example of a weighted_mode object:
"aggregate_output" : {
  "weighted_mode" : {
    "weights" : [1.0, 1.0, 1.0, 1.0, 1.0]
  }
}
Inference JSON schema
For the full JSON schema of model inference, click here.