Get data frame analytics job configuration info
You can get information for multiple data frame analytics jobs in a single API request by using a comma-separated list of data frame analytics jobs or a wildcard expression.
Path parameters
-
Identifier for the data frame analytics job. If you do not specify this option, the API returns information for the first hundred data frame analytics jobs.
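As a rough illustration of the comma-separated and wildcard forms described above, the following Python sketch builds the request path. The helper name `analytics_path` is hypothetical, and the base path is taken from the curl example for this endpoint. Nothing is sent over the network.

```python
from urllib.parse import quote

BASE_PATH = "/_ml/data_frame/analytics"  # path from the curl example on this page


def analytics_path(job_ids=None):
    """Build the GET path for zero, one, or several job identifiers.

    Hypothetical helper for illustration only. With no identifiers the
    bare collection path is returned.
    """
    if not job_ids:
        return BASE_PATH
    # Percent-encode each identifier but keep "*" so wildcards survive,
    # then join the identifiers with commas.
    encoded = [quote(job_id, safe="*") for job_id in job_ids]
    return BASE_PATH + "/" + ",".join(encoded)


print(analytics_path())
print(analytics_path(["loganalytics"]))
print(analytics_path(["job-a", "job-b", "weblog-*"]))
```

The job names used here are made up. Any real identifiers would come from your own cluster.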
Query parameters
-
allow_no_match boolean
Specifies what to do when the request:
- Contains wildcard expressions and there are no data frame analytics jobs that match.
- Contains the _all string or no identifiers and there are no matches.
- Contains wildcard expressions and there are only partial matches.
The default value returns an empty data_frame_analytics array when there are no matches and the subset of results when there are partial matches. If this parameter is false, the request returns a 404 status code when there are no matches or only partial matches.
-
from number
Skips the specified number of data frame analytics jobs.
-
size number
Specifies the maximum number of data frame analytics jobs to obtain.
-
exclude_generated boolean
Indicates if certain fields should be removed from the configuration on retrieval. This puts the configuration in a format that can be retrieved and then added to another cluster.
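The query parameters above can be combined into a query string, and `from` and `size` together give simple paging. The sketch below is one possible way to encode them in Python; the function names `analytics_query` and `page_queries` are hypothetical, and the total of 250 jobs is an invented figure for the example.

```python
from urllib.parse import urlencode


def analytics_query(allow_no_match=None, from_=None, size=None,
                    exclude_generated=None):
    """Encode the query parameters listed above, skipping unset ones."""
    params = {
        "allow_no_match": allow_no_match,
        "from": from_,
        "size": size,
        "exclude_generated": exclude_generated,
    }
    pairs = []
    for name, value in params.items():
        if value is None:
            continue  # leave the parameter out so the server default applies
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase booleans in the URL
        pairs.append((name, value))
    return urlencode(pairs)


def page_queries(total_jobs, page_size):
    """Yield one query string per page, stepping `from` by `size`."""
    for offset in range(0, total_jobs, page_size):
        yield analytics_query(from_=offset, size=page_size)


print(analytics_query(allow_no_match=False, exclude_generated=True))
print(list(page_queries(total_jobs=250, page_size=100)))
```

Leaving a parameter as `None` omits it from the query string, so the server-side default described above is used.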
Responses
-
200 application/json
-
An array of data frame analytics job resources, which are sorted by the id value in ascending order.
-
allow_lazy_start boolean
-
analysis object
-
classification object
-
alpha number
Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.
-
Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as the continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.
-
downsample_factor number
Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.
-
early_stopping_enabled boolean
Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.
-
eta number
Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.
-
eta_growth_rate_per_tree number
Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.
-
feature_bag_fraction number
Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.
-
feature_processors array[object]
Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.
-
gamma number
Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.
-
lambda number
Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.
-
Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.
-
max_trees number
Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.
-
Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.
-
prediction_field_name string
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
randomize_seed number
Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).
-
soft_tree_depth_limit number
Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.
-
soft_tree_depth_tolerance number
Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.
-
class_assignment_objective string
-
num_top_classes number
Defines the number of categories for which the predicted probabilities are reported. It must be non-negative or -1. If it is -1 or greater than the total number of categories, probabilities are reported for all categories; if you have a large number of categories, there could be a significant effect on the size of your destination index. NOTE: To use the AUC ROC evaluation method, num_top_classes must be set to -1 or a value greater than or equal to the total number of categories.
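Two of the classification settings above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical. The eta schedule assumes plain compounding from the "1.05 increases eta by 5% for each extra tree" example and does not model any bound that training itself may apply. The class-count rule follows the num_top_classes description directly.

```python
def eta_schedule(eta, growth_rate, n_trees):
    """Return the eta applied to each of n_trees trees.

    Assumes plain compounding: tree k uses eta * growth_rate ** k.
    Any upper bound applied during real training is not modelled here.
    """
    return [eta * growth_rate ** k for k in range(n_trees)]


def reported_class_count(num_top_classes, total_classes):
    """Number of classes whose predicted probabilities are reported."""
    if num_top_classes < -1:
        raise ValueError("num_top_classes must be non-negative or -1")
    if num_top_classes == -1 or num_top_classes > total_classes:
        return total_classes  # probabilities are reported for all classes
    return num_top_classes


print(eta_schedule(0.1, 1.05, 4))
print(reported_class_count(2, 5))
print(reported_class_count(-1, 5))
print(reported_class_count(9, 5))
```

With five classes in total, asking for 2 reports two classes, while -1 or any value above 5 reports all five.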
-
-
outlier_detection object
-
compute_feature_influence boolean
Specifies whether the feature influence calculation is enabled.
-
feature_influence_threshold number
The minimum outlier score that a document needs to have in order to calculate its feature influence score. Value range: 0-1.
-
method string
The method that outlier detection uses. Available methods are lof, ldof, distance_kth_nn, distance_knn, and ensemble. The default value is ensemble, which means that outlier detection uses an ensemble of different methods and normalises and combines their individual outlier scores to obtain the overall outlier score.
-
n_neighbors number
Defines the value for how many nearest neighbors each method of outlier detection uses to calculate its outlier score. When the value is not set, different values are used for different ensemble members. This default behavior helps improve the diversity in the ensemble; only override it if you are confident that the value you choose is appropriate for the data set.
-
outlier_fraction number
The proportion of the data set that is assumed to be outlying prior to outlier detection. For example, 0.05 means it is assumed that 5% of values are real outliers and 95% are inliers.
-
standardization_enabled boolean
If true, the following operation is performed on the columns before computing outlier scores: (x_i - mean(x_i)) / sd(x_i).
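The standardization formula above can be tried on a single column with a short Python sketch. It assumes that sd means the population standard deviation, which the description does not specify, and the function name `standardize_column` and the sample values are invented for illustration.

```python
from statistics import mean, pstdev


def standardize_column(values):
    """Apply (x_i - mean(x_i)) / sd(x_i) to one column of numbers.

    Assumption: sd is the population standard deviation; the text above
    does not say which estimator is used.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("constant column: standard deviation is zero")
    return [(x - mu) / sd for x in values]


column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population sd 2
print(standardize_column(column))
```

After this step every column has mean 0 and unit spread, so no single feature dominates distance-based outlier scores purely because of its scale.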
-
-
regression object
-
alpha number
Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.
-
Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as the continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.
-
downsample_factor number
Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.
-
early_stopping_enabled boolean
Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.
-
eta number
Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.
-
eta_growth_rate_per_tree number
Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.
-
feature_bag_fraction number
Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.
-
feature_processors array[object]
Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.
-
gamma number
Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.
-
lambda number
Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.
-
Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.
-
max_trees number
Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.
-
Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.
-
prediction_field_name string
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
randomize_seed number
Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).
-
soft_tree_depth_limit number
Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.
-
soft_tree_depth_tolerance number
Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.
-
loss_function string
The loss function used during regression. Available options are mse (mean squared error), msle (mean squared logarithmic error), and huber (Pseudo-Huber loss).
-
loss_function_parameter number
A positive number that is used as a parameter to the loss_function.
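For readers unfamiliar with the three loss names, the Python sketch below gives their common textbook forms. This is not taken from the Elasticsearch implementation: the `offset` and `delta` arguments are illustrative stand-ins for the kind of positive parameter that loss_function_parameter supplies, and how the server actually applies that parameter is not stated here.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred, offset=1.0):
    """Mean squared logarithmic error with an additive offset inside the log."""
    return sum(
        (math.log(t + offset) - math.log(p + offset)) ** 2
        for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def pseudo_huber(y_true, y_pred, delta=1.0):
    """Pseudo-Huber loss: quadratic near zero error, linear for large errors."""
    return sum(
        delta ** 2 * (math.sqrt(1.0 + ((t - p) / delta) ** 2) - 1.0)
        for t, p in zip(y_true, y_pred)
    ) / len(y_true)


actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(mse(actual, predicted))
print(msle(actual, predicted))
print(pseudo_huber(actual, predicted))
```

In general terms, mse penalizes large errors most heavily, msle compares values on a logarithmic scale, and the Pseudo-Huber form is less sensitive to occasional large errors.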
-
-
-
analyzed_fields object
-
An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.
-
An array of strings that defines the fields that will be included in the analysis.
-
-
authorization object
-
api_key object
-
roles array[string]
If a user ID was used for the most recent update to the job, its roles at the time of the update are listed in the response.
-
service_account string
If a service account was used for the most recent update to the job, the account name is listed in the response.
-
-
create_time number
Time unit for milliseconds
-
description string
-
dest object
-
results_field string
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
max_num_threads number
-
model_memory_limit string
-
source object
-
query object
-
bool object
-
boosting object
-
combined_fields object
-
constant_score object
-
dis_max object
-
exists object
-
function_score object
-
fuzzy object
Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
-
geo_bounding_box object
-
geo_distance object
-
geo_polygon object
-
geo_shape object
-
has_child object
-
has_parent object
-
ids object
-
intervals object
Returns documents based on the order and proximity of matching terms.
-
knn object
-
match object
Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
-
match_all object
-
match_bool_prefix object
Analyzes its input and constructs a bool query from the terms. Each term except the last is used in a term query. The last term is used in a prefix query.
-
match_none object
-
match_phrase object
Analyzes the text and creates a phrase query out of the analyzed text.
-
match_phrase_prefix object
Returns documents that contain the words of a provided text, in the same order as provided. The last term of the provided text is treated as a prefix, matching any words that begin with that term.
-
more_like_this object
-
multi_match object
-
nested object
-
parent_id object
-
percolate object
-
prefix object
Returns documents that contain a specific prefix in a provided field.
-
query_string object
-
range object
Returns documents that contain terms within a provided range.
-
rank_feature object
-
regexp object
Returns documents that contain terms matching a regular expression.
-
rule object
-
script object
-
script_score object
-
semantic object
-
shape object
-
simple_query_string object
-
span_containing object
-
span_field_masking object
-
span_first object
-
span_multi object
-
span_near object
-
span_not object
-
span_or object
-
span_term object
Matches spans containing a term.
-
span_within object
-
term object
Returns documents that contain an exact term in a provided field. To return a document, the query term must exactly match the queried field's value, including whitespace and capitalization.
-
terms object
-
terms_set object
Returns documents that contain a minimum number of exact terms in a provided field. To return a document, a required number of terms must exactly match the field values, including whitespace and capitalization.
-
Uses a natural language processing model to convert the query text into a list of token-weight pairs which are then used in a query against a sparse vector or rank features field.
-
Supports returning text_expansion query results by sending in precomputed tokens with the query.
-
wildcard object
Returns documents that contain terms matching a wildcard pattern.
-
wrapper object
-
type object
-
-
runtime_mappings object
-
fetch_fields array[object]
For type lookup
-
format string
A custom format for date type runtime fields.
-
input_field string
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
target_field string
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
target_index string
-
script object
-
Values are boolean, composite, date, double, geo_point, ip, keyword, long, or lookup.
-
_source object
-
An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.
-
An array of strings that defines the fields that will be included in the analysis.
-
-
version string
-
curl \
-X GET http://api.example.com/_ml/data_frame/analytics/{id}
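Once a response like the one described above comes back, a client usually wants a quick overview of the jobs. The Python sketch below reads a response body and maps each job to its analysis type. The sample body is invented for illustration and far smaller than a real response, the function name `summarize_jobs` is hypothetical, and the code assumes that each analysis object holds exactly one of classification, outlier_detection, or regression.

```python
import json

# Invented sample body for illustration; real responses carry more fields.
SAMPLE_BODY = """
{
  "data_frame_analytics": [
    {"id": "churn", "analysis": {"classification": {"num_top_classes": 2}}},
    {"id": "prices", "analysis": {"regression": {"loss_function": "mse"}}}
  ]
}
"""


def summarize_jobs(body):
    """Map each job id to the name of its analysis type."""
    jobs = json.loads(body).get("data_frame_analytics", [])
    summary = {}
    for job in jobs:
        # The analysis object is assumed to hold a single key naming the type.
        analysis_type = next(iter(job.get("analysis", {})), None)
        summary[job["id"]] = analysis_type
    return summary


print(summarize_jobs(SAMPLE_BODY))
```

An empty data_frame_analytics array, which is the default result when nothing matches, simply yields an empty summary.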