Preview features used by data frame analytics | Elasticsearch API documentation (v8)

Preview features used by data frame analytics

Generally available; Added in 7.13.0

POST /_ml/data_frame/analytics/{id}/_preview

Authentication: API key, Basic, Bearer

All methods and paths for this operation:

GET /_ml/data_frame/analytics/_preview

POST /_ml/data_frame/analytics/_preview

GET /_ml/data_frame/analytics/{id}/_preview

POST /_ml/data_frame/analytics/{id}/_preview

Preview the extracted features used by a data frame analytics config.
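
The four method and path combinations listed above can be sketched as a small request builder. This is an illustrative helper, not part of any Elasticsearch client: the name `build_preview_request` is hypothetical, and choosing POST whenever a body is present (GET otherwise) is a convention assumed here rather than a requirement stated on this page. It also assumes that the `{id}` paths refer to an existing job, while the paths without an id rely on a `config` supplied in the body.

```python
from typing import Optional, Tuple
from urllib.parse import quote

BASE_PATH = "/_ml/data_frame/analytics"


def build_preview_request(
    job_id: Optional[str] = None,
    config: Optional[dict] = None,
) -> Tuple[str, str, Optional[dict]]:
    """Return (method, path, body) for a preview request (hypothetical helper)."""
    if job_id is None and config is None:
        raise ValueError("provide a job id, an inline config, or both")

    # With an id, the path names a job; without one, the body carries the config.
    if job_id is not None:
        path = f"{BASE_PATH}/{quote(job_id, safe='')}/_preview"
    else:
        path = f"{BASE_PATH}/_preview"

    body = {"config": config} if config is not None else None
    # Assumed convention: send POST when there is a body, GET otherwise.
    method = "POST" if body is not None else "GET"
    return method, path, body
```

For example, `build_preview_request(job_id="my-job")` yields a GET against `/_ml/data_frame/analytics/my-job/_preview` with no body, where `my-job` is a placeholder identifier.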

Required authorization

Cluster privileges: monitor_ml

Path parameters

id string Required

Identifier for the data frame analytics job.

application/json

Body

config object

A data frame analytics config as described in create data frame analytics jobs. Note that id and dest don’t need to be provided in the context of this API.
- source object Required
  index string | array[string] Required
  
  Index or indices on which to perform the analysis. It can be a single index or index pattern as well as an array of indices or patterns. NOTE: If your source indices contain documents with the same IDs, only the document that is indexed last appears in the destination index.
  
  runtime_mappings object
  
  Definitions of runtime fields that will become part of the mapping of the destination index.
  
  * object Additional properties
  
  _source object
  
  Specify includes and/or excludes patterns to select which fields will be present in the destination. Fields that are excluded cannot be included in the analysis.
  
  includes array[string] Required
  
  An array of strings that defines the fields that will be included in the analysis.
  
  excludes array[string] Required
  
  An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.
  
  query object
  
  The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {}}.
  
- analysis object Required
  classification object
  
  outlier_detection object
  
  The configuration information necessary to perform outlier detection. NOTE: Advanced parameters are for fine-tuning classification analysis. They are set automatically by hyperparameter optimization to give the minimum validation error. It is highly recommended to use the default values unless you fully understand the function of these parameters.
  
  compute_feature_influence boolean
  
  Specifies whether the feature influence calculation is enabled.
  
  Default value is true.
  
  feature_influence_threshold number
  
  The minimum outlier score that a document needs to have in order to calculate its feature influence score. Value range: 0-1.
  
  Default value is 0.1.
  
  method string
  
  The method that outlier detection uses. Available methods are lof, ldof, distance_kth_nn, distance_knn, and ensemble. The default value is ensemble, which means that outlier detection uses an ensemble of different methods and normalises and combines their individual outlier scores to obtain the overall outlier score.
  
  Default value is ensemble.
  
  n_neighbors number
  
  Defines the value for how many nearest neighbors each method of outlier detection uses to calculate its outlier score. When the value is not set, different values are used for different ensemble members. This default behavior helps improve the diversity in the ensemble; only override it if you are confident that the value you choose is appropriate for the data set.
  
  outlier_fraction number
  
  The proportion of the data set that is assumed to be outlying prior to outlier detection. For example, 0.05 means it is assumed that 5% of values are real outliers and 95% are inliers.
  
  standardization_enabled boolean
  
  If true, the following operation is performed on the columns before computing outlier scores: (x_i - mean(x_i)) / sd(x_i).
  
  Default value is true.
  
  regression object
- model_memory_limit string
- max_num_threads number
- analyzed_fields object
  includes array[string] Required
  
  An array of strings that defines the fields that will be included in the analysis.
  
  excludes array[string] Required
  
  An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.
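
The standardization described under standardization_enabled, (x_i - mean(x_i)) / sd(x_i), can be illustrated on a single numeric column. This is a minimal sketch and not the Elasticsearch implementation: it assumes the population standard deviation (the page does not say which estimator is used), it maps a zero-variance column to zeros as an arbitrary choice, and the `standardize` name and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def standardize(column):
    """Apply (x - mean) / sd to every value in one column."""
    mu = mean(column)
    sd = pstdev(column)  # assumption: population standard deviation
    if sd == 0:
        # Assumption: a constant column carries no signal, so return zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sd for x in column]


# Hypothetical values for one numeric feature column.
prices = [100.0, 200.0, 300.0, 400.0]
scaled = standardize(prices)
```

After this transformation the column has mean 0 and standard deviation 1, so features measured on very different scales contribute comparably to distance-based outlier scores.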

Responses

200 application/json
- feature_values array[object] Required
  
  An array of objects that contain feature name and value pairs. The features have been processed and indicate what will be sent to the model for training.
  
  * string Additional properties
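
Because feature_values is an array of objects whose values are strings, it is often easier to inspect a preview column by column. The sketch below pivots a response into per-feature lists. The `feature_columns` helper and the sample response are hypothetical; `bedrooms` is an invented field name, and the values are not real output from the API.

```python
def feature_columns(response: dict) -> dict:
    """Pivot feature_values rows into {feature_name: [values, ...]}."""
    columns = {}
    for row in response.get("feature_values", []):
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical response shaped like the schema above (string values).
sample_response = {
    "feature_values": [
        {"bedrooms": "3", "price": "250000"},
        {"bedrooms": "4", "price": "310000"},
    ]
}

columns = feature_columns(sample_response)
```

A row that lacks a given feature simply contributes nothing to that feature's list, so list lengths can differ when documents are missing fields.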

Console

POST _ml/data_frame/analytics/_preview
{
  "config": {
    "source": {
      "index": "houses_sold_last_10_yrs"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "price"
      }
    }
  }
}

Python

resp = client.ml.preview_data_frame_analytics(
    config={
        "source": {
            "index": "houses_sold_last_10_yrs"
        },
        "analysis": {
            "regression": {
                "dependent_variable": "price"
            }
        }
    },
)

JavaScript

const response = await client.ml.previewDataFrameAnalytics({
  config: {
    source: {
      index: "houses_sold_last_10_yrs",
    },
    analysis: {
      regression: {
        dependent_variable: "price",
      },
    },
  },
});

Ruby

response = client.ml.preview_data_frame_analytics(
  body: {
    "config": {
      "source": {
        "index": "houses_sold_last_10_yrs"
      },
      "analysis": {
        "regression": {
          "dependent_variable": "price"
        }
      }
    }
  }
)

PHP

$resp = $client->ml()->previewDataFrameAnalytics([
    "body" => [
        "config" => [
            "source" => [
                "index" => "houses_sold_last_10_yrs",
            ],
            "analysis" => [
                "regression" => [
                    "dependent_variable" => "price",
                ],
            ],
        ],
    ],
]);

curl

curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"config":{"source":{"index":"houses_sold_last_10_yrs"},"analysis":{"regression":{"dependent_variable":"price"}}}}' "$ELASTICSEARCH_URL/_ml/data_frame/analytics/_preview"

Java

client.ml().previewDataFrameAnalytics(p -> p
    .config(c -> c
        .source(s -> s
            .index("houses_sold_last_10_yrs")
        )
        .analysis(a -> a
            .regression(r -> r
                .dependentVariable("price")
            )
        )
    )
);

Request example

An example body for a `POST _ml/data_frame/analytics/_preview` request.

{
  "config": {
    "source": {
      "index": "houses_sold_last_10_yrs"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "price"
      }
    }
  }
}
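
The example above previews a regression config. A comparable body for outlier detection could be assembled as sketched below, spelling out the defaults documented on this page (compute_feature_influence true, feature_influence_threshold 0.1, method ensemble, standardization_enabled true). The helper name `outlier_detection_preview_body` and the index name `my-index` are hypothetical, and the range check mirrors the documented 0-1 value range rather than any client-side validation that Elasticsearch itself performs.

```python
import json


def outlier_detection_preview_body(index, feature_influence_threshold=0.1, n_neighbors=None):
    """Build a preview request body for outlier detection (hypothetical helper)."""
    if not 0 <= feature_influence_threshold <= 1:
        raise ValueError("feature_influence_threshold must be between 0 and 1")

    outlier_detection = {
        "compute_feature_influence": True,
        "feature_influence_threshold": feature_influence_threshold,
        "method": "ensemble",
        "standardization_enabled": True,
    }
    # Leave n_neighbors unset by default so ensemble members keep their own values.
    if n_neighbors is not None:
        outlier_detection["n_neighbors"] = n_neighbors

    return {
        "config": {
            "source": {"index": index},
            "analysis": {"outlier_detection": outlier_detection},
        }
    }


body = outlier_detection_preview_body("my-index")  # placeholder index name
print(json.dumps(body, indent=2))
```

With the Python client call shown earlier, the inner object would be passed as `config=body["config"]`.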