Estimate job model memory usage | Elasticsearch Serverless API documentation

Estimate job model memory usage Generally available

POST /_ml/anomaly_detectors/_estimate_model_memory

Make an estimation of the memory usage for an anomaly detection job model. The estimate is based on analysis configuration details for the job and cardinality estimates for the fields it references.

Required authorization

Cluster privileges: manage_ml

application/json

Body Required

analysis_config object

For a list of the properties that you can specify in the analysis_config component of the body of this API.
Hide analysis_config attributes Show analysis_config attributes object
- bucket_span string
  
  The size of the interval that the analysis is aggregated into, typically between 5m and 1h. This value should be either a whole number of days or equate to a whole number of buckets in one day. If the anomaly detection job uses a datafeed with aggregations, this value must also be divisible by the interval of the date histogram aggregation.
  
  External documentation
- categorization_analyzer string | object
  
  If categorization_field_name is specified, you can also define the analyzer that is used to interpret the categorization field. This property cannot be used at the same time as categorization_filters. The categorization analyzer specifies how the categorization_field is interpreted by the categorization process. The categorization_analyzer field can be specified either as a string or as an object. If it is a string, it must refer to a built-in analyzer or one added by another plugin.
  
  One of:
  string-1 string CategorizationAnalyzerDefinition object
  
  If categorization_field_name is specified, you can also define the analyzer that is used to interpret the categorization field. This property cannot be used at the same time as categorization_filters. The categorization analyzer specifies how the categorization_field is interpreted by the categorization process. The categorization_analyzer field can be specified either as a string or as an object. If it is a string, it must refer to a built-in analyzer or one added by another plugin.
  
  If categorization_field_name is specified, you can also define the analyzer that is used to interpret the categorization field. This property cannot be used at the same time as categorization_filters. The categorization analyzer specifies how the categorization_field is interpreted by the categorization process. The categorization_analyzer field can be specified either as a string or as an object. If it is a string, it must refer to a built-in analyzer or one added by another plugin.
  
  Hide attributes Show attributes
  
  char_filter array
  
  One or more character filters. In addition to the built-in character filters, other plugins can provide more character filters. If this property is not specified, no character filters are applied prior to categorization. If you are customizing some other aspect of the analyzer and you need to achieve the equivalent of categorization_filters (which are not permitted when some other aspect of the analyzer is customized), add them here as pattern replace character filters.
  
  External documentation
  
  filter array
  
  One or more token filters. In addition to the built-in token filters, other plugins can provide more token filters. If this property is not specified, no token filters are applied prior to categorization.
  
  External documentation
  
  tokenizer object | string
  
  The name or definition of the tokenizer to use after character filters are applied. This property is compulsory if categorization_analyzer is specified as an object. Machine learning provides a tokenizer called ml_standard that tokenizes in a way that has been determined to produce good categorization results on a variety of log file formats for logs in English. If you want to use that tokenizer but change the character or token filters, specify "tokenizer": "ml_standard" in your categorization_analyzer. Additionally, the ml_classic tokenizer is available, which tokenizes in the same way as the non-customizable tokenizer in old versions of the product (before 6.2). ml_classic was the default categorization tokenizer in versions 6.2 to 7.13, so if you need categorization identical to the default for jobs created in these versions, specify "tokenizer": "ml_classic" in your categorization_analyzer.
  
  Tokenizer reference
  
  One of:
  object-1 object string-2 string
- categorization_field_name string
  
  If this property is specified, the values of the specified field will be categorized. The resulting categories must be used in a detector by setting by_field_name, over_field_name, or partition_field_name to the keyword mlcategory.
- categorization_filters array[string]
  
  If categorization_field_name is specified, you can also define optional filters. This property expects an array of regular expressions. The expressions are used to filter out matching sequences from the categorization field values. You can use this functionality to fine tune the categorization by excluding sequences from consideration when categories are defined. For example, you can exclude SQL statements that appear in your log files. This property cannot be used at the same time as categorization_analyzer. If you only want to define simple regular expression filters that are applied prior to tokenization, setting this property is the easiest method. If you also want to customize the tokenizer or post-tokenization filtering, use the categorization_analyzer property instead and include the filters as pattern_replace character filters. The effect is exactly the same.
- detectors array[object] Required
  
  Detector configuration objects specify which data fields a job analyzes. They also specify which analytical functions are used. You can specify multiple detectors for a job. If the detectors array does not contain at least one detector, no analysis can occur and an error is returned.
  Hide detectors attributes Show detectors attributes object
  
  by_field_name string
  
  The field used to split the data. In particular, this property is used for analyzing the splits with respect to their own history. It is used for finding unusual values in the context of the split.
  
  custom_rules array[object]
  
  Custom rules enable you to customize the way detectors operate. For example, a rule may dictate conditions under which results should be skipped. Kibana refers to custom rules as job rules.
  
  Hide custom_rules attributes Show custom_rules attributes object
  
  actions array[string]
  
  The set of actions to be triggered when the rule applies. If more than one action is specified the effects of all actions are combined.
  
  Supported values include:
  
  skip_result: The result will not be created. Unless you also specify skip_model_update, the model will be updated as usual with the corresponding series value.
  
  skip_model_update: The value for that series will not be used to update the model. Unless you also specify skip_result, the results will be created as usual. This action is suitable when certain values are expected to be consistently anomalous and they affect the model in a way that negatively impacts the rest of the results.
  
  Values are skip_result or skip_model_update. Default value is ["skip_result"].
  
  conditions array[object]
  
  An array of numeric conditions when the rule applies. A rule must either have a non-empty scope or at least one condition. Multiple conditions are combined together with a logical AND.
  
  scope object
  
  A scope of series where the rule applies. A rule must either have a non-empty scope or at least one condition. By default, the scope includes all series. Scoping is allowed for any of the fields that are also specified in by_field_name, over_field_name, or partition_field_name.
  
  detector_description string
  
  A description of the detector.
  
  detector_index number
  
  A unique identifier for the detector. This identifier is based on the order of the detectors in the analysis_config, starting at zero. If you specify a value for this property, it is ignored.
  
  exclude_frequent string
  
  If set, frequent entities are excluded from influencing the anomaly results. Entities can be considered frequent over time or frequent in a population. If you are working with both over and by fields, you can set exclude_frequent to all for both fields, or to by or over for those specific fields.
  
  Values are all, none, by, or over.
  
  field_name string
  
  The field that the detector uses in the function. If you use an event rate function such as count or rare, do not specify this field. The field_name cannot contain double quotes or backslashes.
  
  function string
  
  The analysis function that is used. For example, count, rare, mean, min, max, or sum.
  
  over_field_name string
  
  The field used to split the data. In particular, this property is used for analyzing the splits with respect to the history of all splits. It is used for finding unusual values in the population of all splits.
  
  partition_field_name string
  
  The field used to segment the analysis. When you use this property, you have completely independent baselines for each value of this field.
  
  use_null boolean
  
  Defines whether a new series is used as the null series when there is no value for the by or partition fields.
  
  Default value is false.
- influencers array[string]
  
  A comma separated list of influencer field names. Typically these can be the by, over, or partition fields that are used in the detector configuration. You might also want to use a field name that is not specifically named in a detector, but is available as part of the input data. When you use multiple detectors, the use of influencers is recommended as it aggregates results for each influencer entity.
- latency string
  
  The size of the window in which to expect data that is out of time order. If you specify a non-zero value, it must be greater than or equal to one second. NOTE: Latency is applicable only when you send data by using the post data API.
  
  External documentation
- model_prune_window string
  
  Advanced configuration option. Affects the pruning of models that have not been updated for the given time duration. The value must be set to a multiple of the bucket_span. If set too low, important information may be removed from the model. For jobs created in 8.1 and later, the default value is the greater of 30d or 20 times bucket_span.
  
  External documentation
- multivariate_by_fields boolean
  
  This functionality is reserved for internal use. It is not supported for use in customer environments and is not subject to the support SLA of official GA features. If set to true, the analysis will automatically find correlations between metrics for a given by field value and report anomalies when those correlations cease to hold. For example, suppose CPU and memory usage on host A is usually highly correlated with the same metrics on host B. Perhaps this correlation occurs because they are running a load-balanced application. If you enable this property, anomalies will be reported when, for example, CPU usage on host A is high and the value of CPU usage on host B is low. That is to say, you’ll see an anomaly when the CPU of host A is unusual given the CPU of host B. To use the multivariate_by_fields property, you must also specify by_field_name in your detector.
- per_partition_categorization object
  
  Settings related to how categorization interacts with partition fields.
  Hide per_partition_categorization attributes Show per_partition_categorization attributes object
  
  enabled boolean
  
  To enable this setting, you must also set the partition_field_name property to the same value in every detector that uses the keyword mlcategory. Otherwise, job creation fails.
  
  stop_on_warn boolean
  
  This setting can be set to true only if per-partition categorization is enabled. If true, both categorization and subsequent anomaly detection stops for partitions where the categorization status changes to warn. This setting makes it viable to have a job where it is expected that categorization works well for some partitions but not others; you do not pay the cost of bad categorization forever in the partitions where it works badly.
- summary_count_field_name string
  
  If this property is specified, the data that is fed to the job is expected to be pre-summarized. This property value is the name of the field that contains the count of raw data points that have been summarized. The same summary_count_field_name applies to all detectors in the job. NOTE: The summary_count_field_name property cannot be used with the metric function.
max_bucket_cardinality object

Estimates of the highest cardinality in a single bucket that is observed for influencer fields over the time period that the job analyzes data. To produce a good answer, values must be provided for all influencer fields. Providing values for fields that are not listed as influencers has no effect on the estimation.
Hide max_bucket_cardinality attribute Show max_bucket_cardinality attribute object
- * number Additional properties
overall_cardinality object

Estimates of the cardinality that is observed for fields over the whole time period that the job analyzes data. To produce a good answer, values must be provided for fields referenced in the by_field_name, over_field_name and partition_field_name of any detectors. Providing values for other fields has no effect on the estimation. It can be omitted from the request if no detectors have a by_field_name, over_field_name or partition_field_name.
Hide overall_cardinality attribute Show overall_cardinality attribute object
- * number Additional properties

Responses

200 application/json
Hide response attribute Show response attribute object
- model_memory_estimate string Required

POST /_ml/anomaly_detectors/_estimate_model_memory

POST _ml/anomaly_detectors/_estimate_model_memory
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "sum",
        "field_name": "bytes",
        "by_field_name": "status",
        "partition_field_name": "app"
      }
    ],
    "influencers": [
      "source_ip",
      "dest_ip"
    ]
  },
  "overall_cardinality": {
    "status": 10,
    "app": 50
  },
  "max_bucket_cardinality": {
    "source_ip": 300,
    "dest_ip": 30
  }
}

resp = client.ml.estimate_model_memory(
    analysis_config={
        "bucket_span": "5m",
        "detectors": [
            {
                "function": "sum",
                "field_name": "bytes",
                "by_field_name": "status",
                "partition_field_name": "app"
            }
        ],
        "influencers": [
            "source_ip",
            "dest_ip"
        ]
    },
    overall_cardinality={
        "status": 10,
        "app": 50
    },
    max_bucket_cardinality={
        "source_ip": 300,
        "dest_ip": 30
    },
)

const response = await client.ml.estimateModelMemory({
  analysis_config: {
    bucket_span: "5m",
    detectors: [
      {
        function: "sum",
        field_name: "bytes",
        by_field_name: "status",
        partition_field_name: "app",
      },
    ],
    influencers: ["source_ip", "dest_ip"],
  },
  overall_cardinality: {
    status: 10,
    app: 50,
  },
  max_bucket_cardinality: {
    source_ip: 300,
    dest_ip: 30,
  },
});

response = client.ml.estimate_model_memory(
  body: {
    "analysis_config": {
      "bucket_span": "5m",
      "detectors": [
        {
          "function": "sum",
          "field_name": "bytes",
          "by_field_name": "status",
          "partition_field_name": "app"
        }
      ],
      "influencers": [
        "source_ip",
        "dest_ip"
      ]
    },
    "overall_cardinality": {
      "status": 10,
      "app": 50
    },
    "max_bucket_cardinality": {
      "source_ip": 300,
      "dest_ip": 30
    }
  }
)

$resp = $client->ml()->estimateModelMemory([
    "body" => [
        "analysis_config" => [
            "bucket_span" => "5m",
            "detectors" => array(
                [
                    "function" => "sum",
                    "field_name" => "bytes",
                    "by_field_name" => "status",
                    "partition_field_name" => "app",
                ],
            ),
            "influencers" => array(
                "source_ip",
                "dest_ip",
            ),
        ],
        "overall_cardinality" => [
            "status" => 10,
            "app" => 50,
        ],
        "max_bucket_cardinality" => [
            "source_ip" => 300,
            "dest_ip" => 30,
        ],
    ],
]);

curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"analysis_config":{"bucket_span":"5m","detectors":[{"function":"sum","field_name":"bytes","by_field_name":"status","partition_field_name":"app"}],"influencers":["source_ip","dest_ip"]},"overall_cardinality":{"status":10,"app":50},"max_bucket_cardinality":{"source_ip":300,"dest_ip":30}}' "$ELASTICSEARCH_URL/_ml/anomaly_detectors/_estimate_model_memory"

client.ml().estimateModelMemory(e -> e
    .analysisConfig(a -> a
        .bucketSpan(b -> b
            .time("5m")
        )
        .detectors(d -> d
            .byFieldName("status")
            .fieldName("bytes")
            .function("sum")
            .partitionFieldName("app")
        )
        .influencers(List.of("source_ip","dest_ip"))
    )
    .maxBucketCardinality(Map.of("dest_ip", 30L,"source_ip", 300L))
    .overallCardinality(Map.of("app", 50L,"status", 10L))
);

Request example

Run `POST _ml/anomaly_detectors/_estimate_model_memory` to estimate the model memory limit based on the analysis configuration details provided in the request body.

{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "sum",
        "field_name": "bytes",
        "by_field_name": "status",
        "partition_field_name": "app"
      }
    ],
    "influencers": [
      "source_ip",
      "dest_ip"
    ]
  },
  "overall_cardinality": {
    "status": 10,
    "app": 50
  },
  "max_bucket_cardinality": {
    "source_ip": 300,
    "dest_ip": 30
  }
}

Response examples (200)

A successful response from `POST _ml/anomaly_detectors/_estimate_model_memory`.

{
  "model_memory_estimate": "21mb"
}

Estimate job model memory usage Generally available

Required authorization

Body Required

categorization_analyzer string | object

Responses