Machine Learning Limitations

The following limitations and known problems apply to the 5.4.3 release of X-Pack:

Categorization uses English tokenization rules and dictionary words

Categorization identifies static parts of unstructured logs and groups similar messages together. This is currently supported only for English-language log messages.

Pop-ups must be enabled in browsers

The X-Pack machine learning features in Kibana use pop-ups. You must either configure your web browser so that it does not block pop-up windows or create an exception for your Kibana URL.

Jobs must be re-created at GA

The models that you create in the X-Pack machine learning beta cannot be upgraded. After the X-Pack machine learning features become generally available, you must re-create your jobs. If you have data sets and job configurations that you work with extensively in the beta, make note of all the details so that you can re-create them successfully.

X-Pack machine learning features do not support cross cluster search

You cannot use cross cluster search in either the machine learning APIs or the machine learning features in Kibana.

For more information about cross cluster search, see Cross Cluster Search.

Anomaly Explorer omissions and limitations

In Kibana, Anomaly Explorer charts are not displayed for anomalies that were due to categorization, time_of_day functions, or time_of_week functions. Those particular results do not display well as time series charts.

The Anomaly Explorer charts can also look odd in circumstances where there is very little data to plot. For example, if there is only one data point, it is represented as a single dot. If there are only two data points, they are joined by a line.

Jobs close on the datafeed end date

If you start a datafeed and specify an end date, the job is closed automatically when the datafeed stops. This behavior avoids having numerous open one-time jobs.

If you do not specify an end date when you start a datafeed, the job remains open when you stop the datafeed. This behavior avoids the overhead of closing and re-opening large jobs when there are pauses in the datafeed.
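
The two cases can be sketched as request bodies for the start datafeed API. This is a minimal illustration, not an official client: the helper name `start_datafeed_request`, the datafeed ID, and the dates are hypothetical, and the endpoint path is assumed from the 5.x X-Pack API layout, so verify it against your release.

```python
from datetime import datetime, timezone


def start_datafeed_request(datafeed_id, start, end=None):
    """Return (path, body) for a start-datafeed call.

    Expects timezone-aware datetimes. The path layout is an assumption
    based on the 5.x X-Pack APIs; verify it against your release.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    body = {"start": start.astimezone(timezone.utc).strftime(fmt)}
    if end is not None:
        # With an end date, the job is closed when the datafeed stops.
        body["end"] = end.astimezone(timezone.utc).strftime(fmt)
    # Without "end", the datafeed runs until stopped and the job stays open.
    return f"_xpack/ml/datafeeds/{datafeed_id}/_start", body


# One-time analysis of a fixed window: the job closes afterwards.
one_off = start_datafeed_request(
    "datafeed-example",
    datetime(2017, 4, 1, tzinfo=timezone.utc),
    datetime(2017, 4, 8, tzinfo=timezone.utc),
)

# Ongoing analysis: no end date, so the job remains open when the datafeed is stopped.
ongoing = start_datafeed_request(
    "datafeed-example", datetime(2017, 4, 1, tzinfo=timezone.utc)
)
```

The returned path and body can be sent with any HTTP client as a POST request.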

Jobs created in Kibana must use datafeeds

If you create jobs in Kibana, you must use datafeeds. If the data that you want to analyze is not stored in Elasticsearch, you cannot use datafeeds and therefore you cannot create your jobs in Kibana. You can, however, use the machine learning APIs to create jobs and to send batches of data directly to the jobs. For more information, see Datafeeds and API Quick Reference.

Post data API requires JSON format

The post data API enables you to send data to a job for analysis. The data that you send to the job must use the JSON format.

For more information about this API, see Post Data to Jobs.
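
If your source data is in another format, such as CSV, you need to convert it before sending it. The following is a minimal sketch of such a conversion; the helper name `csv_to_json_documents` and the sample fields are hypothetical, and the one-document-per-line layout is an assumption that you should check against Post Data to Jobs.

```python
import csv
import io
import json


def csv_to_json_documents(csv_text):
    """Convert CSV text (with a header row) into JSON documents, one per line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every value read from CSV is a string; convert types here if the
    # job expects numbers.
    return "\n".join(json.dumps(dict(row)) for row in reader)


# Hypothetical sample data: a timestamp column and one metric column.
sample = "time,responsetime\n1493596800,132.2\n1493596860,140.7\n"
payload = csv_to_json_documents(sample)
```

The resulting `payload` string is what you would send as the request body.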

Misleading high missing field counts

One of the counts associated with a machine learning job is missing_field_count, which indicates the number of records that are missing a configured field.

Since jobs analyze JSON data, the missing_field_count might be misleading. Missing fields might be expected due to the structure of the data and therefore do not generate poor results.

For more information about missing_field_count, see Data Counts Objects.
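
To see how a high count can arise from perfectly valid data, consider this sketch. It only illustrates the idea of counting records that lack a configured field; it is not the job's actual implementation, and the function name and sample records are hypothetical.

```python
def count_records_missing_fields(records, configured_fields):
    """Count records that lack at least one of the configured fields."""
    return sum(
        1
        for record in records
        if any(field not in record for field in configured_fields)
    )


# Hypothetical mixed log stream: heartbeat events never carry a response time,
# so they are counted as missing the field even though nothing is wrong.
records = [
    {"time": 1493596800, "event": "request", "responsetime": 132.2},
    {"time": 1493596860, "event": "heartbeat"},
    {"time": 1493596920, "event": "request", "responsetime": 140.7},
]
missing = count_records_missing_fields(records, ["responsetime"])
```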

Terms aggregation size affects data analysis

By default, the terms aggregation returns the buckets for the top ten terms. You can change this default behavior by setting the size parameter.

If you send pre-aggregated data to a job for analysis, you must ensure that the size is configured correctly. Otherwise, some data might not be analyzed.
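
The effect of the size parameter can be simulated locally. The sketch below mimics a top-N terms selection with `collections.Counter`; it is an illustration only, not how Elasticsearch computes the aggregation, and the field name `host` and the sample data are hypothetical.

```python
from collections import Counter


def top_terms(values, size=10):
    """Return the `size` most frequent terms with their document counts."""
    return Counter(values).most_common(size)


# Hypothetical data: 12 hosts, where host-i contributes i + 1 events (78 in total).
events = [f"host-{i}" for i in range(12) for _ in range(i + 1)]

default_buckets = top_terms(events)          # default size of 10
analyzed = sum(count for _, count in default_buckets)
dropped_hosts = set(events) - {term for term, _ in default_buckets}

# A terms aggregation sized to cover every host in this data set.
aggregation = {"terms": {"field": "host", "size": 12}}
```

With the default size, the two least active hosts never reach the job. Setting `size` to at least the number of distinct terms avoids that loss.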

Jobs created in Kibana use model plot config and pre-aggregated data

If you create single or multi-metric jobs in Kibana, Kibana might enable some options behind the scenes that you should reconsider for large or long-running jobs.

For example, when you create a single metric job in Kibana, it generally enables the model_plot_config advanced configuration option. That configuration option causes model information to be stored along with the results and provides a more detailed view into anomaly detection. It is specifically used by the Single Metric Viewer in Kibana. When this option is enabled, however, it can add considerable overhead to the performance of the system. If you have jobs with many entities, for example data from tens of thousands of servers, storing this additional model information for every bucket might be problematic. If you are not certain that you need this option or if you experience performance issues, edit your job configuration to disable this option.

For more information, see Model Plot Config.
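
Disabling the option amounts to a small change in the job configuration. The following sketch builds such an update; the helper name and job ID are hypothetical, and the update endpoint path is assumed from the 5.x X-Pack API layout, so confirm it and the list of updatable properties for your release.

```python
import json


def disable_model_plot_request(job_id):
    """Return (path, body) for a job update that turns off model plot."""
    # Path layout is an assumption based on the 5.x X-Pack APIs.
    path = f"_xpack/ml/anomaly_detectors/{job_id}/_update"
    body = {"model_plot_config": {"enabled": False}}
    return path, body


path, body = disable_model_plot_request("example-job")
request_json = json.dumps(body)
```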

Likewise, when you create a single or multi-metric job in Kibana, in some cases it uses aggregations on the data that it retrieves from Elasticsearch. One of the benefits of summarizing data this way is that Elasticsearch automatically distributes these calculations across your cluster. This summarized data is then fed into X-Pack machine learning instead of raw results, which reduces the volume of data that must be considered while detecting anomalies. However, if you have two jobs, one that uses pre-aggregated data and one that does not, their results might differ. This difference is due to the difference in precision of the input data. The machine learning analytics are designed to be aggregation-aware and the likely increase in performance that is gained by pre-aggregating the data makes the potentially poorer precision worthwhile. If you want to view or change the aggregations that are used in your job, refer to the aggregations property in your datafeed.

For more information, see Datafeed Resources.
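
One way to picture the loss of precision from pre-aggregation is to summarize a series into bucket averages and compare it with the raw values. This is a simplified sketch under assumed data, not a description of how the datafeed or the analytics work internally; the function name and the sample points are hypothetical.

```python
from statistics import mean


def bucket_means(points, span):
    """Group (timestamp, value) pairs into fixed-width buckets and average each."""
    buckets = {}
    for timestamp, value in points:
        # Key each bucket by the start of its time span.
        buckets.setdefault(timestamp - timestamp % span, []).append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical raw series with one spike at t=90.
raw = [(0, 10.0), (30, 10.0), (60, 10.0), (90, 50.0)]
summarized = bucket_means(raw, span=60)

raw_peak = max(value for _, value in raw)
summarized_peak = max(summarized.values())
```

The job that reads the raw series sees a peak of 50.0, while the job that reads the summarized series sees 30.0 for the same period, which is why the two sets of results can differ.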