Machine learning anomaly detection limitations
The following limitations and known problems apply to the 7.8.1 release of the Elastic machine learning features:
CPUs must support SSE4.2
Machine learning uses Streaming SIMD Extensions (SSE) 4.2 instructions, so it works only on machines whose CPUs support SSE4.2. If you run Elasticsearch on older hardware, you must disable machine learning by setting xpack.ml.enabled to false. See Machine learning settings in Elasticsearch.
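On Linux x86 hosts, one way to check for SSE4.2 before relying on machine learning is to look for the sse4_2 flag in /proc/cpuinfo. The following Python sketch assumes that file layout (other operating systems and CPU architectures report features differently), and the helper names are hypothetical:

```python
from typing import Optional


def cpu_supports_sse42(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists sse4_2."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only x86 'flags' lines are inspected; other architectures use other keys.
        if sep and key.strip() == "flags" and "sse4_2" in value.split():
            return True
    return False


def required_ml_setting(cpuinfo_text: str) -> Optional[str]:
    """Return the elasticsearch.yml line this host needs, or None if none is needed."""
    if cpu_supports_sse42(cpuinfo_text):
        return None  # machine learning can run with its default settings
    return "xpack.ml.enabled: false"
```

On a Linux host you might pass the contents of /proc/cpuinfo to required_ml_setting and add the returned line, if any, to elasticsearch.yml on that node.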
Categorization uses English dictionary words
Categorization identifies static parts of unstructured logs and groups similar messages together. The default categorization tokenizer assumes English language log messages. For other languages you must define a different categorization_analyzer for your job. For more information, see Detecting anomalous categories of data.
Additionally, a dictionary used to influence the categorization process contains only English words. This means categorization might work better in English than in other languages. The ability to customize the dictionary will be added in a future release.
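As a sketch, the body you might send to the create anomaly detection jobs API (PUT _ml/anomaly_detectors/<job_id>) for non-English logs supplies its own categorization_analyzer object. The field names message and @timestamp, the 15m bucket span, and the choice of the standard tokenizer are assumptions for illustration; choose an analyzer that suits your language:

```python
import json


def categorization_job(message_field: str, analyzer: dict) -> dict:
    """Build a job body that counts messages per category with a custom analyzer."""
    return {
        "analysis_config": {
            "bucket_span": "15m",
            "categorization_field_name": message_field,
            # Overrides the default categorization tokenizer, which assumes English.
            "categorization_analyzer": analyzer,
            # mlcategory splits the count by the category assigned to each message.
            "detectors": [{"function": "count", "by_field_name": "mlcategory"}],
        },
        "data_description": {"time_field": "@timestamp"},
    }


# Hypothetical choice for non-English logs: the built-in standard tokenizer.
job_body = categorization_job("message", {"tokenizer": "standard"})
print(json.dumps(job_body, indent=2))
```

Note that the dictionary limitation described above still applies; a custom analyzer changes tokenization, not the English dictionary.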
Pop-ups must be enabled in browsers
The machine learning features in Kibana use pop-ups. You must configure your web browser so that it does not block pop-up windows, or create an exception for your Kibana URL.
Anomaly Explorer omissions and limitations
In Kibana, Anomaly Explorer charts are not displayed for anomalies that were due to categorization, time_of_day functions, or time_of_week functions. Those particular results do not display well as time series charts.
The charts are also not displayed for detectors that use script fields. In that case, the original source data cannot be easily searched because it has been somewhat transformed by the script.
The Anomaly Explorer charts can also look odd in circumstances where there is very little data to plot. For example, if there is only one data point, it is represented as a single dot. If there are only two data points, they are joined by a line.
Jobs close on the datafeed end date
If you start a datafeed and specify an end date, the job is closed when the datafeed stops. This behavior avoids having numerous open one-time jobs.
If you do not specify an end date when you start a datafeed, the job remains open when you stop the datafeed. This behavior avoids the overhead of closing and re-opening large jobs when there are pauses in the datafeed.
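The difference comes down to whether the start datafeed request includes an end parameter. This Python sketch only builds the request; the datafeed ID and dates are hypothetical, and sending it with your HTTP client of choice is left out:

```python
from typing import Optional, Tuple


def start_datafeed_request(datafeed_id: str, start: str,
                           end: Optional[str] = None) -> Tuple[str, str, dict]:
    """Return (method, path, body) for a start datafeed API call."""
    body = {"start": start}
    if end is not None:
        # With an end date, the datafeed stops there and the job is closed.
        body["end"] = end
    # Without an end date, the datafeed keeps running in real time, and the job
    # stays open even after you stop the datafeed.
    return "POST", f"_ml/datafeeds/{datafeed_id}/_start", body


one_off = start_datafeed_request("datafeed-web", "2020-06-01T00:00:00Z", "2020-06-02T00:00:00Z")
real_time = start_datafeed_request("datafeed-web", "2020-06-01T00:00:00Z")
```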
Jobs created in Kibana must use datafeeds
If you create jobs in Kibana, you must use datafeeds. If the data that you want to analyze is not stored in Elasticsearch, you cannot use datafeeds and therefore you cannot create your jobs in Kibana. You can, however, use the machine learning APIs to create jobs and to send batches of data directly to the jobs. For more information, see Datafeeds and API quick reference.
Post data API requires JSON format
The post data API enables you to send data to a job for analysis. The data that you send to the job must use the JSON format.
For more information about this API, see Post Data to Jobs.
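For data outside Elasticsearch, each batch you send must be JSON. A minimal Python sketch that serializes records as newline-delimited JSON documents for the post data API follows; the job ID and field names are hypothetical and would have to match the job's data_description and detectors, and the job must be open:

```python
import json
from typing import Iterable, Tuple


def post_data_request(job_id: str, records: Iterable[dict]) -> Tuple[str, str, str]:
    """Return (method, path, body) with one compact JSON document per line."""
    body = "\n".join(json.dumps(record, separators=(",", ":")) for record in records)
    return "POST", f"_ml/anomaly_detectors/{job_id}/_data", body


method, path, body = post_data_request(
    "web-latency",  # hypothetical job ID
    [
        {"timestamp": 1591000000, "responsetime": 132.2},
        {"timestamp": 1591000060, "responsetime": 990.4},
    ],
)
```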
Misleading high missing field counts
One of the counts associated with a machine learning job is missing_field_count, which indicates the number of records that are missing a configured field. Because jobs analyze JSON data, in which fields are often omitted from documents where they do not apply, the missing_field_count might be misleadingly high. Missing fields might be expected due to the structure of the data and therefore do not generate poor results. For more information about missing_field_count, see the get anomaly detection job statistics API.
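To see why a high count is not necessarily a problem, consider JSON records in which a field appears only when it applies. This Python illustration uses its own simplified rule (a record counts if it lacks any configured field), which is an assumption and not necessarily how Elasticsearch computes missing_field_count:

```python
from typing import Iterable, List


def count_missing(records: Iterable[dict], configured_fields: List[str]) -> int:
    """Count records that lack at least one of the configured fields."""
    return sum(1 for r in records if any(f not in r for f in configured_fields))


# Hypothetical web access records: error_code is present only on failed requests.
records = [
    {"time": 1, "bytes": 512},
    {"time": 2, "bytes": 730},
    {"time": 3, "bytes": 0, "error_code": 503},
]
print(count_missing(records, ["bytes", "error_code"]))  # 2
```

Two of the three records are "missing" a field, yet the data is exactly as expected for successful requests.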
Terms aggregation size affects data analysis
By default, the terms aggregation returns the buckets for the top ten terms. You can change this default behavior by setting the size parameter. If you send pre-aggregated data to a job for analysis, you must ensure that the size is configured correctly, that is, large enough to return every term that you want analyzed. Otherwise, some data might not be analyzed.
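As a sketch, a datafeed aggregations object for a job split by a hypothetical airline field might set size to at least the number of distinct airlines, so that no airline is dropped from any bucket. The field names, the 360s interval, and the figure of 100 airlines are assumptions:

```python
def datafeed_aggregations(time_field: str, term_field: str, metric_field: str,
                          interval: str, distinct_terms: int) -> dict:
    """Build datafeed aggregations whose terms size covers every entity."""
    return {
        "buckets": {
            "date_histogram": {"field": time_field, "fixed_interval": interval},
            "aggregations": {
                # Datafeed aggregations need the latest timestamp in each bucket.
                time_field: {"max": {"field": time_field}},
                term_field: {
                    # Without size, only the top 10 terms per bucket reach the job.
                    "terms": {"field": term_field, "size": distinct_terms},
                    "aggregations": {metric_field: {"avg": {"field": metric_field}}},
                },
            },
        }
    }


aggs = datafeed_aggregations("time", "airline", "responsetime", "360s", distinct_terms=100)
```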
Fields named "by", "count", or "over" cannot be used to split data
You cannot use the following field names in the by_field_name or over_field_name properties in a job: by, count, or over. This limitation also applies to those properties when you create advanced jobs in Kibana.
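If you generate job configurations programmatically, a small pre-flight check can catch these names before the job is created. This Python helper is hypothetical and covers only the two properties named above:

```python
from typing import List

RESERVED_SPLIT_FIELDS = {"by", "count", "over"}


def reserved_field_errors(detector: dict) -> List[str]:
    """List problems with by_field_name or over_field_name in one detector."""
    errors = []
    for prop in ("by_field_name", "over_field_name"):
        name = detector.get(prop)
        if name in RESERVED_SPLIT_FIELDS:
            errors.append(f"{prop} cannot be '{name}'")
    return errors
```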
Jobs created in Kibana use model plot config and pre-aggregated data
If you create single or multi-metric jobs in Kibana, Kibana might enable some options behind the scenes that you would want to reconsider for large or long-running jobs.
For example, when you create a single metric job in Kibana, Kibana generally enables the model_plot_config advanced configuration option. That configuration option causes model information to be stored along with the results and provides a more detailed view into anomaly detection. It is specifically used by the Single Metric Viewer in Kibana. When this option is enabled, however, it can add considerable performance overhead. If you have jobs with many entities, for example data from tens of thousands of servers, storing this additional model information for every bucket might be problematic. If you are not certain that you need this option or if you experience performance issues, edit your job configuration to disable this option.
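Disabling the option is a matter of sending an update to the job. This Python sketch builds a request for the update anomaly detection jobs API only when model plot is currently enabled; the job ID and the shape of job_config (assumed to resemble a job object returned by the get jobs API) are assumptions:

```python
from typing import Optional, Tuple


def disable_model_plot(job_id: str, job_config: dict) -> Optional[Tuple[str, str, dict]]:
    """Return an update job request that turns off model plot, or None if already off."""
    if not job_config.get("model_plot_config", {}).get("enabled", False):
        return None  # nothing to change
    return (
        "POST",
        f"_ml/anomaly_detectors/{job_id}/_update",
        {"model_plot_config": {"enabled": False}},
    )
```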
Likewise, when you create a single or multi-metric job in Kibana, in some cases the datafeed uses aggregations on the data that it retrieves from Elasticsearch. One of the benefits of summarizing data this way is that Elasticsearch automatically distributes these calculations across your cluster. This summarized data is then fed into machine learning instead of raw results, which reduces the volume of data that must be considered while detecting anomalies. However, if you have two jobs, one of which uses pre-aggregated data and another that does not, their results might differ because of the difference in precision of the input data. The machine learning analytics are designed to be aggregation-aware, and the likely increase in performance that is gained by pre-aggregating the data makes the potentially poorer precision worthwhile. If you want to view or change the aggregations that are used in your job, refer to the aggregations property in your datafeed.
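The precision trade-off is easiest to see with a max detector. In this Python illustration, which uses made-up response times and a plain average in place of a datafeed avg aggregation (it does not reproduce how machine learning models the data), a single spike is obvious in the raw values but diluted once the values are summarized per minute:

```python
from statistics import mean


def chunk_averages(values, chunk_size):
    """Average consecutive groups of values, similar to an avg sub-aggregation."""
    return [mean(values[i:i + chunk_size]) for i in range(0, len(values), chunk_size)]


# Hypothetical response times (ms) sampled every 10 seconds over two minutes.
response_times = [100, 110, 95, 900, 105, 98, 102, 99, 101, 97, 103, 100]

raw_peak = max(response_times)                            # max over raw documents
aggregated_peak = max(chunk_averages(response_times, 6))  # max over 1-minute averages
print(raw_peak, round(aggregated_peak, 2))  # 900 234.67
```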
Security integration
When the Elasticsearch security features are enabled, a datafeed stores the roles of the user who created or updated the datafeed at that time. This means that if the permissions of those roles are later updated, the datafeed subsequently runs with the new permissions. However, if the user is assigned different roles after creating or updating the datafeed, the datafeed continues to run with the permissions that are associated with the original roles. For more information, see Datafeeds.
Jobs must be stopped before upgrades
You must stop any machine learning jobs that are running before you start the upgrade process. For more information, see Stop machine learning anomaly detection and Upgrading the Elastic Stack.
Rollup indices and index patterns are not supported
Rollup indices and index patterns cannot be used in machine learning jobs or datafeeds. This limitation applies irrespective of whether you create the jobs in Kibana or by using APIs. In Kibana, if you select an index, saved search, or index pattern that uses the Rollup feature, the machine learning job creation wizards fail.
See Rolling up historical data.
Machine learning objects do not belong to Kibana spaces
If you create spaces in Kibana, you see only the saved objects that belong to that space. This limited scope does not apply to machine learning objects; they are visible in all of your spaces.
However, the Kibana machine learning features interact with some saved objects (such as index patterns, dashboards, and visualizations) that might not be available in all spaces. For example:
- If you upload a file on the Machine Learning page in Kibana, the machine learning features identify the file format and field mappings. You can then optionally import that data into an Elasticsearch index and create an index pattern. This index pattern belongs to the space that is active when you import the file. If you want to use the index pattern in other spaces, you must create it as needed. The file upload functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
- Likewise, if you use the single-metric, multi-metric, or population job wizards to create a job in Kibana, you can encounter problems if you subsequently try to clone the job in a different space. In particular, problems occur when you try to clone the job in a space that does not contain appropriate index patterns.
- If you used a supplied configuration to create jobs (for example, for Apache or NGINX web access logs), visualizations and dashboards are automatically generated. These objects belong to the space that was active when you created the job. If you change your active space, custom URLs from the machine learning results to the dashboards or visualizations might fail.
Job and datafeed APIs have a maximum search size
In 6.6 and later releases, the get jobs API and the get job statistics API return a maximum of 10,000 jobs. Likewise, the get datafeeds API and the get datafeed statistics API return a maximum of 10,000 datafeeds.
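Client code that lists jobs or datafeeds cannot tell from a capped response alone whether more objects exist. A hypothetical Python guard that flags a response which reached the 10,000 limit follows, assuming the response lists objects under a jobs or datafeeds key:

```python
MAX_RETURNED = 10_000


def possibly_truncated(response: dict, key: str) -> bool:
    """Return True if a get jobs or get datafeeds response may have hit the cap."""
    # key is assumed to be "jobs" or "datafeeds", depending on the API called.
    return len(response.get(key, [])) >= MAX_RETURNED
```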
Date nanoseconds data types are not supported
When you create an anomaly detection job, you cannot use a field with the date_nanos data type as the time_field in the data_description object. This limitation applies irrespective of whether you create jobs in Kibana or by using APIs.
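Before you create a job, you can check the mapping of the intended time field. This Python sketch assumes you already have the properties object of an index mapping (for example, from the get mapping API) and that the time field is a top-level field; both are assumptions, and the helper name is hypothetical:

```python
from typing import List


def time_field_errors(properties: dict, time_field: str) -> List[str]:
    """Check a top-level field mapping before using it as data_description.time_field."""
    field_type = properties.get(time_field, {}).get("type")
    if field_type is None:
        return [f"{time_field} is not mapped"]
    if field_type == "date_nanos":
        return [f"{time_field} uses date_nanos, which anomaly detection jobs do not support"]
    return []
```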
Forecast limitations
There are some limitations that affect your ability to create a forecast:
- You can generate only three forecasts per anomaly detection job concurrently. There is no limit to the number of forecasts that you retain. Existing forecasts are not overwritten when you create new forecasts. Rather, they are automatically deleted when they expire.
- If you use an over_field_name property in your anomaly detection job (that is to say, it's a population job), you cannot create a forecast.
- If you use any of the following analytical functions in your anomaly detection job, you cannot create a forecast:
  - lat_long
  - rare and freq_rare
  - time_of_day and time_of_week
  For more information about any of these functions, see Function reference.
- Forecasts run concurrently with real-time machine learning analysis. That is to say, machine learning analysis does not stop while forecasts are generated. Forecasts can have an impact on anomaly detection jobs, however, especially in terms of memory usage. For this reason, forecasts run only if the model memory status is acceptable.
- The anomaly detection job must be open when you create a forecast. Otherwise, an error occurs.
- If there is insufficient data to generate any meaningful predictions, an error occurs. In general, forecasts that are created early in the learning phase of the data analysis are less accurate.
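The static conditions in this list lend themselves to a pre-flight check. This hypothetical Python helper covers the population, function, open-state, and concurrency rules; it cannot predict the memory-status or insufficient-data errors, which only Elasticsearch can determine. The job_state value "opened" is assumed to match the state reported by the get job statistics API:

```python
from typing import List

UNSUPPORTED_FUNCTIONS = {"lat_long", "rare", "freq_rare", "time_of_day", "time_of_week"}
MAX_CONCURRENT_FORECASTS = 3


def forecast_blockers(job_config: dict, job_state: str, running_forecasts: int) -> List[str]:
    """Return reasons why a forecast request would be rejected (empty list if none)."""
    blockers = []
    detectors = job_config.get("analysis_config", {}).get("detectors", [])
    if any(d.get("over_field_name") for d in detectors):
        blockers.append("population jobs (over_field_name) cannot be forecast")
    # Intersecting with the unsupported set also drops detectors without a function.
    bad = sorted({d.get("function") for d in detectors} & UNSUPPORTED_FUNCTIONS)
    if bad:
        blockers.append("unsupported functions: " + ", ".join(bad))
    if job_state != "opened":
        blockers.append("the job must be open")
    if running_forecasts >= MAX_CONCURRENT_FORECASTS:
        blockers.append("three forecasts are already running for this job")
    return blockers
```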
Frozen indices are not supported
Frozen indices cannot be used in anomaly detection jobs or datafeeds. This limitation applies irrespective of whether you create the jobs in Kibana or by using APIs. This limitation exists because it's currently not possible to specify the ignore_throttled query parameter for search requests in datafeeds or jobs. See Searching a frozen index.
CPU scheduling improvements apply to Linux and macOS only
When there are many machine learning jobs running at the same time and there are insufficient CPU resources, the JVM performance must be prioritized so search and indexing latency remain acceptable. To that end, when CPU is constrained on Linux and macOS environments, the CPU scheduling priority of native analysis processes is reduced to favor the Elasticsearch JVM. This improvement does not apply to Windows environments.