Results resources
Several different result types are created for each job. You can query anomaly results for buckets, influencers, and records by using the results API. Summarized bucket results over multiple jobs can be queried as well; those results are called overall buckets.
Results are written for each bucket_span. The timestamp for the results is the start of the bucket time interval.
The results include scores, which are calculated for each anomaly result type and each bucket interval. These scores are aggregated in order to reduce noise, and normalized in order to identify and rank the most mathematically significant anomalies.
Bucket results provide the top level, overall view of the job and are ideal for alerts. For example, the bucket results might indicate that at 16:05 the system was unusual. This information is a summary of all the anomalies, pinpointing when they occurred.
Influencer results show which entities were anomalous and when. For example,
the influencer results might indicate that at 16:05 user_name: Bob
was unusual.
This information is a summary of all the anomalies for each entity, so there
can be a lot of these results. Once you have identified a notable bucket time,
you can look to see which entities were significant.
Record results provide details about what the individual anomaly was, when it occurred and which entity was involved. For example, the record results might indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was 1067 bytes. Once you have identified a bucket time and perhaps a significant entity too, you can drill through to the record results in order to investigate the anomalous behavior.
Categorization results contain the definitions of categories that have been identified. These are only applicable for jobs that are configured to analyze unstructured log data using categorization. These results do not contain a timestamp or any calculated scores. For more information, see Categorizing log messages.
All of these resources and properties are informational; you cannot change their values.
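Each result type has a corresponding get results endpoint. As a minimal sketch of how bucket results might be queried (the job name total-requests, the score threshold, and the date range are hypothetical values chosen for illustration; anomaly_score, start, and end are request parameters of the get buckets API that filter by minimum score and time range):

GET _ml/anomaly_detectors/total-requests/results/buckets
{
  "anomaly_score": 80,
  "start": "2019-07-01T00:00:00Z",
  "end": "2019-07-02T00:00:00Z"
}

Similar endpoints exist for influencers, records, categories, and overall buckets; see the corresponding get results APIs for the exact parameters each accepts.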
Buckets
Bucket results provide the top level, overall view of the job and are best for alerting.
Each bucket has an anomaly_score, which is a statistically aggregated and normalized view of the combined anomalousness of all the record results within each bucket.
One bucket result is written for each bucket_span for each job, even if it is not considered to be anomalous. If the bucket is not anomalous, it has an anomaly_score of zero.
When you identify an anomalous bucket, you can investigate further by expanding the bucket resource to show the records as nested objects. Alternatively, you can access the records resource directly and filter by the date range.
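For example, a single bucket can be retrieved with its records expanded, assuming the get buckets API's optional bucket timestamp path segment and its expand parameter (the job name and timestamp here are hypothetical):

GET _ml/anomaly_detectors/total-requests/results/buckets/1560297600000
{
  "expand": true
}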
A bucket resource has the following properties:
- anomaly_score: (number) The maximum anomaly score, between 0-100, for any of the bucket influencers. This is an overall, rate-limited score for the job. All the anomaly records in the bucket contribute to this score. This value might be updated as new data is analyzed.
- bucket_influencers: (array) An array of bucket influencer objects. For more information, see Bucket Influencers.
- bucket_span: (number) The length of the bucket in seconds. This value matches the bucket_span that is specified in the job.
- event_count: (number) The number of input data records processed in this bucket.
- initial_anomaly_score: (number) The maximum anomaly_score for any of the bucket influencers. This is the initial value that was calculated at the time the bucket was processed.
- is_interim: (boolean) If true, this is an interim result. In other words, the bucket results are calculated based on partial input data.
- job_id: (string) The unique identifier for the job that these results belong to.
- processing_time_ms: (number) The amount of time, in milliseconds, that it took to analyze the bucket contents and calculate results.
- result_type: (string) Internal. This value is always set to bucket.
- timestamp: (date) The start time of the bucket. This timestamp uniquely identifies the bucket.
Events that occur exactly at the timestamp of the bucket are included in the results for the bucket.
Bucket Influencers
Bucket influencer results are available as nested objects contained within bucket results. These results are an aggregation for each type of influencer. For example, if both client_ip and user_name were specified as influencers, then you would be able to determine when the client_ip or user_name values were collectively anomalous.
There is a built-in bucket influencer called bucket_time which is always available. This bucket influencer is the aggregation of all records in the bucket; it is not just limited to a type of influencer.
A bucket influencer is a type of influencer. For example, client_ip or user_name can be bucket influencers, whereas 192.168.88.2 and Bob are influencers.
A bucket influencer object has the following properties:
- anomaly_score: (number) A normalized score between 0-100, which is calculated for each bucket influencer. This score might be updated as newer data is analyzed.
- bucket_span: (number) The length of the bucket in seconds. This value matches the bucket_span that is specified in the job.
- initial_anomaly_score: (number) The score between 0-100 for each bucket influencer. This score is the initial value that was calculated at the time the bucket was processed.
- influencer_field_name: (string) The field name of the influencer. For example, client_ip or user_name.
- influencer_field_value: (string) The field value of the influencer. For example, 192.168.88.2 or Bob.
- is_interim: (boolean) If true, this is an interim result. In other words, the bucket influencer results are calculated based on partial input data.
- job_id: (string) The unique identifier for the job that these results belong to.
- probability: (number) The probability that the bucket has this behavior, in the range 0 to 1. For example, 0.0000109783. This value can be held to a high precision of over 300 decimal places, so the anomaly_score is provided as a human-readable and friendly interpretation of this.
- raw_anomaly_score: (number) Internal.
- result_type: (string) Internal. This value is always set to bucket_influencer.
- timestamp: (date) The start time of the bucket for which these results were calculated.
Influencers
Influencers are the entities that have contributed to, or are to blame for, the anomalies. Influencer results are available only if an influencer_field_name is specified in the job configuration.
Influencers are given an influencer_score, which is calculated based on the anomalies that have occurred in each bucket interval. For jobs with more than one detector, this gives a powerful view of the most anomalous entities.
For example, if you are analyzing unusual bytes sent and unusual domains visited and you specified user_name as the influencer, then an influencer_score for each anomalous user name is written per bucket. If user_name: Bob had an influencer_score greater than 75, then Bob would be considered very anomalous during this time interval in one or both of those areas (unusual bytes sent or unusual domains visited).
One influencer result is written per bucket for each influencer that is considered anomalous.
When you identify an influencer with a high score, you can investigate further by accessing the records resource for that bucket and enumerating the anomaly records that contain the influencer.
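A hedged sketch of such a query using the get influencers API (the job name and score threshold are hypothetical; influencer_score filters out lower-scoring influencers, while sort and desc order the results by score, highest first):

GET _ml/anomaly_detectors/total-requests/results/influencers
{
  "influencer_score": 75,
  "sort": "influencer_score",
  "desc": true
}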
An influencer object has the following properties:
- bucket_span: (number) The length of the bucket in seconds. This value matches the bucket_span that is specified in the job.
- influencer_score: (number) A normalized score between 0-100, which is based on the probability of the influencer in this bucket aggregated across detectors. Unlike initial_influencer_score, this value will be updated by a re-normalization process as new data is analyzed.
- initial_influencer_score: (number) A normalized score between 0-100, which is based on the probability of the influencer aggregated across detectors. This is the initial value that was calculated at the time the bucket was processed.
- influencer_field_name: (string) The field name of the influencer.
- influencer_field_value: (string) The entity that influenced, contributed to, or was to blame for the anomaly.
- is_interim: (boolean) If true, this is an interim result. In other words, the influencer results are calculated based on partial input data.
- job_id: (string) The unique identifier for the job that these results belong to.
- probability: (number) The probability that the influencer has this behavior, in the range 0 to 1. For example, 0.0000109783. This value can be held to a high precision of over 300 decimal places, so the influencer_score is provided as a human-readable and friendly interpretation of this.
- result_type: (string) Internal. This value is always set to influencer.
- timestamp: (date) The start time of the bucket for which these results were calculated.
Additional influencer properties are added, depending on the fields being analyzed. For example, if the job analyzes user_name as an influencer, then a field user_name is added to the result document. This information enables you to filter the anomaly results more easily.
Records
Records contain the detailed analytical results. They describe the anomalous activity that has been identified in the input data based on the detector configuration.
For example, if you are looking for unusually large data transfers, an anomaly record can identify the source IP address, the destination, the time window during which it occurred, the expected and actual size of the transfer, and the probability of this occurrence.
There can be many anomaly records depending on the characteristics and size of the input data. In practice, there are often too many to be able to manually process them. The machine learning features therefore perform a sophisticated aggregation of the anomaly records into buckets.
The number of record results depends on the number of anomalies found in each bucket, which relates to the number of time series being modeled and the number of detectors.
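A minimal sketch of retrieving the highest-scoring records with the get records API (the job name, score threshold, and start time are hypothetical values chosen for illustration):

GET _ml/anomaly_detectors/total-requests/results/records
{
  "record_score": 80,
  "sort": "record_score",
  "desc": true,
  "start": "2019-07-01T00:00:00Z"
}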
A record object has the following properties:
- actual: (array) The actual value for the bucket.
- bucket_span: (number) The length of the bucket in seconds. This value matches the bucket_span that is specified in the job.
- by_field_name: (string) The name of the analyzed field. This value is present only if it is specified in the detector. For example, client_ip.
- by_field_value: (string) The value of by_field_name. This value is present only if it is specified in the detector. For example, 192.168.66.2.
- causes: (array) For population analysis, an over field must be specified in the detector. This property contains an array of anomaly records that are the causes for the anomaly that has been identified for the over field. If no over fields exist, this field is not present. This sub-resource contains the most anomalous records for the over_field_name. For scalability reasons, a maximum of the 10 most significant causes of the anomaly are returned. As part of the core analytical modeling, these low-level anomaly records are aggregated for their parent over field record. The causes resource contains similar elements to the record resource, namely actual, typical, *_field_name and *_field_value. Probability and scores are not applicable to causes.
- detector_index: (number) A unique identifier for the detector.
- field_name: (string) Certain functions require a field to operate on, for example, sum(). For those functions, this value is the name of the field to be analyzed.
- function: (string) The function in which the anomaly occurs, as specified in the detector configuration. For example, max.
- function_description: (string) The description of the function in which the anomaly occurs, as specified in the detector configuration.
- influencers: (array) If influencers was specified in the detector configuration, then this array contains influencers that contributed to or were to blame for an anomaly.
- initial_record_score: (number) A normalized score between 0-100, which is based on the probability of the anomalousness of this record. This is the initial value that was calculated at the time the bucket was processed.
- is_interim: (boolean) If true, this is an interim result. In other words, the anomaly record is calculated based on partial input data.
- job_id: (string) The unique identifier for the job that these results belong to.
- over_field_name: (string) The name of the over field that was used in the analysis. This value is present only if it was specified in the detector. Over fields are used in population analysis. For example, user.
- over_field_value: (string) The value of over_field_name. This value is present only if it was specified in the detector. For example, Bob.
- partition_field_name: (string) The name of the partition field that was used in the analysis. This value is present only if it was specified in the detector. For example, region.
- partition_field_value: (string) The value of partition_field_name. This value is present only if it was specified in the detector. For example, us-east-1.
- probability: (number) The probability of the individual anomaly occurring, in the range 0 to 1. For example, 0.0000772031. This value can be held to a high precision of over 300 decimal places, so the record_score is provided as a human-readable and friendly interpretation of this.
- multi_bucket_impact: (number) An indication of how strongly an anomaly is multi bucket or single bucket. The value is on a scale of -5 to +5, where -5 means the anomaly is purely single bucket and +5 means the anomaly is purely multi bucket.
- record_score: (number) A normalized score between 0-100, which is based on the probability of the anomalousness of this record. Unlike initial_record_score, this value will be updated by a re-normalization process as new data is analyzed.
- result_type: (string) Internal. This is always set to record.
- timestamp: (date) The start time of the bucket for which these results were calculated.
- typical: (array) The typical value for the bucket, according to analytical modeling.
Additional record properties are added, depending on the fields being analyzed. For example, if the job analyzes hostname as a by field, then a field hostname is added to the result document. This information enables you to filter the anomaly results more easily.
Categories
When categorization_field_name is specified in the job configuration, it is possible to view the definitions of the resulting categories. A category definition describes the common terms matched and contains examples of matched values.
The anomaly results from a categorization analysis are available as bucket, influencer, and record results. For example, the results might indicate that at 16:45 there was an unusual count of log message category 11. You can then examine the description and examples of that category.
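Category definitions can be retrieved with the get categories API. A minimal sketch, assuming a hypothetical job name and a page size chosen for illustration; a single definition can also be requested by appending the category ID to the URL:

GET _ml/anomaly_detectors/web-logs/results/categories
{
  "page": { "size": 10 }
}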
A category resource has the following properties:
- category_id: (unsigned integer) A unique identifier for the category.
- examples: (array) A list of examples of actual values that matched the category.
- grok_pattern: (string) A Grok pattern that could be used in Logstash or an ingest pipeline to extract fields from messages that match the category. The Grok patterns that are found are not optimal, but are often a good starting point for manual tweaking. [preview] This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
- job_id: (string) The unique identifier for the job that these results belong to.
- max_matching_length: (unsigned integer) The maximum length of the fields that matched the category. The value is increased by 10% to enable matching for similar fields that have not been analyzed.
- regex: (string) A regular expression that is used to search for values that match the category.
- terms: (string) A space separated list of the common tokens that are matched in values of the category.
Overall Buckets
Overall buckets provide a summary of bucket results over multiple jobs. Their bucket_span equals the longest bucket_span of the jobs in question.
The overall_score is the top_n average of the max anomaly_score per job within the overall bucket time interval. This means that you can fine-tune the overall_score so that it is more or less sensitive to the number of jobs that detect an anomaly at the same time.
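A hedged sketch of querying overall buckets across two jobs with the get overall buckets API (the job names, threshold, and start time are hypothetical):

GET _ml/anomaly_detectors/job-1,job-2/results/overall_buckets
{
  "top_n": 2,
  "overall_score": 50,
  "start": "2019-07-01T00:00:00Z"
}

Here top_n controls how many of the per-job maximum anomaly scores are averaged into the overall_score, which is how the sensitivity described above is tuned.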
An overall bucket resource has the following properties:
- timestamp: (date) The start time of the overall bucket.
- bucket_span: (number) The length of the bucket in seconds. Matches the bucket_span of the job with the longest one.
- overall_score: (number) The top_n average of the max bucket anomaly_score per job.
- jobs: (array) An array of objects that contain the max_anomaly_score per job_id.
- is_interim: (boolean) If true, this is an interim result. In other words, the anomaly record is calculated based on partial input data.
- result_type: (string) Internal. This is always set to overall_bucket.