Results resources
Several different result types are created for each job. You can query anomaly results for buckets, influencers, and records by using the results API. Summarized bucket results over multiple jobs can be queried as well; those results are called overall buckets.
Results are written for each bucket_span. The timestamp for the results is the start of the bucket time interval.
The results include scores, which are calculated for each anomaly result type and each bucket interval. These scores are aggregated in order to reduce noise, and normalized in order to identify and rank the most mathematically significant anomalies.
Bucket results provide the top level, overall view of the job and are ideal for alerts. For example, the bucket results might indicate that at 16:05 the system was unusual. This information is a summary of all the anomalies, pinpointing when they occurred.
Influencer results show which entities were anomalous and when. For example,
the influencer results might indicate that at 16:05 user_name: Bob
was unusual.
This information is a summary of all the anomalies for each entity, so there
can be a lot of these results. Once you have identified a notable bucket time,
you can look to see which entities were significant.
Record results provide details about what the individual anomaly was, when it occurred and which entity was involved. For example, the record results might indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was 1067 bytes. Once you have identified a bucket time and perhaps a significant entity too, you can drill through to the record results in order to investigate the anomalous behavior.
Categorization results contain the definitions of categories that have been identified. These are only applicable for jobs that are configured to analyze unstructured log data using categorization. These results do not contain a timestamp or any calculated scores. For more information, see Categorizing log messages.
All of these resources and properties are informational; you cannot change their values.
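Each result type has a corresponding get results endpoint. As a minimal sketch of how bucket results might be queried (the job name total-requests, the score threshold, and the date range are hypothetical values chosen for illustration; anomaly_score, start, and end are request parameters of the get buckets API that filter by minimum score and time range):

GET _ml/anomaly_detectors/total-requests/results/buckets
{
  "anomaly_score": 80,
  "start": "2019-07-01T00:00:00Z",
  "end": "2019-07-02T00:00:00Z"
}

Similar endpoints exist for influencers, records, categories, and overall buckets; see the corresponding get results APIs for the exact parameters each accepts.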
Buckets
Bucket results provide the top level, overall view of the job and are best for alerting.
Each bucket has an anomaly_score, which is a statistically aggregated and normalized view of the combined anomalousness of all the record results within each bucket.
One bucket result is written for each bucket_span for each job, even if it is not considered to be anomalous. If the bucket is not anomalous, it has an anomaly_score of zero.
When you identify an anomalous bucket, you can investigate further by expanding the bucket resource to show the records as nested objects. Alternatively, you can access the records resource directly and filter by the date range.
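For example, a single bucket can be retrieved with its records expanded, assuming the get buckets API's optional bucket timestamp path segment and its expand parameter (the job name and timestamp here are hypothetical):

GET _ml/anomaly_detectors/total-requests/results/buckets/1560297600000
{
  "expand": true
}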
A bucket resource has the following properties:
- anomaly_score: (number) The maximum anomaly score, between 0-100, for any of the bucket influencers. This is an overall, rate-limited score for the job. All the anomaly records in the bucket contribute to this score. This value might be updated as new data is analyzed.
- bucket_influencers: (array) An array of bucket influencer objects. For more information, see Bucket Influencers.
- bucket_span: (number) The length of the bucket in seconds. This value matches the bucket_span that is specified in the job.
- event_count: (number) The number of input data records processed in this bucket.
- initial_anomaly_score: (number) The maximum anomaly_score for any of the bucket influencers. This is the initial value that was calculated at the time the bucket was processed.
- is_interim: (boolean) If true, this is an interim result. In other words, the bucket results are calculated based on partial input data.
- job_id: (string) The unique identifier for the job that these results belong to.
- processing_time_ms: (number) The amount of time, in milliseconds, that it took to analyze the bucket contents and calculate results.
- result_type: (string) Internal. This value is always set to bucket.
- timestamp: (date) The start time of the bucket. This timestamp uniquely identifies the bucket.
Events that occur exactly at the timestamp of the bucket are included in the results for the bucket.
Bucket Influencers
Bucket influencer results are available as nested objects contained within bucket results. These results are an aggregation for each type of influencer. For example, if both client_ip and user_name were specified as influencers, then you would be able to determine when the client_ip or user_name values were collectively anomalous.
There is a built-in bucket influencer called bucket_time which is always available. This bucket influencer is the aggregation of all records in the bucket; it is not just limited to a type of influencer.
A bucket influencer is a type of influencer. For example, client_ip or user_name can be bucket influencers, whereas 192.168.88.2 and Bob are influencers.
A bucket influencer object has the following properties:
- anomaly_score: (number) A normalized score between 0-100, which is calculated for each bucket influencer. This score might be updated as newer data is analyzed.
- bucket_span: (number) The length of the bucket in seconds. This value matches the bucket_span that is specified in the job.
- initial_anomaly_score: (number) The score between 0-100 for each bucket influencer. This score is the initial value that was calculated at the time the bucket was processed.
- influencer_field_name: (string) The field name of the influencer. For example, client_ip or user_name.
- influencer_field_value: (string) The field value of the influencer. For example, 192.168.88.2 or Bob.
- is_interim: (boolean) If true, this is an interim result. In other words, the bucket influencer results are calculated based on partial input data.
- job_id: (string) The unique identifier for the job that these results belong to.
- probability: (number) The probability that the bucket has this behavior, in the range 0 to 1. For example, 0.0000109783. This value can be held to a high precision of over 300 decimal places, so the anomaly_score is provided as a human-readable and friendly interpretation of this.
- raw_anomaly_score: (number) Internal.
- result_type: (string) Internal. This value is always set to bucket_influencer.
- timestamp: (date) The start time of the bucket for which these results were calculated.
Influencers
Influencers are the entities that have contributed to, or are to blame for, the anomalies. Influencer results are available only if an influencer_field_name is specified in the job configuration.
Influencers are given an influencer_score, which is calculated based on the anomalies that have occurred in each bucket interval. For jobs with more than one detector, this gives a powerful view of the most anomalous entities.
For example, if you are analyzing unusual bytes sent and unusual domains visited and you specified user_name as the influencer, then an influencer_score for each anomalous user name is written per bucket. If user_name: Bob had an influencer_score greater than 75, then Bob would be considered very anomalous during this time interval in one or both of those areas (unusual bytes sent or unusual domains visited).
One influencer result is written per bucket for each influencer that is considered anomalous.
When you identify an influencer with a high score, you can investigate further by accessing the records resource for that bucket and enumerating the anomaly records that contain the influencer.
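A hedged sketch of such a query using the get influencers API (the job name and score threshold are hypothetical; influencer_score filters out lower-scoring influencers, while sort and desc order the results by score, highest first):

GET _ml/anomaly_detectors/total-requests/results/influencers
{
  "influencer_score": 75,
  "sort": "influencer_score",
  "desc": true
}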
An influencer object has the following properties:
- bucket_span: (number) The length of the bucket in seconds. This value matches the bucket_span that is specified in the job.
- influencer_score: (number) A normalized score between 0-100, which is based on the probability of the influencer in this bucket aggregated across detectors. Unlike initial_influencer_score, this value will be updated by a re-normalization process as new data is analyzed.
- initial_influencer_score: (number) A normalized score between 0-100, which is based on the probability of the influencer aggregated across detectors. This is the initial value that was calculated at the time the bucket was processed.
- influencer_field_name: (string) The field name of the influencer.
- influencer_field_value: (string) The entity that influenced, contributed to, or was to blame for the anomaly.
- is_interim: (boolean) If true, this is an interim result. In other words, the influencer results are calculated based on partial input data.
- job_id: (string) The unique identifier for the job that these results belong to.
- probability: (number) The probability that the influencer has this behavior, in the range 0 to 1. For example, 0.0000109783. This value can be held to a high precision of over 300 decimal places, so the influencer_score is provided as a human-readable and friendly interpretation of this.
- result_type: (string) Internal. This value is always set to influencer.
- timestamp: (date) The start time of the bucket for which these results were calculated.
Additional influencer properties are added, depending on the fields being analyzed. For example, if the job analyzes user_name as an influencer, then a field user_name is added to the result document. This information enables you to filter the anomaly results more easily.
Records
Records contain the detailed analytical results. They describe the anomalous activity that has been identified in the input data based on the detector configuration.
For example, if you are looking for unusually large data transfers, an anomaly record can identify the source IP address, the destination, the time window during which it occurred, the expected and actual size of the transfer, and the probability of this occurrence.
There can be many anomaly records depending on the characteristics and size of the input data. In practice, there are often too many to be able to manually process them. The machine learning features therefore perform a sophisticated aggregation of the anomaly records into buckets.
The number of record results depends on the number of anomalies found in each bucket, which relates to the number of time series being modeled and the number of detectors.
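A minimal sketch of retrieving the highest-scoring records with the get records API (the job name, score threshold, and start time are hypothetical values chosen for illustration):

GET _ml/anomaly_detectors/total-requests/results/records
{
  "record_score": 80,
  "sort": "record_score",
  "desc": true,
  "start": "2019-07-01T00:00:00Z"
}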
A record object has the following properties:
- actual: (array) The actual value for the bucket.
- bucket_span: (number) The length of the bucket in seconds. This value matches the bucket_span that is specified in the job.
- by_field_name: (string) The name of the analyzed field. This value is present only if it is specified in the detector. For example, client_ip.
- by_field_value: (string) The value of by_field_name. This value is present only if it is specified in the detector. For example, 192.168.66.2.
- causes: (array) For population analysis, an over field must be specified in the detector. This property contains an array of anomaly records that are the causes for the anomaly that has been identified for the over field. If no over fields exist, this field is not present. This sub-resource contains the most anomalous records for the over_field_name. For scalability reasons, a maximum of the 10 most significant causes of the anomaly are returned. As part of the core analytical modeling, these low-level anomaly records are aggregated for their parent over field record. The causes resource contains similar elements to the record resource, namely actual, typical, *_field_name and *_field_value. Probability and scores are not applicable to causes.
- detector_index: (number) A unique identifier for the detector.
- field_name: (string) Certain functions require a field to operate on, for example, sum(). For those functions, this value is the name of the field to be analyzed.
- function: (string) The function in which the anomaly occurs, as specified in the detector configuration. For example, max.
- function_description: (string) The description of the function in which the anomaly occurs, as specified in the detector configuration.
- influencers: (array) If influencers was specified in the detector configuration, then this array contains influencers that contributed to or were to blame for an anomaly.
- initial_record_score: (number) A normalized score between 0-100, which is based on the probability of the anomalousness of this record. This is the initial value that was calculated at the time the bucket was processed.
- is_interim: (boolean) If true, this is an interim result. In other words, the anomaly record is calculated based on partial input data.
- job_id: (string) The unique identifier for the job that these results belong to.
- over_field_name: (string) The name of the over field that was used in the analysis. This value is present only if it was specified in the detector. Over fields are used in population analysis. For example, user.
- over_field_value: (string) The value of over_field_name. This value is present only if it was specified in the detector. For example, Bob.
- partition_field_name: (string) The name of the partition field that was used in the analysis. This value is present only if it was specified in the detector. For example, region.
- partition_field_value: (string) The value of partition_field_name. This value is present only if it was specified in the detector. For example, us-east-1.
- probability: (number) The probability of the individual anomaly occurring, in the range 0 to 1. For example, 0.0000772031. This value can be held to a high precision of over 300 decimal places, so the record_score is provided as a human-readable and friendly interpretation of this.
- multi_bucket_impact: (number) An indication of how strongly an anomaly is multi bucket or single bucket. The value is on a scale of -5 to +5, where -5 means the anomaly is purely single bucket and +5 means the anomaly is purely multi bucket.
- record_score: (number) A normalized score between 0-100, which is based on the probability of the anomalousness of this record. Unlike initial_record_score, this value will be updated by a re-normalization process as new data is analyzed.
- result_type: (string) Internal. This is always set to record.
- timestamp: (date) The start time of the bucket for which these results were calculated.
- typical: (array) The typical value for the bucket, according to analytical modeling.
Additional record properties are added, depending on the fields being analyzed. For example, if the job analyzes hostname as a by field, then a field hostname is added to the result document. This information enables you to filter the anomaly results more easily.
Categories
When categorization_field_name is specified in the job configuration, it is possible to view the definitions of the resulting categories. A category definition describes the common terms matched and contains examples of matched values.
The anomaly results from a categorization analysis are available as bucket, influencer, and record results. For example, the results might indicate that at 16:45 there was an unusual count of log message category 11. You can then examine the description and examples of that category.
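Category definitions can be retrieved with the get categories API. A minimal sketch, assuming a hypothetical job name and a page size chosen for illustration; a single definition can also be requested by appending the category ID to the URL:

GET _ml/anomaly_detectors/web-logs/results/categories
{
  "page": { "size": 10 }
}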
A category resource has the following properties:
- category_id: (unsigned integer) A unique identifier for the category.
- examples: (array) A list of examples of actual values that matched the category.
- grok_pattern: (string) A Grok pattern that could be used in Logstash or an ingest pipeline to extract fields from messages that match the category. The Grok patterns that are found are not optimal, but are often a good starting point for manual tweaking. [preview] This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
- job_id: (string) The unique identifier for the job that these results belong to.
- max_matching_length: (unsigned integer) The maximum length of the fields that matched the category. The value is increased by 10% to enable matching for similar fields that have not been analyzed.
- regex: (string) A regular expression that is used to search for values that match the category.
- terms: (string) A space separated list of the common tokens that are matched in values of the category.
Overall Buckets
Overall buckets provide a summary of bucket results over multiple jobs. Their bucket_span equals the longest bucket_span of the jobs in question.
The overall_score is the top_n average of the max anomaly_score per job within the overall bucket time interval. This means that you can fine-tune the overall_score so that it is more or less sensitive to the number of jobs that detect an anomaly at the same time.
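A hedged sketch of querying overall buckets across two jobs with the get overall buckets API (the job names, threshold, and start time are hypothetical):

GET _ml/anomaly_detectors/job-1,job-2/results/overall_buckets
{
  "top_n": 2,
  "overall_score": 50,
  "start": "2019-07-01T00:00:00Z"
}

Here top_n controls how many of the per-job maximum anomaly scores are averaged into the overall_score, which is how the sensitivity described above is tuned.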
An overall bucket resource has the following properties:
- timestamp: (date) The start time of the overall bucket.
- bucket_span: (number) The length of the bucket in seconds. Matches the bucket_span of the job with the longest one.
- overall_score: (number) The top_n average of the max bucket anomaly_score per job.
- jobs: (array) An array of objects that contain the max_anomaly_score per job_id.
- is_interim: (boolean) If true, this is an interim result. In other words, the anomaly record is calculated based on partial input data.
- result_type: (string) Internal. This is always set to overall_bucket.