- Elasticsearch Guide: other versions:
- Getting Started
- Setup
- Breaking changes
- API Conventions
- Document APIs
- Search APIs
- Search
- URI Search
- Request Body Search
- Search Template
- Search Shards API
- Aggregations
- Min Aggregation
- Max Aggregation
- Sum Aggregation
- Avg Aggregation
- Stats Aggregation
- Extended Stats Aggregation
- Value Count Aggregation
- Percentiles Aggregation
- Percentile Ranks Aggregation
- Cardinality Aggregation
- Geo Bounds Aggregation
- Top hits Aggregation
- Scripted Metric Aggregation
- Global Aggregation
- Filter Aggregation
- Filters Aggregation
- Missing Aggregation
- Nested Aggregation
- Reverse nested Aggregation
- Children Aggregation
- Terms Aggregation
- Significant Terms Aggregation
- Range Aggregation
- Date Range Aggregation
- IPv4 Range Aggregation
- Histogram Aggregation
- Date Histogram Aggregation
- Geo Distance Aggregation
- GeoHash grid Aggregation
- Facets
- Suggesters
- Multi Search API
- Count API
- Search Exists API
- Validate API
- Explain API
- Percolator
- More Like This API
- Field stats API
- Indices APIs
- Create Index
- Delete Index
- Get Index
- Indices Exists
- Open / Close Index API
- Put Mapping
- Get Mapping
- Get Field Mapping
- Types Exists
- Delete Mapping
- Index Aliases
- Update Indices Settings
- Get Settings
- Analyze
- Index Templates
- Warmers
- Status
- Indices Stats
- Indices Segments
- Indices Recovery
- Clear Cache
- Flush
- Refresh
- Optimize
- Shadow replica indices
- Upgrade
- cat APIs
- Cluster APIs
- Query DSL
- Queries
- Match Query
- Multi Match Query
- Bool Query
- Boosting Query
- Common Terms Query
- Constant Score Query
- Dis Max Query
- Filtered Query
- Fuzzy Like This Query
- Fuzzy Like This Field Query
- Function Score Query
- Fuzzy Query
- GeoShape Query
- Has Child Query
- Has Parent Query
- Ids Query
- Indices Query
- Match All Query
- More Like This Query
- Nested Query
- Prefix Query
- Query String Query
- Simple Query String Query
- Range Query
- Regexp Query
- Span First Query
- Span Multi Term Query
- Span Near Query
- Span Not Query
- Span Or Query
- Span Term Query
- Term Query
- Terms Query
- Top Children Query
- Wildcard Query
- Minimum Should Match
- Multi Term Query Rewrite
- Template Query
- Filters
- And Filter
- Bool Filter
- Exists Filter
- Geo Bounding Box Filter
- Geo Distance Filter
- Geo Distance Range Filter
- Geo Polygon Filter
- GeoShape Filter
- Geohash Cell Filter
- Has Child Filter
- Has Parent Filter
- Ids Filter
- Indices Filter
- Limit Filter
- Match All Filter
- Missing Filter
- Nested Filter
- Not Filter
- Or Filter
- Prefix Filter
- Query Filter
- Range Filter
- Regexp Filter
- Script Filter
- Term Filter
- Terms Filter
- Type Filter
- Queries
- Mapping
- Analysis
- Analyzers
- Tokenizers
- Token Filters
- Standard Token Filter
- ASCII Folding Token Filter
- Length Token Filter
- Lowercase Token Filter
- Uppercase Token Filter
- NGram Token Filter
- Edge NGram Token Filter
- Porter Stem Token Filter
- Shingle Token Filter
- Stop Token Filter
- Word Delimiter Token Filter
- Stemmer Token Filter
- Stemmer Override Token Filter
- Keyword Marker Token Filter
- Keyword Repeat Token Filter
- KStem Token Filter
- Snowball Token Filter
- Phonetic Token Filter
- Synonym Token Filter
- Compound Word Token Filter
- Reverse Token Filter
- Elision Token Filter
- Truncate Token Filter
- Unique Token Filter
- Pattern Capture Token Filter
- Pattern Replace Token Filter
- Trim Token Filter
- Limit Token Count Token Filter
- Hunspell Token Filter
- Common Grams Token Filter
- Normalization Token Filter
- CJK Width Token Filter
- CJK Bigram Token Filter
- Delimited Payload Token Filter
- Keep Words Token Filter
- Keep Types Token Filter
- Classic Token Filter
- Apostrophe Token Filter
- Character Filters
- ICU Analysis Plugin
- Modules
- Index Modules
- Testing
- Glossary of terms
WARNING: Version 1.7 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Aggregations
editAggregations
editAggregations grew out of the facets module and the long experience of how users use it (and would like to use it) for real-time data analytics purposes. As such, it serves as the next generation replacement for the functionality we currently refer to as "faceting".
Facets provide a great way to aggregate data within a document set context. This context is defined by the executed query in combination with the different levels of filters that can be defined (filtered queries, top-level filters, and facet level filters). While powerful, their implementation is not designed from the ground up to support complex aggregations and is thus limited.
The aggregations module breaks the barriers the current facet implementation put in place. The new name ("Aggregations") also indicates the intention here - a generic yet extremely powerful framework for building aggregations - any types of aggregations.
An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents. The context of the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed query/filters of the search request).
There are many different types of aggregations, each with its own purpose and output. To better understand these types, it is often easier to break them into two main families:
- Bucketing
- A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, all the buckets criteria are evaluated on every document in the context and when a criterion matches, the document is considered to "fall in" the relevant bucket. By the end of the aggregation process, we’ll end up with a list of buckets - each one with a set of documents that "belong" to it.
- Metric
- Aggregations that keep track and compute metrics over a set of documents.
The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to the bucket), one can potentially associate aggregations on the bucket level, and those will execute within the context of that bucket. This is where the real power of aggregations kicks in: aggregations can be nested!
Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations will be computed for the buckets which their parent aggregation generates. There is no hard limit on the level/depth of nested aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of another higher-level aggregation).
Structuring Aggregations
editThe following snippet captures the basic structure of aggregations:
"aggregations" : { "<aggregation_name>" : { "<aggregation_type>" : { <aggregation_body> } [,"aggregations" : { [<sub_aggregation>]+ } ]? } [,"<aggregation_name_2>" : { ... } ]* }
The aggregations
object (the key aggs
can also be used) in the JSON holds the aggregations to be computed. Each aggregation
is associated with a logical name that the user defines (e.g. if the aggregation computes the average price, then it would
make sense to name it avg_price
). These logical names will also be used to uniquely identify the aggregations in the
response. Each aggregation has a specific type (<aggregation_type>
in the above snippet) and is typically the first
key within the named aggregation body. Each type of aggregation defines its own body, depending on the nature of the
aggregation (e.g. an avg
aggregation on a specific field will define the field on which the average will be calculated).
At the same level of the aggregation type definition, one can optionally define a set of additional aggregations,
though this only makes sense if the aggregation you defined is of a bucketing nature. In this scenario, the
sub-aggregations you define on the bucketing aggregation level will be computed for all the buckets built by the
bucketing aggregation. For example, if you define a set of aggregations under the range
aggregation, the
sub-aggregations will be computed for the range buckets that are defined.
Values Source
editSome aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from
a specific document field which is set using the field
key for the aggregations. It is also possible to define a
script
which will generate the values (per document).
The script
parameter expects an inline script. Use script_id
for indexed scripts and script_file
for scripts in the config/scripts/
directory.
When both field
and script
settings are configured for the aggregation, the script will be treated as a
value script
. While normal scripts are evaluated on a document level (i.e. the script has access to all the data
associated with the document), value scripts are evaluated on the value level. In this mode, the values are extracted
from the configured field
and the script
is used to apply a "transformation" over these value/s.
When working with scripts, the lang
and params
settings can also be defined. The former defines the scripting
language which is used (assuming the proper language is available in Elasticsearch, either by default or as a plugin). The latter
enables defining all the "dynamic" expressions in the script as parameters, which enables the script to keep itself static
between calls (this will ensure the use of the cached compiled scripts in Elasticsearch).
Scripts can generate a single value or multiple values per document. When generating multiple values, one can use the
script_values_sorted
settings to indicate whether these values are sorted or not. Internally, Elasticsearch can
perform optimizations when dealing with sorted values (for example, with the min
aggregations, knowing the values are
sorted, Elasticsearch will skip the iterations over all the values and rely on the first value in the list to be the
minimum value among all other values associated with the same document).
Metrics Aggregations
editThe aggregations in this family compute metrics based on values extracted in one way or another from the documents that are being aggregated. The values are typically extracted from the fields of the document (using the field data), but can also be generated using scripts.
Numeric metrics aggregations are a special type of metrics aggregation which output numeric values. Some aggregations output
a single numeric metric (e.g. avg
) and are called single-value numeric metrics aggregation
, others generate multiple
metrics (e.g. stats
) and are called multi-value numeric metrics aggregation
. The distinction between single-value and
multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some
bucket aggregations (some bucket aggregations enable you to sort the returned buckets based on the numeric metrics in each bucket).
Bucket Aggregations
editBucket aggregations don’t calculate metrics over fields like the metrics aggregations do, but instead, they create
buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) which determines
whether or not a document in the current context "falls" into it. In other words, the buckets effectively define document
sets. In addition to the buckets themselves, the bucket
aggregations also compute and return the number of documents
that "fell in" to each bucket.
Bucket aggregations, as opposed to metrics
aggregations, can hold sub-aggregations. These sub-aggregations will be
aggregated for the buckets created by their "parent" bucket aggregation.
There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some define fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.
Caching heavy aggregations
editFrequently used aggregations (e.g. for display on the home page of a website) can be cached for faster responses. These cached results are the same results that would be returned by an uncached aggregation — you will never get stale results.
See Shard query cache for more details.
Returning only aggregation results
editThere are many occasions when aggregations are required but search hits are not. For these cases the hits can be ignored by
adding search_type=count
to the request URL parameters. For example:
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search?search_type=count' -d '{ "aggregations": { "my_agg": { "terms": { "field": "text" } } } } '
Setting search_type
to count
avoids executing the fetch phase of the search making the request more efficient. See
Search Type for more information on the search_type
parameter.
On this page