Date Histogram Aggregation
A multi-bucket aggregation similar to the histogram, except it can only be applied on date values. Since dates are represented internally in Elasticsearch as long values, it is possible to use the normal histogram on dates as well, though accuracy will be compromised. The reason for this is that time-based intervals are not fixed (think of leap years and the number of days in a month). For this reason, we need special support for time-based data. From a functionality perspective, this histogram supports the same features as the normal histogram. The main difference is that the interval can be specified by date/time expressions.
Requesting bucket intervals of a month.
{ "aggs" : { "articles_over_time" : { "date_histogram" : { "field" : "date", "interval" : "month" } } } }
Available expressions for interval: year, quarter, month, week, day, hour, minute, second
Fractional values are allowed for seconds, minutes, hours, days and weeks. For example, 1.5 hours:
{ "aggs" : { "articles_over_time" : { "date_histogram" : { "field" : "date", "interval" : "1.5h" } } } }
See Time units for accepted abbreviations.
Time Zone
By default, times are stored as UTC milliseconds since the epoch. Thus, all computation and "bucketing" / "rounding" is done on UTC. It is possible to provide a time zone value, which will cause all computations to take the relevant zone into account. The time returned for each bucket/entry is milliseconds since the epoch of the provided time zone.
Deprecated in 1.5.0.
pre_zone and post_zone are replaced by time_zone.
The parameters are pre_zone (pre rounding based on interval) and post_zone (post rounding based on interval). The time_zone parameter simply sets the pre_zone parameter. By default, those are set to UTC.
The zone value accepts either a numeric value for the hours offset, for example: "time_zone" : -2. It also accepts a format of hours and minutes, like "time_zone" : "-02:30". Another option is to provide a time zone accepted as one of the values listed here.
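For example, a request specifying a time zone might look like this (a minimal sketch, reusing the articles_over_time aggregation and date field from the examples above):

{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "time_zone" : "-02:30"
            }
        }
    }
}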
Let's take an example. For 2012-04-01T04:15:30Z, with a pre_zone of -08:00. For a day interval, the actual time after applying the time zone and rounding falls under 2012-03-31, so the returned value will be (in millis) 2012-03-31T00:00:00Z (UTC). For an hour interval, applying the time zone results in 2012-03-31T20:15:30, and rounding it results in 2012-03-31T20:00:00, but since we want to return it in UTC (post_zone is not set), we convert it back to UTC: 2012-04-01T04:00:00Z. Note that we are consistent in the results, returning the rounded value in UTC.
post_zone simply takes the result and adds the relevant offset.
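As a sketch of the scenario above (reusing the aggregation and field names from the earlier examples), the deprecated pre_zone parameter is set on the date_histogram itself:

{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "pre_zone" : "-08:00"
            }
        }
    }
}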
Deprecated in 1.5.0.
pre_zone_adjust_large_interval will be removed.
Sometimes we want to apply the same conversion to UTC that we did above for the hour interval to day (and larger) intervals as well. We can set pre_zone_adjust_large_interval to true, which will apply the same conversion done for the hour interval in the example to day and above intervals (it can be set regardless of the interval, but it only kicks in when using day and higher intervals).
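For illustration, a request enabling this behaviour could look like the following (again a sketch based on the aggregation used above):

{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "pre_zone" : "-08:00",
                "pre_zone_adjust_large_interval" : true
            }
        }
    }
}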
Offset
The offset option can be provided to shift the date bucket interval boundaries after any other shifts due to time zones are applied. This, for example, makes it possible for daily buckets to go from 6AM to 6AM the next day instead of starting at 12AM, or for monthly buckets to go from the 10th of the month to the 10th of the next month instead of the 1st.
The offset option accepts positive or negative time durations like "1h" for an hour or "1M" for a month. See Time units for more possible time duration options.
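For example, shifting daily buckets so that they run from 6AM to 6AM could look like this (a sketch reusing the aggregation from the earlier examples):

{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "offset" : "6h"
            }
        }
    }
}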
Deprecated in 1.5.0.
pre_offset and post_offset are deprecated and replaced by offset.
Keys
Since internally, dates are represented as 64bit numbers, these numbers are returned as the bucket keys (each key representing a date - milliseconds since the epoch). It is also possible to define a date format, which will result in returning the dates as formatted strings next to the numeric key values:
{ "aggs" : { "articles_over_time" : { "date_histogram" : { "field" : "date", "interval" : "1M", "format" : "yyyy-MM-dd" } } } }
Supports expressive date format pattern |
Response:
{ "aggregations": { "articles_over_time": { "buckets": [ { "key_as_string": "2013-02-02", "key": 1328140800000, "doc_count": 1 }, { "key_as_string": "2013-03-02", "key": 1330646400000, "doc_count": 2 }, ... ] } } }
Like with the normal histogram, both document level scripts and value level scripts are supported. It is also possible to control the order of the returned buckets using the order settings and filter the returned buckets based on a min_doc_count setting (by default all buckets with min_doc_count > 0 will be returned). This histogram also supports the extended_bounds setting, which enables extending the bounds of the histogram beyond the data itself (to read more on why you'd want to do that, please refer to the explanation here).
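As an illustrative sketch, min_doc_count and extended_bounds can be combined in a single request (the bounds values here are assumptions, and the example assumes extended_bounds accepts date strings matching the format):

{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month",
                "format" : "yyyy-MM-dd",
                "min_doc_count" : 0,
                "extended_bounds" : {
                    "min" : "2013-01-01",
                    "max" : "2013-12-31"
                }
            }
        }
    }
}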