Cumulative Cardinality Aggregation
A parent pipeline aggregation which calculates the cumulative cardinality in a parent histogram (or date_histogram)
aggregation. The specified metric must be a cardinality aggregation and the enclosing histogram
must have min_doc_count set to 0 (default for histogram aggregations).
The cumulative_cardinality agg is useful for finding "total new items", like the number of new visitors to your
website each day. A regular cardinality aggregation will tell you how many unique visitors came each day, but doesn’t
differentiate between "new" and "repeat" visitors. The cumulative cardinality aggregation can be used to determine
how many of each day’s unique visitors are "new".
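The idea can be sketched outside Elasticsearch with plain Python sets. This is only an illustration of the semantics, not how the aggregation is implemented: the function name and the visitor IDs are hypothetical, and the sketch counts exactly, whereas the real cardinality aggregation returns approximate counts.

```python
def cumulative_cardinality(daily_ids):
    """Return the running count of distinct IDs seen up to and including each day."""
    seen = set()      # every ID observed so far, across all days
    totals = []
    for ids in daily_ids:
        seen.update(ids)          # repeat visitors do not grow the set
        totals.append(len(seen))
    return totals


# Hypothetical visitor IDs for three consecutive days.
days = [["alice", "bob"], ["bob", "carol"], ["alice", "bob", "dave"]]

daily_unique = [len(set(ids)) for ids in days]   # what a per-day cardinality reports
running_total = cumulative_cardinality(days)     # what the cumulative version reports

print(daily_unique)    # [2, 2, 3]
print(running_total)   # [2, 3, 4]
```

The per-day counts say nothing about overlap between days; the running total only grows when an ID appears for the first time.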
Syntax
A cumulative_cardinality aggregation looks like this in isolation:
{ "cumulative_cardinality": { "buckets_path": "my_cardinality_agg" } }
Table 14. cumulative_cardinality Parameters
Parameter Name | Description | Required | Default Value |
---|---|---|---|
buckets_path | The path to the cardinality aggregation we wish to find the cumulative cardinality for | Required | |
format | Format to apply to the output value of this aggregation | Optional | |
The following snippet calculates the cumulative cardinality of the total daily users:
GET /user_hits/_search
{
  "size": 0,
  "aggs": {
    "users_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "distinct_users": {
          "cardinality": { "field": "user_id" }
        },
        "total_new_users": {
          "cumulative_cardinality": { "buckets_path": "distinct_users" }
        }
      }
    }
  }
}
And the following may be the response:
{
  "took": 11,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "users_per_day": {
      "buckets": [
        {
          "key_as_string": "2019-01-01T00:00:00.000Z",
          "key": 1546300800000,
          "doc_count": 2,
          "distinct_users": { "value": 2 },
          "total_new_users": { "value": 2 }
        },
        {
          "key_as_string": "2019-01-02T00:00:00.000Z",
          "key": 1546387200000,
          "doc_count": 2,
          "distinct_users": { "value": 2 },
          "total_new_users": { "value": 3 }
        },
        {
          "key_as_string": "2019-01-03T00:00:00.000Z",
          "key": 1546473600000,
          "doc_count": 3,
          "distinct_users": { "value": 3 },
          "total_new_users": { "value": 4 }
        }
      ]
    }
  }
}
Note how the second day, 2019-01-02, has two distinct users but the total_new_users metric generated by the
cumulative pipeline agg only increments to three. This means that only one of the two users that day was
new; the other had already been seen on the previous day. This happens again on the third day, where only
one of three users is completely new.
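The same reading of the response can be done programmatically. The following is a minimal sketch, assuming the response body has already been parsed into Python dictionaries; the helper name `new_and_repeat` is hypothetical, and the bucket data is copied from the response above with unused fields trimmed. Because cardinality values are approximate on real data, the derived numbers should be treated as estimates.

```python
def new_and_repeat(buckets, distinct_key="distinct_users", total_key="total_new_users"):
    """Split each bucket's distinct users into first-time and previously seen users."""
    rows = []
    previous_total = 0  # nothing has been seen before the first bucket
    for bucket in buckets:
        distinct = bucket[distinct_key]["value"]
        total = bucket[total_key]["value"]
        new = total - previous_total  # growth of the cumulative count in this bucket
        rows.append({
            "day": bucket["key_as_string"][:10],
            "new": new,
            "repeat": distinct - new,
        })
        previous_total = total
    return rows


# Buckets copied from the example response (only the fields used here).
buckets = [
    {"key_as_string": "2019-01-01T00:00:00.000Z",
     "distinct_users": {"value": 2}, "total_new_users": {"value": 2}},
    {"key_as_string": "2019-01-02T00:00:00.000Z",
     "distinct_users": {"value": 2}, "total_new_users": {"value": 3}},
    {"key_as_string": "2019-01-03T00:00:00.000Z",
     "distinct_users": {"value": 3}, "total_new_users": {"value": 4}},
]

for row in new_and_repeat(buckets):
    print(row)
# {'day': '2019-01-01', 'new': 2, 'repeat': 0}
# {'day': '2019-01-02', 'new': 1, 'repeat': 1}
# {'day': '2019-01-03', 'new': 1, 'repeat': 2}
```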
Incremental cumulative cardinality
The cumulative_cardinality agg will show you the total distinct count since the beginning of the time period
being queried. Sometimes, however, it is useful to see the "incremental" count, meaning how many new users
are added each day rather than the total cumulative count.
This can be accomplished by adding a derivative aggregation to our query:
GET /user_hits/_search
{
  "size": 0,
  "aggs": {
    "users_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "distinct_users": {
          "cardinality": { "field": "user_id" }
        },
        "total_new_users": {
          "cumulative_cardinality": { "buckets_path": "distinct_users" }
        },
        "incremental_new_users": {
          "derivative": { "buckets_path": "total_new_users" }
        }
      }
    }
  }
}
And the following may be the response:
{
  "took": 11,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "users_per_day": {
      "buckets": [
        {
          "key_as_string": "2019-01-01T00:00:00.000Z",
          "key": 1546300800000,
          "doc_count": 2,
          "distinct_users": { "value": 2 },
          "total_new_users": { "value": 2 }
        },
        {
          "key_as_string": "2019-01-02T00:00:00.000Z",
          "key": 1546387200000,
          "doc_count": 2,
          "distinct_users": { "value": 2 },
          "total_new_users": { "value": 3 },
          "incremental_new_users": { "value": 1.0 }
        },
        {
          "key_as_string": "2019-01-03T00:00:00.000Z",
          "key": 1546473600000,
          "doc_count": 3,
          "distinct_users": { "value": 3 },
          "total_new_users": { "value": 4 },
          "incremental_new_users": { "value": 1.0 }
        }
      ]
    }
  }
}
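In this response the first bucket carries no incremental_new_users value, since there is no earlier bucket to compare against. A rough way to picture the derivative step is as a first difference over the cumulative values. The sketch below assumes that simplified model; `first_difference` is a hypothetical helper, not part of any Elasticsearch client.

```python
def first_difference(values):
    """Return the change from the previous value; the first entry has no predecessor."""
    if not values:
        return []
    diffs = [None]  # mirrors the missing value in the first bucket
    for previous, current in zip(values, values[1:]):
        diffs.append(float(current - previous))
    return diffs


# total_new_users values taken from the three buckets in the response.
cumulative_totals = [2, 3, 4]
print(first_difference(cumulative_totals))  # [None, 1.0, 1.0]
```

Under this model, the first day's two new users are visible only in total_new_users, not in the incremental series.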