This documentation contains work-in-progress information for future Elastic Stack and Cloud releases. Use the version selector to view supported release docs. It also contains some Elastic Cloud serverless information. Check out our serverless docs for more details.



Cumulative cardinality aggregation

A parent pipeline aggregation that calculates the cumulative cardinality in a parent histogram (or date_histogram) aggregation. The specified metric must be a cardinality aggregation, and the enclosing histogram must have min_doc_count set to 0 (the default for histogram aggregations).

The cumulative_cardinality aggregation is useful for finding "total new items", such as the number of new visitors to your website each day. A regular cardinality aggregation tells you how many unique visitors came each day, but doesn’t differentiate between "new" and "repeat" visitors. The cumulative cardinality aggregation can be used to determine how many of each day’s unique visitors are "new".
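To make the difference between the two aggregations concrete, here is a minimal Python sketch of the idea using exact sets. The (day, user_id) hits are hypothetical, chosen so the counts match the example response on this page. This is not how Elasticsearch computes the values: the real cardinality aggregation returns an approximate count, whereas this sketch counts exactly.

```python
def daily_and_cumulative_distinct(hits):
    """Return (day, distinct_that_day, cumulative_distinct) tuples.

    `hits` is an iterable of (day, user_id) pairs. Days are processed in
    sorted order, like the buckets of a date_histogram. Days without any
    hits are simply absent here (unlike empty histogram buckets).
    """
    users_by_day = {}
    for day, user in hits:
        users_by_day.setdefault(day, set()).add(user)

    seen = set()  # every user observed in this or any earlier bucket
    rows = []
    for day in sorted(users_by_day):
        today = users_by_day[day]
        seen |= today  # union so far = cumulative distinct users
        rows.append((day, len(today), len(seen)))
    return rows


# Hypothetical hits: "u1" visits every day, so it only counts as new once.
hits = [
    ("2019-01-01", "u1"), ("2019-01-01", "u2"),
    ("2019-01-02", "u1"), ("2019-01-02", "u3"),
    ("2019-01-03", "u1"), ("2019-01-03", "u3"), ("2019-01-03", "u4"),
]

for day, distinct, cumulative in daily_and_cumulative_distinct(hits):
    print(day, distinct, cumulative)
```

The per-day set size plays the role of the cardinality aggregation, and the size of the running union plays the role of cumulative_cardinality.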

Syntax

A cumulative_cardinality aggregation looks like this in isolation:

{
  "cumulative_cardinality": {
    "buckets_path": "my_cardinality_agg"
  }
}

Table 60. cumulative_cardinality Parameters

Parameter Name	Description	Required	Default Value
`buckets_path`	The path to the cardinality aggregation whose cumulative cardinality you want to calculate (see `buckets_path` Syntax for more details)	Required
`format`	DecimalFormat pattern for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property	Optional	`null`
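The requirements stated at the top of this page (a cardinality metric, and a parent histogram with min_doc_count 0) can be checked on the client before sending a request. The following is a hypothetical helper, `check_cumulative_cardinality`, which is not part of any Elasticsearch client. It takes one histogram or date_histogram aggregation as a Python dict and, as a simplifying assumption, only handles a `buckets_path` that names a sibling sub-aggregation directly.

```python
def check_cumulative_cardinality(histogram_agg):
    """Return a list of problems found in one histogram-style aggregation.

    Hypothetical helper, not part of any Elasticsearch client. It only
    understands a buckets_path that names a sibling sub-aggregation directly.
    """
    kind = next(
        (k for k in ("histogram", "date_histogram") if k in histogram_agg), None
    )
    if kind is None:
        return ["parent must be a histogram or date_histogram aggregation"]

    problems = []
    # min_doc_count must be 0; omitting it is fine because 0 is the default.
    if histogram_agg[kind].get("min_doc_count", 0) != 0:
        problems.append("min_doc_count must be 0")

    # Sub-aggregations may be under "aggs" or the long form "aggregations".
    subaggs = histogram_agg.get("aggs") or histogram_agg.get("aggregations") or {}
    for name, body in subaggs.items():
        params = body.get("cumulative_cardinality")
        if params is None:
            continue  # not a cumulative_cardinality aggregation
        path = params.get("buckets_path")
        if path is None:
            problems.append(f"{name}: buckets_path is required")
        elif "cardinality" not in subaggs.get(path, {}):
            problems.append(
                f"{name}: {path!r} is not a sibling cardinality aggregation"
            )
    return problems


# The users_per_day aggregation from the example request, as a Python dict.
users_per_day = {
    "date_histogram": {"field": "timestamp", "calendar_interval": "day"},
    "aggs": {
        "distinct_users": {"cardinality": {"field": "user_id"}},
        "total_new_users": {
            "cumulative_cardinality": {"buckets_path": "distinct_users"}
        },
    },
}
print(check_cumulative_cardinality(users_per_day))  # []
```

Elasticsearch itself validates the request; a check like this only catches the mistakes earlier.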

The following snippet calculates the cumulative cardinality of the total daily users:

resp = client.search(
    index="user_hits",
    size=0,
    aggs={
        "users_per_day": {
            "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "day"
            },
            "aggs": {
                "distinct_users": {
                    "cardinality": {
                        "field": "user_id"
                    }
                },
                "total_new_users": {
                    "cumulative_cardinality": {
                        "buckets_path": "distinct_users"
                    }
                }
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'user_hits',
  body: {
    size: 0,
    aggregations: {
      users_per_day: {
        date_histogram: {
          field: 'timestamp',
          calendar_interval: 'day'
        },
        aggregations: {
          distinct_users: {
            cardinality: {
              field: 'user_id'
            }
          },
          total_new_users: {
            cumulative_cardinality: {
              buckets_path: 'distinct_users'
            }
          }
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "user_hits",
  size: 0,
  aggs: {
    users_per_day: {
      date_histogram: {
        field: "timestamp",
        calendar_interval: "day",
      },
      aggs: {
        distinct_users: {
          cardinality: {
            field: "user_id",
          },
        },
        total_new_users: {
          cumulative_cardinality: {
            buckets_path: "distinct_users",
          },
        },
      },
    },
  },
});
console.log(response);

GET /user_hits/_search
{
  "size": 0,
  "aggs": {
    "users_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "distinct_users": {
          "cardinality": {
            "field": "user_id"
          }
        },
        "total_new_users": {
          "cumulative_cardinality": {
            "buckets_path": "distinct_users"
          }
        }
      }
    }
  }
}

`buckets_path` instructs this aggregation to use the output of the `distinct_users` aggregation to compute the cumulative cardinality.

The response might look like this:

{
   "took": 11,
   "timed_out": false,
   "_shards": ...,
   "hits": ...,
   "aggregations": {
      "users_per_day": {
         "buckets": [
            {
               "key_as_string": "2019-01-01T00:00:00.000Z",
               "key": 1546300800000,
               "doc_count": 2,
               "distinct_users": {
                  "value": 2
               },
               "total_new_users": {
                  "value": 2
               }
            },
            {
               "key_as_string": "2019-01-02T00:00:00.000Z",
               "key": 1546387200000,
               "doc_count": 2,
               "distinct_users": {
                  "value": 2
               },
               "total_new_users": {
                  "value": 3
               }
            },
            {
               "key_as_string": "2019-01-03T00:00:00.000Z",
               "key": 1546473600000,
               "doc_count": 3,
               "distinct_users": {
                  "value": 3
               },
               "total_new_users": {
                  "value": 4
               }
            }
         ]
      }
   }
}

Note that on the second day, 2019-01-02, there are two distinct users, but the total_new_users metric generated by the cumulative pipeline aggregation only increments to three. This means that only one of the two users that day was new; the other had already been seen the previous day. This happens again on the third day, when only one of the three users is completely new.
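The arithmetic in that note can be reproduced from the response itself. The minimal Python sketch below copies only the bucket fields it needs from the example response and derives, for each day, how many distinct users were new (the growth of total_new_users) and how many were repeats (the rest of that day's distinct users). On real data, where cardinality values are approximate, these derived numbers are estimates as well.

```python
# Bucket values copied from the example response (only the fields used here).
buckets = [
    {"key_as_string": "2019-01-01T00:00:00.000Z",
     "distinct_users": {"value": 2}, "total_new_users": {"value": 2}},
    {"key_as_string": "2019-01-02T00:00:00.000Z",
     "distinct_users": {"value": 2}, "total_new_users": {"value": 3}},
    {"key_as_string": "2019-01-03T00:00:00.000Z",
     "distinct_users": {"value": 3}, "total_new_users": {"value": 4}},
]


def new_and_repeat(buckets):
    """Return (date, new_users, repeat_users) for each bucket."""
    rows = []
    previous_total = 0  # nothing has been seen before the first bucket
    for bucket in buckets:
        total = bucket["total_new_users"]["value"]
        distinct = bucket["distinct_users"]["value"]
        new = total - previous_total  # growth of the cumulative count
        repeat = distinct - new       # already seen on an earlier day
        rows.append((bucket["key_as_string"][:10], new, repeat))
        previous_total = total
    return rows


for day, new, repeat in new_and_repeat(buckets):
    print(f"{day}: {new} new, {repeat} repeat")
```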

Incremental cumulative cardinality

The cumulative_cardinality aggregation shows the total distinct count since the beginning of the time period being queried. Sometimes, however, it is useful to see the "incremental" count instead: that is, how many new users are added each day, rather than the total cumulative count.

You can do this by adding a derivative aggregation to the query:

resp = client.search(
    index="user_hits",
    size=0,
    aggs={
        "users_per_day": {
            "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "day"
            },
            "aggs": {
                "distinct_users": {
                    "cardinality": {
                        "field": "user_id"
                    }
                },
                "total_new_users": {
                    "cumulative_cardinality": {
                        "buckets_path": "distinct_users"
                    }
                },
                "incremental_new_users": {
                    "derivative": {
                        "buckets_path": "total_new_users"
                    }
                }
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'user_hits',
  body: {
    size: 0,
    aggregations: {
      users_per_day: {
        date_histogram: {
          field: 'timestamp',
          calendar_interval: 'day'
        },
        aggregations: {
          distinct_users: {
            cardinality: {
              field: 'user_id'
            }
          },
          total_new_users: {
            cumulative_cardinality: {
              buckets_path: 'distinct_users'
            }
          },
          incremental_new_users: {
            derivative: {
              buckets_path: 'total_new_users'
            }
          }
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "user_hits",
  size: 0,
  aggs: {
    users_per_day: {
      date_histogram: {
        field: "timestamp",
        calendar_interval: "day",
      },
      aggs: {
        distinct_users: {
          cardinality: {
            field: "user_id",
          },
        },
        total_new_users: {
          cumulative_cardinality: {
            buckets_path: "distinct_users",
          },
        },
        incremental_new_users: {
          derivative: {
            buckets_path: "total_new_users",
          },
        },
      },
    },
  },
});
console.log(response);

GET /user_hits/_search
{
  "size": 0,
  "aggs": {
    "users_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "distinct_users": {
          "cardinality": {
            "field": "user_id"
          }
        },
        "total_new_users": {
          "cumulative_cardinality": {
            "buckets_path": "distinct_users"
          }
        },
        "incremental_new_users": {
          "derivative": {
            "buckets_path": "total_new_users"
          }
        }
      }
    }
  }
}

The response might look like this:

{
   "took": 11,
   "timed_out": false,
   "_shards": ...,
   "hits": ...,
   "aggregations": {
      "users_per_day": {
         "buckets": [
            {
               "key_as_string": "2019-01-01T00:00:00.000Z",
               "key": 1546300800000,
               "doc_count": 2,
               "distinct_users": {
                  "value": 2
               },
               "total_new_users": {
                  "value": 2
               }
            },
            {
               "key_as_string": "2019-01-02T00:00:00.000Z",
               "key": 1546387200000,
               "doc_count": 2,
               "distinct_users": {
                  "value": 2
               },
               "total_new_users": {
                  "value": 3
               },
               "incremental_new_users": {
                  "value": 1.0
               }
            },
            {
               "key_as_string": "2019-01-03T00:00:00.000Z",
               "key": 1546473600000,
               "doc_count": 3,
               "distinct_users": {
                  "value": 3
               },
               "total_new_users": {
                  "value": 4
               },
               "incremental_new_users": {
                  "value": 1.0
               }
            }
         ]
      }
   }
}
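In this response, incremental_new_users is missing from the first bucket because a derivative needs a previous bucket to compare against, and the later values are floating point (1.0). The following Python sketch is a plain first difference over the total_new_users values from the response, shown only to illustrate those two points; it deliberately ignores other options of the real derivative aggregation, such as its gap policy and unit settings.

```python
def derivative(values):
    """First difference of a series: one entry per bucket.

    The first bucket has no predecessor, so its entry is None (the example
    response simply omits incremental_new_users there). Differences are
    returned as floats, like the 1.0 values in the response.
    """
    if not values:
        return []
    result = [None]
    for prev, curr in zip(values, values[1:]):
        result.append(float(curr - prev))
    return result


total_new_users = [2, 3, 4]  # values from the example response
print(derivative(total_new_users))  # [None, 1.0, 1.0]
```

For the first day, the number of new users is simply that bucket's total_new_users value, since nothing was seen before it.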
