Grouping Limitations with heterogeneous indices
There is a known limitation to Rollup groups, due to some internal implementation details at this time. The Rollup feature leverages the composite aggregation from Elasticsearch. At the moment, the composite agg only returns buckets when all keys in the tuple are non-null. Put another way, if you request keys [A,B,C] in the composite aggregation, the only documents that are aggregated are those that have all of the keys A, B and C.
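The non-null rule can be illustrated with a small simulation. This is only a sketch of the behavior described above, not Elasticsearch code: the `composite_buckets` helper and the sample documents are hypothetical, and a missing key is treated the same as a null one.

```python
from collections import Counter

def composite_buckets(docs, keys):
    """Count documents per key tuple, skipping any document with a missing or null key."""
    buckets = Counter()
    for doc in docs:
        values = tuple(doc.get(k) for k in keys)
        if any(v is None for v in values):
            continue  # mirrors the limitation: one null key drops the whole document
        buckets[values] += 1
    return dict(buckets)

# Hypothetical sample data: the second document lacks key "C".
docs = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 2},
    {"A": 1, "B": 2, "C": 3},
]

result = composite_buckets(docs, ["A", "B", "C"])
print(result)  # only the two complete documents are counted
```

The document without `C` contributes to no bucket at all, even though it has values for `A` and `B`.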
Because Rollup uses the composite agg during the indexing process, it inherits this behavior. Practically speaking, if all of the documents in your index are homogeneous (they have the same mapping), you can ignore this limitation and stop reading now.
However, if you have a heterogeneous collection of documents that you wish to roll up, you may need to configure two or more jobs to accurately cover the original data.
As an example, if your index has two types of documents:
{ "timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a" }
and
{ "timestamp": 1516729294000, "price": 123, "title": "Foo" }
it may be tempting to create a single, combined rollup job which covers both of these document types, something like this:
PUT _xpack/rollup/job/combined
{
    "index_pattern": "data-*",
    "rollup_index": "data_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups": {
        "date_histogram": {
            "field": "timestamp",
            "interval": "1h",
            "delay": "7d"
        },
        "terms": {
            "fields": ["node", "title"]
        }
    },
    "metrics": [
        {
            "field": "temperature",
            "metrics": ["min", "max", "sum"]
        },
        {
            "field": "price",
            "metrics": ["avg"]
        }
    ]
}
You can see that it includes a terms grouping on both "node" and "title", fields that are mutually exclusive in the document types.
This will not work. Because the composite aggregation (and by extension, Rollup) only returns buckets when all keys are non-null, and there are no documents that have both a "node" field and a "title" field, this rollup job will not produce any rollups.
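To see why neither document type qualifies, one can list which grouping fields each sample document lacks. This is a rough sketch: the `missing_group_fields` helper is hypothetical, and it assumes the grouping keys of the combined job are simply `timestamp`, `node` and `title`.

```python
def missing_group_fields(doc, group_fields):
    """Return the grouping fields that are absent (or null) in a document."""
    return [f for f in group_fields if doc.get(f) is None]

# The two document types from the example above.
sensor_doc = {"timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a"}
purchase_doc = {"timestamp": 1516729294000, "price": 123, "title": "Foo"}

# Assumed grouping keys of the combined job: the date_histogram field plus both terms fields.
combined_groups = ["timestamp", "node", "title"]

sensor_missing = missing_group_fields(sensor_doc, combined_groups)
purchase_missing = missing_group_fields(purchase_doc, combined_groups)
print(sensor_missing)    # ['title']
print(purchase_missing)  # ['node']
```

Each document type is missing exactly one of the grouping keys, so under the non-null rule neither type lands in any bucket.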
Instead, you should configure two independent jobs (sharing the same index, or going to separate indices):
PUT _xpack/rollup/job/sensor
{
    "index_pattern": "data-*",
    "rollup_index": "data_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups": {
        "date_histogram": {
            "field": "timestamp",
            "interval": "1h",
            "delay": "7d"
        },
        "terms": {
            "fields": ["node"]
        }
    },
    "metrics": [
        {
            "field": "temperature",
            "metrics": ["min", "max", "sum"]
        }
    ]
}

PUT _xpack/rollup/job/purchases
{
    "index_pattern": "data-*",
    "rollup_index": "data_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups": {
        "date_histogram": {
            "field": "timestamp",
            "interval": "1h",
            "delay": "7d"
        },
        "terms": {
            "fields": ["title"]
        }
    },
    "metrics": [
        {
            "field": "price",
            "metrics": ["avg"]
        }
    ]
}
Notice that each job now deals with a single "document type", and will not run into the limitations described above. We are working on changes in core Elasticsearch to remove this limitation from the composite aggregation, and the documentation will be updated accordingly when this particular scenario is fixed.
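Before creating jobs, it can help to check how many sample documents each proposed grouping would actually cover. The sketch below is an offline approximation, not part of the Rollup API: the `docs_covered` helper is hypothetical, and each job is reduced to an assumed list of grouping fields.

```python
def docs_covered(docs, group_fields):
    """Return the documents that have a non-null value for every grouping field."""
    return [d for d in docs if all(d.get(f) is not None for f in group_fields)]

# One sample document of each type from the example above.
docs = [
    {"timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a"},
    {"timestamp": 1516729294000, "price": 123, "title": "Foo"},
]

# Assumed grouping fields per job: the date_histogram field plus the terms fields.
jobs = {
    "combined": ["timestamp", "node", "title"],
    "sensor": ["timestamp", "node"],
    "purchases": ["timestamp", "title"],
}

coverage = {name: len(docs_covered(docs, fields)) for name, fields in jobs.items()}
print(coverage)
```

With these sample documents, the combined grouping covers none of them, while the sensor and purchases groupings each cover one.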