Matrix stats aggregation
The matrix_stats aggregation is a numeric aggregation that computes the following statistics over a set of document fields:
| count | Number of per-field samples included in the calculation. |
| mean | The average value for each field. |
| variance | Per-field measurement of how spread out the samples are from the mean. |
| skewness | Per-field measurement quantifying the asymmetry of the distribution around the mean. |
| kurtosis | Per-field measurement quantifying the shape of the distribution. |
| covariance | A matrix that quantitatively describes how changes in one field are associated with another. |
| correlation | The covariance matrix scaled to a range of -1 to 1, inclusive. Describes the relationship between field distributions. |
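As a rough illustration of what these statistics mean, the following Python sketch computes them for in-memory samples. It is not the Elasticsearch implementation: the function name `matrix_stats` and the input layout are hypothetical, and the choice of estimators (n - 1 for variance and covariance, moment-based skewness, non-excess kurtosis) is an assumption, since this page does not state the exact formulas.

```python
import math

def matrix_stats(samples):
    """Compute per-field statistics for a dict of field name -> list of numbers.

    Assumes every list has the same length n >= 2 and a non-zero variance.
    Estimator conventions are assumptions for illustration only.
    """
    names = list(samples)
    n = len(samples[names[0]])
    means = {f: sum(samples[f]) / n for f in names}
    # Deviations of each sample from its field mean.
    dev = {f: [x - means[f] for x in samples[f]] for f in names}
    # Sums of the 2nd, 3rd and 4th powers of the deviations.
    m2 = {f: sum(d ** 2 for d in dev[f]) for f in names}
    m3 = {f: sum(d ** 3 for d in dev[f]) for f in names}
    m4 = {f: sum(d ** 4 for d in dev[f]) for f in names}

    result = {}
    for f in names:
        result[f] = {
            "count": n,
            "mean": means[f],
            "variance": m2[f] / (n - 1),                  # assumed: n - 1 denominator
            "skewness": math.sqrt(n) * m3[f] / m2[f] ** 1.5,
            "kurtosis": n * m4[f] / m2[f] ** 2,           # assumed: non-excess kurtosis
            "covariance": {},
            "correlation": {},
        }
    for f in names:
        for g in names:
            cov = sum(a * b for a, b in zip(dev[f], dev[g])) / (n - 1)
            result[f]["covariance"][g] = cov
            # Correlation is the covariance scaled by both standard deviations.
            result[f]["correlation"][g] = cov / math.sqrt(
                result[f]["variance"] * result[g]["variance"]
            )
    return result

stats = matrix_stats({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                      "y": [2.0, 4.0, 6.0, 8.0, 10.0]})
print(stats["x"]["variance"], stats["x"]["correlation"]["y"])
```

Because `y` is an exact multiple of `x` in this toy input, the correlation between the two fields comes out at the upper bound of 1.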
Unlike other metric aggregations, the matrix_stats aggregation does not support scripting.
The following example demonstrates the use of matrix stats to describe the relationship between income and poverty.
GET /_search
{
  "aggs": {
    "statistics": {
      "matrix_stats": {
        "fields": [ "poverty", "income" ]
      }
    }
  }
}
The aggregation type is matrix_stats and the fields setting defines the set of fields (as an array) for computing the statistics. The above request returns the following response:
{
  ...
  "aggregations": {
    "statistics": {
      "doc_count": 50,
      "fields": [
        {
          "name": "income",
          "count": 50,
          "mean": 51985.1,
          "variance": 7.383377037755103E7,
          "skewness": 0.5595114003506483,
          "kurtosis": 2.5692365287787124,
          "covariance": { "income": 7.383377037755103E7, "poverty": -21093.65836734694 },
          "correlation": { "income": 1.0, "poverty": -0.8352655256272504 }
        },
        {
          "name": "poverty",
          "count": 50,
          "mean": 12.732000000000001,
          "variance": 8.637730612244896,
          "skewness": 0.4516049811903419,
          "kurtosis": 2.8615929677997767,
          "covariance": { "income": -21093.65836734694, "poverty": 8.637730612244896 },
          "correlation": { "income": -0.8352655256272504, "poverty": 1.0 }
        }
      ]
    }
  }
}
The doc_count field indicates the number of documents involved in the computation of the statistics.
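To show how the covariance and correlation entries in the response above relate, the following Python sketch looks up the income/poverty pair and recomputes the correlation from the reported covariance and variances. The `fields` list is a trimmed, hand-copied subset of that response, and the helper `correlation_from_covariance` is a hypothetical name used only for this illustration.

```python
import math

# Trimmed copy of the "fields" array from the response above.
fields = [
    {
        "name": "income",
        "variance": 7.383377037755103e7,
        "covariance": {"income": 7.383377037755103e7, "poverty": -21093.65836734694},
        "correlation": {"income": 1.0, "poverty": -0.8352655256272504},
    },
    {
        "name": "poverty",
        "variance": 8.637730612244896,
        "covariance": {"income": -21093.65836734694, "poverty": 8.637730612244896},
        "correlation": {"income": -0.8352655256272504, "poverty": 1.0},
    },
]

# Index the per-field entries by name for easier lookup.
by_name = {f["name"]: f for f in fields}

def correlation_from_covariance(a, b):
    """Scale the covariance of fields a and b by their standard deviations."""
    cov = by_name[a]["covariance"][b]
    return cov / math.sqrt(by_name[a]["variance"] * by_name[b]["variance"])

reported = by_name["income"]["correlation"]["poverty"]
recomputed = correlation_from_covariance("income", "poverty")
print(reported, recomputed)
```

Both matrices are symmetric, so the same values appear whether they are read from the income entry or from the poverty entry, and each field's covariance with itself equals its variance.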
Multi Value Fields
The matrix_stats aggregation treats each document field as an independent sample. The mode parameter controls which value the aggregation uses for array or multi-valued fields. This parameter can take one of the following values:
| avg | (default) Use the average of all values. |
| min | Pick the lowest value. |
| max | Pick the highest value. |
| sum | Use the sum of all values. |
| median | Use the median of all values. |
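The effect of each mode can be sketched in Python as a reduction from a list of values to a single sample. The function `reduce_multi_value` is hypothetical and only mirrors the descriptions above, keyed by the mode names `avg`, `min`, `max`, `sum`, and `median`; how Elasticsearch handles edge cases such as the median of an even number of values is not stated here, so the use of `statistics.median` is an assumption.

```python
import statistics

def reduce_multi_value(values, mode="avg"):
    """Collapse the values of a multi-valued field into one sample."""
    reducers = {
        "avg": lambda vs: sum(vs) / len(vs),   # default mode
        "min": min,
        "max": max,
        "sum": sum,
        "median": statistics.median,           # assumed tie handling for even counts
    }
    if mode not in reducers:
        raise ValueError(f"unsupported mode: {mode}")
    return reducers[mode](values)

values = [1, 2, 6]
reduced = {mode: reduce_multi_value(values, mode)
           for mode in ("avg", "min", "max", "sum", "median")}
print(reduced)
```

With the default mode, a document whose field holds the values 1, 2, and 6 would contribute a single sample of 3.0 for that field.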
Missing Values
The missing parameter defines how documents that are missing a value should be treated. By default they are ignored, but they can also be treated as if they had a value by adding a set of fieldname : value mappings that specify default values per field.
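The following Python sketch illustrates this behavior on a tiny in-memory data set. The helper `collect_samples` is hypothetical, the default of 50000 for income is an arbitrary illustrative value, and dropping the whole document when any requested field has no value and no default is an assumption about what "ignored" means here. The `request_body` dictionary shows how the same per-field defaults might be passed in a request, assuming missing sits next to fields inside matrix_stats.

```python
def collect_samples(docs, fields, missing=None):
    """Return one row of values per document that can be included.

    A document is skipped when any field has neither a value nor a default.
    """
    missing = missing or {}
    rows = []
    for doc in docs:
        row = []
        for field in fields:
            value = doc.get(field, missing.get(field))
            if value is None:
                row = None          # assumed: the whole document is ignored
                break
            row.append(value)
        if row is not None:
            rows.append(row)
    return rows

docs = [
    {"poverty": 10.0, "income": 60000.0},
    {"poverty": 15.0},                      # no income value
]
fields = ["poverty", "income"]

default_rows = collect_samples(docs, fields)
filled_rows = collect_samples(docs, fields, missing={"income": 50000})

# Request-body fragment carrying the same per-field default (illustrative values).
request_body = {
    "aggs": {
        "statistics": {
            "matrix_stats": {
                "fields": fields,
                "missing": {"income": 50000},
            }
        }
    }
}
print(default_rows, filled_rows)
```

Without defaults only the first document contributes a sample. With the income default, the second document is included as if its income were 50000.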