Normalize aggregation

A parent pipeline aggregation which calculates the normalized/rescaled value for a specific bucket value. Values that cannot be normalized are skipped using the skip gap policy.

Syntax

A normalize aggregation looks like this in isolation:

{
  "normalize": {
    "buckets_path": "normalized",
    "method": "percent_of_sum"
  }
}

Table 72. normalize_pipeline Parameters

| Parameter Name | Description | Required | Default Value |
|---|---|---|---|
| buckets_path | The path to the buckets we wish to normalize (see buckets_path syntax for more details) | Required | |
| method | The specific method to apply | Required | |
| format | DecimalFormat pattern for the output value. If specified, the formatted value is returned in the aggregation’s value_as_string property | Optional | null |

Methods

The Normalize Aggregation supports multiple methods to transform the bucket values. Each method definition will use the following original set of bucket values as examples: [5, 5, 10, 50, 10, 20].

rescale_0_1

This method rescales the data such that the minimum number is zero, and the maximum number is 1, with the rest normalized linearly in-between.

x' = (x - min_x) / (max_x - min_x)
[0, 0, .1111, 1, .1111, .3333]

rescale_0_100

This method rescales the data such that the minimum number is zero, and the maximum number is 100, with the rest normalized linearly in-between.

x' = 100 * (x - min_x) / (max_x - min_x)
[0, 0, 11.11, 100, 11.11, 33.33]
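The two rescaling formulas can be checked against the example values with a short Python sketch. The helper name `rescale` and its `upper` parameter are illustrative only and are not part of the aggregation. The sketch assumes the values are not all equal, since the denominator would otherwise be zero.

```python
values = [5, 5, 10, 50, 10, 20]

def rescale(values, upper=1.0):
    """Linearly map values so that min -> 0 and max -> upper.

    Hypothetical helper; assumes max(values) != min(values).
    """
    lo, hi = min(values), max(values)
    return [upper * (x - lo) / (hi - lo) for x in values]

rescale_0_1 = rescale(values)                 # x' = (x - min_x) / (max_x - min_x)
rescale_0_100 = rescale(values, upper=100.0)  # same formula, scaled by 100

print([round(v, 4) for v in rescale_0_1])
print([round(v, 2) for v in rescale_0_100])
```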

percent_of_sum

This method normalizes each value so that it represents a percentage of the total sum it contributes to.

x' = x / sum_x
[5%, 5%, 10%, 50%, 10%, 20%]
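As a rough illustration in Python, the formula yields fractions between 0 and 1. Rendering them as percentages, as in the list above, is a separate formatting step. The variable names here are illustrative.

```python
values = [5, 5, 10, 50, 10, 20]

total = sum(values)                            # sum_x = 100 for the example values
percent_of_sum = [x / total for x in values]   # x' = x / sum_x

# The raw results are fractions; format them as percentages for display.
as_percent = [f"{p:.0%}" for p in percent_of_sum]

print(percent_of_sum)
print(as_percent)
```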

mean

This method normalizes each value by how much it differs from the average, relative to the range of the values.

x' = (x - mean_x) / (max_x - min_x)
[-0.26, -0.26, -0.15, 0.74, -0.15, 0.07]
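Evaluating the formula directly on the six example values can be sketched in Python as follows. The variable names are illustrative, and the sketch assumes the values are not all equal.

```python
values = [5, 5, 10, 50, 10, 20]

mean_x = sum(values) / len(values)     # 100 / 6
spread = max(values) - min(values)     # max_x - min_x = 45

# x' = (x - mean_x) / (max_x - min_x)
mean_normalized = [(x - mean_x) / spread for x in values]

print([round(v, 2) for v in mean_normalized])
```

Values below the average come out negative and values above it positive, and the results sum to zero up to floating-point error.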

z-score

This method normalizes such that each value represents how far it is from the mean relative to the standard deviation.

x' = (x - mean_x) / stdev_x
[-0.68, -0.68, -0.39, 1.94, -0.39, 0.19]
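A minimal Python sketch of the z-score formula follows. The example values listed above are reproduced when stdev_x is the sample standard deviation (dividing by n - 1). Whether the aggregation itself uses the sample or the population form is not stated here, so treat that choice as an assumption of this sketch.

```python
import statistics

values = [5, 5, 10, 50, 10, 20]

mean_x = sum(values) / len(values)
# Assumption: sample standard deviation (n - 1 in the denominator).
# statistics.pstdev would give the population form instead.
stdev_x = statistics.stdev(values)

# x' = (x - mean_x) / stdev_x
z_scores = [(x - mean_x) / stdev_x for x in values]

print([round(z, 2) for z in z_scores])
```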

softmax

This method exponentiates each value and divides it by the sum of the exponentials of the original values.

x' = e^x / sum_e_x
[2.862E-20, 2.862E-20, 4.248E-18, 0.999, 4.248E-18, 9.357E-14]
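The softmax formula can be sketched in Python as below. Subtracting the maximum before exponentiating is a common numerical-stability technique added here for illustration. It is not described in the text above, and it leaves the result mathematically unchanged because the common factor cancels in the ratio.

```python
import math

values = [5, 5, 10, 50, 10, 20]

# Shift by the maximum so the largest exponent is e^0 = 1 (avoids overflow
# for large inputs; the shift cancels out in the division below).
shift = max(values)
exps = [math.exp(x - shift) for x in values]
sum_e_x = sum(exps)

# x' = e^x / sum_e_x
softmax = [e / sum_e_x for e in exps]

print([f"{v:.3E}" for v in softmax])
```

The results sum to 1, and the largest input (50) dominates so strongly that every other bucket ends up with a value close to zero.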

Example

The following snippet calculates the percent of total sales for each month:

POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        },
        "percent_of_total_sales": {
          "normalize": {
            "buckets_path": "sales",          
            "method": "percent_of_sum",       
            "format": "00.00%"                
          }
        }
      }
    }
  }
}

buckets_path instructs this normalize aggregation to use the output of the sales aggregation for rescaling.

method sets which rescaling to apply. In this case, percent_of_sum calculates the sales value as a percent of all sales in the parent bucket.

format controls how the metric is formatted as a string, using Java’s DecimalFormat pattern. In this case, the pattern multiplies the value by 100 and appends a %.
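To make the effect of the "00.00%" pattern concrete, here is a rough Python approximation. It is not Java's DecimalFormat, and the helper name `format_percent` is hypothetical. It only mimics this one pattern for non-negative values: multiply by 100, keep at least two integer digits and exactly two decimals, then append a %. Rounding of exact ties may differ from Java.

```python
def format_percent(value):
    """Approximate the DecimalFormat pattern "00.00%" for non-negative values.

    Hypothetical helper for illustration only.
    """
    # Width 5 with zero padding gives at least two integer digits: "06.09".
    return f"{value * 100:05.2f}%"

print(format_percent(0.06091370558375635))
print(format_percent(0.5583756345177665))
```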

The response may look like this:

{
   "took": 11,
   "timed_out": false,
   "_shards": ...,
   "hits": ...,
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550.0
               },
               "percent_of_total_sales": {
                  "value": 0.5583756345177665,
                  "value_as_string": "55.84%"
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60.0
               },
               "percent_of_total_sales": {
                  "value": 0.06091370558375635,
                  "value_as_string": "06.09%"
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375.0
               },
               "percent_of_total_sales": {
                  "value": 0.38071065989847713,
                  "value_as_string": "38.07%"
               }
            }
         ]
      }
   }
}
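The percent_of_total_sales values in this response can be reproduced by hand from the three monthly sales sums. The Python sketch below does that arithmetic. The dictionary is copied from the response above, and the variable names are illustrative.

```python
# Monthly sums taken from the "sales" values in the response above.
sales_per_month = {
    "2015/01/01 00:00:00": 550.0,
    "2015/02/01 00:00:00": 60.0,
    "2015/03/01 00:00:00": 375.0,
}

total_sales = sum(sales_per_month.values())  # 985.0

# percent_of_sum: each bucket's sales divided by the total across buckets.
percent_of_total_sales = {
    month: sales / total_sales for month, sales in sales_per_month.items()
}

for month, fraction in percent_of_total_sales.items():
    print(month, fraction)
```

The three fractions sum to 1, which is a quick sanity check that the normalization was computed over all sibling buckets.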