Aggregate Functions
editAggregate Functions
editFunctions for computing a single result from a set of input values. Elasticsearch SQL supports aggregate functions only alongside grouping (implicit or explicit).
General Purpose
editAVG
editSynopsis:
Input:
Output: double
numeric value
Description: Returns the Average (arithmetic mean) of input values.
SELECT AVG(salary) AS avg FROM emp; avg --------------- 48248.55
SELECT AVG(salary / 12.0) AS avg FROM emp; avg --------------- 4020.7125
COUNT
editSynopsis:
Input:
Output: numeric value
Description: Returns the total number (count) of input values.
In case of COUNT(*)
or COUNT(<literal>)
, all values are considered (including null
or missing ones).
In case of COUNT(<field_name>)
null
values are not considered.
SELECT COUNT(*) AS count FROM emp; count --------------- 100
COUNT(ALL)
editSynopsis:
Input:
Output: numeric value
Description: Returns the total number (count) of all non-null input values. COUNT(<field_name>)
and COUNT(ALL <field_name>)
are equivalent.
SELECT COUNT(ALL last_name) AS count_all, COUNT(DISTINCT last_name) count_distinct FROM emp; count_all | count_distinct ---------------+------------------ 100 |96
SELECT COUNT(ALL CASE WHEN languages IS NULL THEN -1 ELSE languages END) AS count_all, COUNT(DISTINCT CASE WHEN languages IS NULL THEN -1 ELSE languages END) count_distinct FROM emp; count_all | count_distinct ---------------+--------------- 100 |6
COUNT(DISTINCT)
editSynopsis:
Input:
Output: numeric value
Description: Returns the total number of distinct non-null values in input values.
SELECT COUNT(DISTINCT hire_date) unique_hires, COUNT(hire_date) AS hires FROM emp; unique_hires | hires ----------------+--------------- 99 |100
SELECT COUNT(DISTINCT DATE_TRUNC('YEAR', hire_date)) unique_hires, COUNT(DATE_TRUNC('YEAR', hire_date)) AS hires FROM emp; unique_hires | hires ---------------+--------------- 14 |100
FIRST/FIRST_VALUE
editSynopsis:
Input:
Output: same type as the input
Description: Returns the first non-NULL value (if such exists) of the field_name
input column sorted by
the ordering_field_name
column. If ordering_field_name
is not provided, only the field_name
column is used for the sorting. E.g.:
a | b |
---|---|
100 |
1 |
200 |
1 |
1 |
2 |
2 |
2 |
10 |
null |
20 |
null |
null |
null |
SELECT FIRST(a) FROM t
will result in:
FIRST(a) |
1 |
and
SELECT FIRST(a, b) FROM t
will result in:
FIRST(a, b) |
100 |
SELECT FIRST(first_name) FROM emp; FIRST(first_name) -------------------- Alejandro
SELECT gender, FIRST(first_name) FROM emp GROUP BY gender ORDER BY gender; gender | FIRST(first_name) ------------+-------------------- null | Berni F | Alejandro M | Amabile
SELECT FIRST(first_name, birth_date) FROM emp; FIRST(first_name, birth_date) -------------------------------- Remzi
SELECT gender, FIRST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | FIRST(first_name, birth_date) --------------+-------------------------------- null | Lillian F | Sumant M | Remzi
FIRST_VALUE
is a name alias and can be used instead of FIRST
, e.g.:
SELECT gender, FIRST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | FIRST_VALUE(first_name, birth_date) --------------+-------------------------------------- null | Lillian F | Sumant M | Remzi
SELECT gender, FIRST_VALUE(SUBSTRING(first_name, 2, 6), birth_date) AS "first" FROM emp GROUP BY gender ORDER BY gender; gender | first ---------------+--------------- null |illian F |umant M |emzi
FIRST
cannot be used in a HAVING clause.
FIRST
cannot be used with columns of type text
unless
the field is also saved as a keyword.
LAST/LAST_VALUE
editSynopsis:
Input:
Output: same type as the input
Description: It’s the inverse of FIRST/FIRST_VALUE
. Returns the last non-NULL value (if such exists) of the
field_name
input column sorted descending by the ordering_field_name
column. If ordering_field_name
is not
provided, only the field_name
column is used for the sorting. E.g.:
a | b |
---|---|
10 |
1 |
20 |
1 |
1 |
2 |
2 |
2 |
100 |
null |
200 |
null |
null |
null |
SELECT LAST(a) FROM t
will result in:
LAST(a) |
200 |
and
SELECT LAST(a, b) FROM t
will result in:
LAST(a, b) |
2 |
SELECT LAST(first_name) FROM emp; LAST(first_name) ------------------- Zvonko
SELECT gender, LAST(first_name) FROM emp GROUP BY gender ORDER BY gender; gender | LAST(first_name) ------------+------------------- null | Patricio F | Xinglin M | Zvonko
SELECT LAST(first_name, birth_date) FROM emp; LAST(first_name, birth_date) ------------------------------- Hilari
SELECT gender, LAST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | LAST(first_name, birth_date) -----------+------------------------------- null | Eberhardt F | Valdiodio M | Hilari
LAST_VALUE
is a name alias and can be used instead of LAST
, e.g.:
SELECT gender, LAST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | LAST_VALUE(first_name, birth_date) -----------+------------------------------------- null | Eberhardt F | Valdiodio M | Hilari
SELECT gender, LAST_VALUE(SUBSTRING(first_name, 3, 8), birth_date) AS "last" FROM emp GROUP BY gender ORDER BY gender; gender | last ---------------+--------------- null |erhardt F |ldiodio M |lari
LAST
cannot be used in HAVING
clause.
LAST
cannot be used with columns of type text
unless
the field is also saved as a keyword
.
MAX
editSynopsis:
Input:
Output: same type as the input
Description: Returns the maximum value across input values in the field field_name
.
SELECT MAX(salary) AS max FROM emp; max --------------- 74999
SELECT MAX(ABS(salary / -12.0)) AS max FROM emp; max ----------------- 6249.916666666667
MAX
on a field of type text
or keyword
is translated into
LAST/LAST_VALUE
and therefore, it cannot be used in HAVING
clause.
MIN
editSynopsis:
Input:
Output: same type as the input
Description: Returns the minimum value across input values in the field field_name
.
SELECT MIN(salary) AS min FROM emp; min --------------- 25324
MIN
on a field of type text
or keyword
is translated into
FIRST/FIRST_VALUE
and therefore, it cannot be used in HAVING
clause.
SUM
editSynopsis:
Input:
Output: bigint
for integer input, double
for floating points
Description: Returns the sum of input values in the field field_name
.
SELECT SUM(salary) AS sum FROM emp; sum --------------- 4824855
SELECT ROUND(SUM(salary / 12.0), 1) AS sum FROM emp; sum --------------- 402071.3
Statistics
editKURTOSIS
editSynopsis:
Input:
Output: double
numeric value
Description:
Quantify the shape of the distribution of input values in the field field_name
.
SELECT MIN(salary) AS min, MAX(salary) AS max, KURTOSIS(salary) AS k FROM emp; min | max | k ---------------+---------------+------------------ 25324 |74999 |2.0444718929142986
KURTOSIS
cannot be used on top of scalar functions or operators but only directly on a field. So, for example,
the following is not allowed and an error is returned:
SELECT KURTOSIS(salary / 12.0), gender FROM emp GROUP BY gender
MAD
editSynopsis:
Input:
Output: double
numeric value
Description:
Measure the variability of the input values in the field field_name
.
SELECT MIN(salary) AS min, MAX(salary) AS max, AVG(salary) AS avg, MAD(salary) AS mad FROM emp; min | max | avg | mad ---------------+---------------+---------------+--------------- 25324 |74999 |48248.55 |10096.5
SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, AVG(salary/ 12.0) AS avg, MAD(salary / 12.0) AS mad FROM emp; min | max | avg | mad ------------------+-----------------+---------------+----------------- 2110.3333333333335|6249.916666666667|4020.7125 |841.3750000000002
PERCENTILE
editSynopsis:
Input:
Output: double
numeric value
Description:
Returns the nth percentile (represented by numeric_exp
parameter)
of input values in the field field_name
.
SELECT languages, PERCENTILE(salary, 95) AS "95th" FROM emp GROUP BY languages; languages | 95th ---------------+----------------- null |74999.0 1 |72790.5 2 |71924.70000000001 3 |73638.25 4 |72115.59999999999 5 |61071.7
SELECT languages, PERCENTILE(salary / 12.0, 95) AS "95th" FROM emp GROUP BY languages; languages | 95th ---------------+------------------ null |6249.916666666667 1 |6065.875 2 |5993.725 3 |6136.520833333332 4 |6009.633333333332 5 |5089.3083333333325
PERCENTILE_RANK
editSynopsis:
Input:
Output: double
numeric value
Description:
Returns the nth percentile rank (represented by numeric_exp
parameter)
of input values in the field field_name
.
SELECT languages, PERCENTILE_RANK(salary, 65000) AS rank FROM emp GROUP BY languages; languages | rank ---------------+----------------- null |73.65766569962062 1 |73.7291625157734 2 |88.88005607010643 3 |79.43662623295829 4 |85.70446389643493 5 |100.0
SELECT languages, PERCENTILE_RANK(salary/12, 5000) AS rank FROM emp GROUP BY languages; languages | rank ---------------+------------------ null |66.91240875912409 1 |66.70766707667076 2 |84.13266895048271 3 |61.052992625621684 4 |76.55646443990001 5 |94.00696864111498
SKEWNESS
editSynopsis:
Input:
Output: double
numeric value
Description:
Quantify the asymmetric distribution of input values in the field field_name
.
SELECT MIN(salary) AS min, MAX(salary) AS max, SKEWNESS(salary) AS s FROM emp; min | max | s ---------------+---------------+------------------ 25324 |74999 |0.2707722118423227
SKEWNESS
cannot be used on top of scalar functions but only directly on a field. So, for example, the following is
not allowed and an error is returned:
SELECT SKEWNESS(ROUND(salary / 12.0, 2), gender FROM emp GROUP BY gender
STDDEV_POP
editSynopsis:
Input:
Output: double
numeric value
Description:
Returns the population standard deviation of input values in the field field_name
.
SELECT MIN(salary) AS min, MAX(salary) AS max, STDDEV_POP(salary) AS stddev FROM emp; min | max | stddev ---------------+---------------+------------------ 25324 |74999 |13765.125502787832
SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, STDDEV_POP(salary / 12.0) AS stddev FROM emp; min | max | stddev ------------------+-----------------+----------------- 2110.3333333333335|6249.916666666667|1147.093791898986
SUM_OF_SQUARES
editSynopsis:
Input:
Output: double
numeric value
Description:
Returns the sum of squares of input values in the field field_name
.
SELECT MIN(salary) AS min, MAX(salary) AS max, SUM_OF_SQUARES(salary) AS sumsq FROM emp; min | max | sumsq ---------------+---------------+---------------- 25324 |74999 |2.51740125721E11
SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, SUM_OF_SQUARES(salary / 24.0) AS sumsq FROM emp; min | max | sumsq ------------------+------------------+------------------- 1055.1666666666667|3124.9583333333335|4.370488293767361E8
VAR_POP
editSynopsis:
Input:
Output: double
numeric value
Description:
Returns the population variance of input values in the field field_name
.
SELECT MIN(salary) AS min, MAX(salary) AS max, VAR_POP(salary) AS varpop FROM emp; min | max | varpop ---------------+---------------+---------------- 25324 |74999 |1.894786801075E8
SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, VAR_POP(salary / 24.0) AS varpop FROM emp; min | max | varpop ------------------+------------------+------------------ 1055.1666666666667|3124.9583333333335|328956.04185329855