What’s new in 8.16
Coming in 8.16.
Here are the highlights of what’s new and improved in Elasticsearch 8.16! For detailed information about this release, see the Release notes and Migration guide.
ESQL: INLINESTATS
This adds the INLINESTATS command to ESQL, which performs a STATS and then enriches the results into the output stream. So, this query:
FROM test
| INLINESTATS m=MAX(a * b) BY b
| WHERE m == a * b
| SORT a DESC, b DESC
| LIMIT 3
Produces output like:
| a | b | m |
| --- | --- | ----- |
| 99 | 999 | 98901 |
| 99 | 998 | 98802 |
| 99 | 997 | 98703 |
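The enrich-then-filter behaviour described above can be mimicked in plain Python. This is only a sketch of the semantics, not how ES|QL executes the command: the helper name `inlinestats_max` and the test data (every combination of `a` in 0..99 and `b` in 990..999) are assumptions made for illustration.

```python
def inlinestats_max(rows, value, by):
    """Attach the per-group maximum of value(row) to every row as 'm'."""
    group_max = {}
    for row in rows:
        key = row[by]
        v = value(row)
        if key not in group_max or v > group_max[key]:
            group_max[key] = v
    # Unlike a plain STATS, every input row survives and gains the new column.
    return [{**row, "m": group_max[row[by]]} for row in rows]


# Hypothetical data set, chosen so the result resembles the table above.
rows = [{"a": a, "b": b} for a in range(100) for b in range(990, 1000)]

enriched = inlinestats_max(rows, lambda r: r["a"] * r["b"], by="b")
matching = [r for r in enriched if r["m"] == r["a"] * r["b"]]  # WHERE m == a * b
matching.sort(key=lambda r: (-r["a"], -r["b"]))                # SORT a DESC, b DESC
top3 = matching[:3]                                            # LIMIT 3

for r in top3:
    print(r["a"], r["b"], r["m"])
```

The point of the sketch is that the aggregate is joined back onto the original rows, so later commands can compare each row against its own group's result.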
Always allow rebalancing by default
In earlier versions of Elasticsearch, the cluster.routing.allocation.allow_rebalance setting defaults to indices_all_active, which blocks all rebalancing moves while the cluster is in yellow or red health. This was appropriate for the legacy allocator, which might otherwise do too many rebalancing moves. Today’s allocator has better support for rebalancing a cluster that is not in green health, and expects to be able to rebalance some shards away from over-full nodes to avoid allocating shards to undesirable locations in the first place. From version 8.16, the allow_rebalance setting defaults to always unless the legacy allocator is explicitly enabled.
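If a cluster needs the previous behaviour, the setting can still be set explicitly. The sketch below only builds a request body of the kind accepted by the cluster update settings API (PUT _cluster/settings); the helper name `allow_rebalance_settings` is hypothetical, and sending the request is left out.

```python
import json

# Values accepted by cluster.routing.allocation.allow_rebalance.
ALLOWED_VALUES = {"always", "indices_primaries_active", "indices_all_active"}


def allow_rebalance_settings(value="indices_all_active"):
    """Build a persistent cluster-settings body for allow_rebalance."""
    if value not in ALLOWED_VALUES:
        raise ValueError(f"unsupported allow_rebalance value: {value!r}")
    return {"persistent": {"cluster.routing.allocation.allow_rebalance": value}}


# Body that pins the pre-8.16 default.
body = allow_rebalance_settings("indices_all_active")
print(json.dumps(body, indent=2))
```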
Add global retention in data stream lifecycle
Data stream lifecycle now supports configuring retention at the cluster level, known as global retention. Global retention allows us to configure two different retentions:
- data_streams.lifecycle.retention.default is applied to all data streams managed by the data stream lifecycle that do not have retention defined on the data stream level.
- data_streams.lifecycle.retention.max is applied to all data streams managed by the data stream lifecycle, and it allows any data stream data to be deleted after the max_retention has passed.
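One way to read how the two settings combine with a retention configured on the data stream itself is sketched below. The function name `effective_retention` and the exact precedence (data stream value first, then the default, with the maximum acting as a cap) are assumptions drawn from the descriptions above, not a statement of the actual implementation.

```python
from datetime import timedelta
from typing import Optional


def effective_retention(
    data_stream_retention: Optional[timedelta],
    default_retention: Optional[timedelta],
    max_retention: Optional[timedelta],
) -> Optional[timedelta]:
    """Return the retention that would apply, or None to keep data indefinitely."""
    # The default only applies when the data stream defines no retention itself.
    retention = (
        data_stream_retention
        if data_stream_retention is not None
        else default_retention
    )
    # The maximum caps whatever was chosen, including "no retention at all".
    if max_retention is not None and (retention is None or retention > max_retention):
        retention = max_retention
    return retention


days = lambda n: timedelta(days=n)
print(effective_retention(None, days(7), days(30)))      # default applies
print(effective_retention(days(90), days(7), days(30)))  # capped by max
print(effective_retention(days(10), days(7), days(30)))  # data stream value kept
```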
Enable ZStandard compression for indices with index.codec set to best_compression
Previously, DEFLATE compression was used to compress stored fields in indices with the index.codec index setting set to best_compression. With this change, ZStandard is used as the compression algorithm for stored fields in those indices. Using ZStandard results in lower storage usage with similar indexing throughput, depending on which options are used. Experiments with indexing logs have shown that ZStandard offers ~12% lower storage usage and ~14% higher indexing throughput compared to DEFLATE.
[8.x] Remove zstd feature flag for index codec best compression
Backports the following commits to 8.x:
- Remove zstd feature flag for index codec best compression. (#112665)
ESQL: Introduce per agg filter
Add support for aggregation-scoped filters that work dynamically on the data in each group.
| STATS success = COUNT(*) WHERE 200 <= code AND code < 300,
        redirect = COUNT(*) WHERE 300 <= code AND code < 400,
        client_err = COUNT(*) WHERE 400 <= code AND code < 500,
        server_err = COUNT(*) WHERE 500 <= code AND code < 600,
        total_count = COUNT(*)
Implementation-wise, the base AggregateFunction has been extended to allow a filter to be passed on. This is required to incorporate the filter as part of the aggregate equality/identity, which would fail with the filter as an external component. As part of the process, the serialization for the existing aggregations had to be fixed so that AggregateFunction implementations delegate to their parent first.
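The effect of a per-aggregation filter can be illustrated with a small Python model: each named aggregate counts only the rows that satisfy its own predicate, while an aggregate without a filter counts everything. The helper `stats_with_filters` and the sample status codes are hypothetical, and the sketch ignores grouping with BY.

```python
def stats_with_filters(rows, aggs):
    """Count rows per aggregate; a predicate of None means 'no filter'."""
    return {
        name: sum(1 for row in rows if pred is None or pred(row))
        for name, pred in aggs.items()
    }


# Hypothetical HTTP status codes standing in for the 'code' field.
rows = [{"code": c} for c in [200, 204, 301, 404, 404, 500, 503, 200]]

result = stats_with_filters(
    rows,
    {
        "success": lambda r: 200 <= r["code"] < 300,
        "redirect": lambda r: 300 <= r["code"] < 400,
        "client_err": lambda r: 400 <= r["code"] < 500,
        "server_err": lambda r: 500 <= r["code"] < 600,
        "total_count": None,  # COUNT(*) without a WHERE clause
    },
)
print(result)
```

All five counts come out of a single pass over the same rows, which is what distinguishes this from filtering the whole query with one WHERE.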
ESQL: Multi-value fields supported in Geospatial predicates
Supporting multi-value fields in WHERE predicates is a challenge because it is unclear whether ALL or ANY of the values in the field should pass the predicate. For example, should the field age:[10,30] pass the predicate WHERE age>20 or not?
This ambiguity does not exist with the spatial predicates ST_INTERSECTS and ST_DISJOINT, because the choice between ANY or ALL is implied by the predicate itself.
Consider a predicate checking a multi-value field (field) against a test geometry (shape):
- ST_INTERSECTS(field, shape): true if ANY value can intersect the shape
- ST_DISJOINT(field, shape): true only if ALL values are disjoint from the shape
This works even if the shape argument is itself a complex or compound geometry.
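The ANY and ALL rules above map directly onto Python's `any` and `all`. The sketch below is a simplified model, assuming the field holds points and the shape is an axis-aligned bounding box; the function names are illustrative and do not reflect how the predicates are evaluated internally.

```python
def point_in_box(point, box):
    """True if the (x, y) point lies inside or on the edge of the box."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def st_intersects(values, box):
    # True if ANY value of the multi-value field intersects the shape.
    return any(point_in_box(p, box) for p in values)


def st_disjoint(values, box):
    # True only if ALL values of the multi-value field are disjoint from the shape.
    return all(not point_in_box(p, box) for p in values)


box = (0.0, 0.0, 10.0, 10.0)
inside_and_outside = [(5.0, 5.0), (20.0, 20.0)]
all_outside = [(20.0, 20.0), (-3.0, 4.0)]

print(st_intersects(inside_and_outside, box), st_disjoint(inside_and_outside, box))
print(st_intersects(all_outside, box), st_disjoint(all_outside, box))
```

In this model the two predicates are exact negations of each other for any non-empty list of values, which is why neither one needs an explicit ANY or ALL qualifier.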
Similar logic exists for the ST_CONTAINS and ST_WITHIN predicates, but these are not as easily solved with ANY or ALL, because a collection of geometries contains another collection if each of the contained geometries is within at least one of the containing geometries. Evaluating this requires that the multi-value field is first combined into a single geometry before performing the predicate check.
- ST_CONTAINS(field, shape): true if the combined geometry contains the shape
- ST_WITHIN(field, shape): true if the combined geometry is within the shape
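A toy model shows why the values have to be combined first. Here each geometry is represented as a set of grid cells, which is an assumption made purely for illustration (real geometries are not sets of cells), and the function names are hypothetical.

```python
def combine(values):
    """Union all values of a multi-value field into one 'geometry'."""
    combined = set()
    for value in values:
        combined |= value
    return combined


def st_contains(values, shape):
    # The combined geometry must cover every cell of the shape.
    return shape <= combine(values)


def st_within(values, shape):
    # Every cell of the combined geometry must lie inside the shape.
    return combine(values) <= shape


left_column = {(0, 0), (0, 1)}
right_column = {(1, 0), (1, 1)}
field_values = [left_column, right_column]

diagonal = {(0, 1), (1, 0)}  # spans both columns
square = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)}

# Neither value alone contains the diagonal, but the combination does.
print(any(diagonal <= v for v in field_values), st_contains(field_values, diagonal))
print(st_within(field_values, square))
```

Checking each value separately with ANY would report that the diagonal is not contained, even though the field as a whole covers it.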
Enhance SORT push-down to Lucene to cover references to fields and ST_DISTANCE function
The most used and likely most valuable geospatial search query in Elasticsearch is the sorted proximity search, finding items within a certain distance of a point of interest and sorting the results by distance. This has been possible in ES|QL since 8.15.0, but the sorting was done in-memory, not pushed down to Lucene. Now the sorting is pushed down to Lucene, which results in a significant performance improvement.
Queries that perform both filtering and sorting on distance are supported. For example:
FROM test
| EVAL distance = ST_DISTANCE(location, TO_GEOPOINT("POINT(-122.4194 37.7749)"))
| WHERE distance < 1000000
| SORT distance ASC, name DESC
| LIMIT 10
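The logic of the query above (compute a distance, filter on it, sort by it, take the nearest) can be sketched in Python. This is the in-memory form of the search, shown only to make the semantics concrete; it says nothing about how the push-down to Lucene works. The haversine formula, the Earth radius constant, the helper names, and the sample cities are all assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest(rows, lon, lat, max_distance_m, limit):
    scored = [{**r, "distance": haversine_m(r["lon"], r["lat"], lon, lat)} for r in rows]
    within = [r for r in scored if r["distance"] < max_distance_m]  # WHERE
    within.sort(key=lambda r: r["name"], reverse=True)              # name DESC (secondary)
    within.sort(key=lambda r: r["distance"])                        # distance ASC (primary, stable)
    return within[:limit]                                           # LIMIT


# Hypothetical documents with a name and a location.
rows = [
    {"name": "Oakland", "lon": -122.2712, "lat": 37.8044},
    {"name": "Los Angeles", "lon": -118.2437, "lat": 34.0522},
    {"name": "New York", "lon": -74.0060, "lat": 40.7128},
    {"name": "San Francisco", "lon": -122.4194, "lat": 37.7749},
]

result = nearest(rows, -122.4194, 37.7749, 1_000_000, 10)
print([r["name"] for r in result])
```

Note that the point is passed as longitude then latitude, matching the coordinate order of WKT.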
In addition, the support for sorting on EVAL expressions has been extended to cover references to fields:
FROM test
| EVAL ref = field
| SORT ref ASC
| LIMIT 10
Cross-cluster search telemetry
The cross-cluster search telemetry is collected when cross-cluster searches are performed, and is returned as the "ccs" field in the _cluster/stats output.
This change also adds a new parameter, include_remotes=true, to the _cluster/stats API, which collects data from connected remote clusters.