What’s new in 8.16

edit

Coming in 8.16.

Here are the highlights of what’s new and improved in Elasticsearch 8.16! For detailed information about this release, see the Release notes and Migration guide.

Other versions:

8.15 | 8.14 | 8.13 | 8.12 | 8.11 | 8.10 | 8.9 | 8.8 | 8.7 | 8.6 | 8.5 | 8.4 | 8.3 | 8.2 | 8.1 | 8.0

ESQL: INLINESTATS

edit

This adds the INLINESTATS command to ESQL which performs a STATS and then enriches the results into the output stream. So, this query:

FROM test
| INLINESTATS m=MAX(a * b) BY b
| WHERE m == a * b
| SORT a DESC, b DESC
| LIMIT 3

Produces output like:

| a | b | m | | --- | --- | ----- | | 99 | 999 | 98901 | | 99 | 998 | 98802 | | 99 | 997 | 98703 |

#109583

Always allow rebalancing by default

edit

In earlier versions of Elasticsearch the cluster.routing.allocation.allow_rebalance setting defaults to indices_all_active which blocks all rebalancing moves while the cluster is in yellow or red health. This was appropriate for the legacy allocator which might do too many rebalancing moves otherwise. Today’s allocator has better support for rebalancing a cluster that is not in green health, and expects to be able to rebalance some shards away from over-full nodes to avoid allocating shards to undesirable locations in the first place. From version 8.16 allow_rebalance setting defaults to always unless the legacy allocator is explicitly enabled.

#111015

Add global retention in data stream lifecycle

edit

Data stream lifecycle now supports configuring retention on a cluster level, namely global retention. Global retention \nallows us to configure two different retentions:

  • data_streams.lifecycle.retention.default is applied to all data streams managed by the data stream lifecycle that do not have retention defined on the data stream level.
  • data_streams.lifecycle.retention.max is applied to all data streams managed by the data stream lifecycle and it allows any data stream \ndata to be deleted after the max_retention has passed.

#111972

Enable ZStandard compression for indices with index.codec set to best_compression

edit

Before DEFLATE compression was used to compress stored fields in indices with index.codec index setting set to best_compression, with this change ZStandard is used as compression algorithm to stored fields for indices with index.codec index setting set to best_compression. The usage ZStandard results in less storage usage with a similar indexing throughput depending on what options are used. Experiments with indexing logs have shown that ZStandard offers ~12% lower storage usage and a ~14% higher indexing throughput compared to DEFLATE.

#112665

[[8_x_remove_zstd_feature_flag_for_index_codec_best_compression]] === [8.x] Remove zstd feature flag for index codec best compression Backports the following commits to 8.x: - Remove zstd feature flag for index codec best compression. (#112665)

#112857

ESQL: Introduce per agg filter

edit

Add support for aggregation scoped filters that work dynamically on the data in each group.

| STATS success = COUNT(*) WHERE 200 <= code AND code < 300,
        redirect = COUNT(*) WHERE 300 <= code AND code < 400,
        client_err = COUNT(*) WHERE 400 <= code AND code < 500,
        server_err = COUNT(*) WHERE 500 <= code AND code < 600,
        total_count = COUNT(*)

Implementation wise, the base AggregateFunction has been extended to allow a filter to be passed on. This is required to incorporate the filter as part of the aggregate equality/identity which would fail with the filter as an external component. As part of the process, the serialization for the existing aggregations had to be fixed so AggregateFunction implementations so that it delegates to their parent first.

#113735

ESQL: Multi-value fields supported in Geospatial predicates

edit

Supporting multi-value fields in WHERE predicates is a challenge due to not knowing whether ALL or ANY of the values in the field should pass the predicate. For example, should the field age:[10,30] pass the predicate WHERE age>20 or not? This ambiguity does not exist with the spatial predicates ST_INTERSECTS and ST_DISJOINT, because the choice between ANY or ALL is implied by the predicate itself. Consider a predicate checking a field named location against a test geometry named shape:

  • ST_INTERSECTS(field, shape) - true if ANY value can intersect the shape
  • ST_DISJOINT(field, shape) - true only if ALL values are disjoint from the shape

This works even if the shape argument is itself a complex or compound geometry.

Similar logic exists for ST_CONTAINS and ST_WITHIN predicates, but these are not as easily solved with ANY or ALL, because a collection of geometries contains another collection if each of the contained geometries is within at least one of the containing geometries. Evaluating this requires that the multi-value field is first combined into a single geometry before performing the predicate check.

  • ST_CONTAINS(field, shape) - true if the combined geometry contains the shape
  • ST_WITHIN(field, shape) - true if the combined geometry is within the shape

#112063

Enhance SORT push-down to Lucene to cover references to fields and ST_DISTANCE function

edit

The most used and likely most valuable geospatial search query in Elasticsearch is the sorted proximity search, finding items within a certain distance of a point of interest and sorting the results by distance. This has been possible in ES|QL since 8.15.0, but the sorting was done in-memory, not pushed down to Lucene. Now the sorting is pushed down to Lucene, which results in a significant performance improvement.

Queries that perform both filtering and sorting on distance are supported. For example:

FROM test
| EVAL distance = ST_DISTANCE(location, TO_GEOPOINT("POINT(37.7749, -122.4194)"))
| WHERE distance < 1000000
| SORT distance ASC, name DESC
| LIMIT 10

In addition, the support for sorting on EVAL expressions has been extended to cover references to fields:

FROM test
| EVAL ref = field
| SORT ref ASC
| LIMIT 10

#112938

Cross-cluster search telemetry

edit

The cross-cluster search telemetry is collected when cross-cluster searches are performed, and is returned as "ccs" field in _cluster/stats output. It also add a new parameter include_remotes=true to the _cluster/stats API which will collect data from connected remote clusters.

#113825