What’s new in 8.15

Coming in 8.15.

Here are the highlights of what’s new and improved in Elasticsearch 8.15! For detailed information about this release, see the Release notes and Migration guide.

Stricter failure handling in multi-repo get-snapshots requests

If a multi-repo get-snapshots request encountered a failure in one of the targeted repositories, earlier versions of Elasticsearch would proceed as if the faulty repository did not exist, except for a per-repository failure report in a separate section of the response body. This made it impossible to paginate the results properly in the presence of failures. In versions 8.15.0 and later, this API’s failure handling is stricter: it reports an overall failure if the contents of any targeted repository cannot be listed.

#107191

Introduce logsdb index mode as Tech Preview

This change introduces a new index mode named logsdb. Enabling it automatically turns on the following storage-saving features:

  • Synthetic source, which avoids storing the _source field. When _source or part of it is requested, it is synthesized on the fly.
  • Index sorting. By default, indices are sorted by the host.name and @timestamp fields at index time. This can be overridden if other sort fields yield a better compression rate.
  • More space-efficient compression for fields with doc values enabled. These are the same codecs used when the time_series index mode is enabled.

Set the index.mode index setting to logsdb in index templates, or when creating a plain index. Benchmarks and other tests have shown that logs data sets use around 2.5 times less storage with the new index mode enabled than without it. The new logsdb index mode is a tech preview feature.
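
As a minimal sketch of the configuration described above, the following Python snippet builds the body of an index template that sets index.mode to logsdb. The template name my-logs-template and the index pattern my-logs-* are hypothetical, chosen only for illustration. The body would be sent to the index template API with whichever client you use.

```python
import json

# Hypothetical template name, chosen for illustration only.
TEMPLATE_NAME = "my-logs-template"


def logsdb_template_body(index_pattern):
    """Build an index template body that enables the logsdb index mode."""
    return {
        "index_patterns": [index_pattern],
        "template": {
            "settings": {
                # Tech preview in 8.15: enables synthetic source, index
                # sorting, and the more space-efficient doc values codecs.
                "index.mode": "logsdb",
            }
        },
    }


# Hypothetical index pattern.
body = logsdb_template_body("my-logs-*")

# Print the request in console form rather than sending it anywhere.
print(f"PUT _index_template/{TEMPLATE_NAME}")
print(json.dumps(body, indent=2))
```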

#108896

Add new int4 quantization to dense_vector

This release adds int4 (half-byte) scalar quantization through two new index types: int4_hnsw and int4_flat. This gives an 8x reduction from float32 with some accuracy loss. In addition to requiring less memory, it significantly improves query and merge speed compared to raw vectors.
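
The 8x figure follows from the bit widths: 32 bits per dimension for float32 against 4 bits for int4. The sketch below works through that arithmetic and builds a mapping that selects the int4_hnsw index type. The field name embedding and the 768 dimensions are hypothetical, and the calculation ignores any per-vector overhead the quantized format may add.

```python
def vector_bytes(dims, bits_per_dim):
    """Raw storage for one vector in bytes, ignoring per-vector overhead."""
    return dims * bits_per_dim / 8


DIMS = 768  # hypothetical embedding size

float32_bytes = vector_bytes(DIMS, 32)  # 4 bytes per dimension
int4_bytes = vector_bytes(DIMS, 4)      # half a byte per dimension
reduction = float32_bytes / int4_bytes

# Mapping body for a dense_vector field using one of the new index types.
# "int4_flat" can be used in place of "int4_hnsw".
mapping = {
    "mappings": {
        "properties": {
            "embedding": {  # hypothetical field name
                "type": "dense_vector",
                "dims": DIMS,
                "index_options": {"type": "int4_hnsw"},
            }
        }
    }
}

print(float32_bytes, int4_bytes, reduction)
```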

#109317

Mark Query Rules as GA

Query rules are now Generally Available. None of the query rules APIs are in tech preview any longer.

#110004

Adds new bit element_type for dense_vectors

This adds bit vector support by adding element_type: bit for vectors. This new element type works for indexed and non-indexed vectors. Additionally, it works with hnsw and flat index types. No quantization-based codec works with this element type, which is consistent with byte vectors.

bit vectors accept up to 32768 dimensions and expect vectors that are being indexed to be encoded either as a hexadecimal string or as a byte[] array where each element of the byte array represents 8 bits of the vector.

bit vectors support script usage and regular query usage. When indexed, all comparisons are xor and popcount summations (that is, Hamming distance), and the scores are transformed and normalized according to the vector dimensions.

For scripts, l1norm is the same as hamming distance and l2norm is sqrt(l1norm). dotProduct and cosineSimilarity are not supported.

Note that the dimensions expected by this element_type must always be divisible by 8, and the byte[] vectors provided for indexing must have size dim/8, where each byte element represents 8 bits of the vector.
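
The encoding and distance rules above can be illustrated with a short, self-contained Python sketch. It packs a list of bits into dim/8 bytes, renders them as a hexadecimal string, and computes the Hamming distance with xor and popcount. The helper names are hypothetical, and the most-significant-bit-first packing order is an assumption made for this illustration, not something stated in these notes.

```python
import math


def pack_bits(bits):
    """Pack 0/1 values into bytes, 8 bits per byte (MSB first, by assumption)."""
    if len(bits) % 8 != 0:
        raise ValueError("bit vector dimensions must be divisible by 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | (bit & 1)
        packed.append(byte)
    return bytes(packed)


def hamming(a, b):
    """Hamming distance: popcount of the xor of each pair of bytes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Two 16-dimensional bit vectors, each stored as 16 / 8 = 2 bytes.
a = pack_bits([1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0])
b = pack_bits([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

l1norm = hamming(a, b)      # for bit vectors, l1norm equals Hamming distance
l2norm = math.sqrt(l1norm)  # l2norm is sqrt(l1norm)

# Hexadecimal string form of a vector, as accepted at index time.
document = {"my_bit_vector": a.hex()}  # hypothetical field name

print(a.hex(), b.hex(), l1norm, l2norm)
```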

#110059

The Redact processor is Generally Available

The Redact processor uses the Grok rules engine to obscure text in the input document matching the given Grok patterns. The Redact processor was initially released as Technical Preview in 8.7.0, and is now released as Generally Available.
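
As a hedged illustration, the snippet below builds an ingest pipeline body containing a redact processor that targets IP addresses and email addresses. The field name message, the pipeline description, and the choice of patterns are assumptions made for this example.

```python
import json

# Ingest pipeline body with a single redact processor.
pipeline = {
    "description": "Obscure IP and email addresses",  # illustrative text
    "processors": [
        {
            "redact": {
                # Field whose text is scanned (assumed name).
                "field": "message",
                # Grok patterns identifying the text to obscure.
                "patterns": ["%{IP:client}", "%{EMAILADDRESS:email}"],
            }
        }
    ],
}

print(json.dumps(pipeline, indent=2))
```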

#110395

New custom parser for ISO-8601 datetimes

This introduces a new custom parser for ISO-8601 datetimes, for the iso8601, strict_date_optional_time, and strict_date_optional_time_nanos built-in date formats. This provides a performance improvement over the default Java date-time parsing. Whilst it maintains much of the same behaviour, the new parser does not accept nonsensical date-time strings that have multiple fractional seconds fields or multiple timezone specifiers. If the new parser fails to parse a string, it will then use the previous parser to parse it. If a large proportion of the input data consists of these invalid strings, this may cause a small performance degradation. If you wish to force the use of the old parsers regardless, set the JVM property es.datetime.java_time_parsers=true on all ES nodes.
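
The fallback behaviour described above can be pictured with the following Python sketch. It is not Elasticsearch's implementation: fast_parse and legacy_parse are hypothetical stand-ins for the new and previous parsers, and the force_legacy flag only mimics the effect of the es.datetime.java_time_parsers property.

```python
from datetime import datetime


def fast_parse(text):
    """Stand-in for the new, stricter parser: accepts one exact layout."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")


def legacy_parse(text):
    """Stand-in for the previous, more lenient parser."""
    for layout in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, layout)
        except ValueError:
            continue
    raise ValueError(f"cannot parse {text!r}")


def parse(text, force_legacy=False):
    """Try the fast parser first, then fall back to the legacy one."""
    if force_legacy:
        return legacy_parse(text)
    try:
        return fast_parse(text)
    except ValueError:
        # Inputs the fast parser rejects pay for two parse attempts, which
        # is why many such inputs could cause a small slowdown.
        return legacy_parse(text)


print(parse("2024-08-08T12:00:00"))
print(parse("2024-08-08"))
```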

#106486

New custom parser for more ISO-8601 date formats

Following on from #106486, this extends the custom ISO-8601 datetime parser to cover the strict_year, strict_year_month, strict_date_time, strict_date_time_no_millis, strict_date_hour_minute_second, strict_date_hour_minute_second_millis, and strict_date_hour_minute_second_fraction date formats. As before, the parser will use the existing java.time parser if there are parsing issues, and the es.datetime.java_time_parsers=true JVM property will force the use of the old parsers regardless.

#108606

Preview: Support for the Connection Type, Domain, and ISP databases in the geoip processor

As a Technical Preview, the geoip processor can now use the commercial GeoIP2 Connection Type, GeoIP2 Domain, and GeoIP2 ISP databases from MaxMind.

#108683

Update Elasticsearch to Lucene 9.11

Elasticsearch now uses Lucene 9.11, the latest Lucene version. See the Lucene release notes for the full list of changes. Some particular highlights:

  • Usage of MADVISE for better memory management: https://github.com/apache/lucene/pull/13196
  • Use RWLock to access LRUQueryCache to reduce contention: https://github.com/apache/lucene/pull/13306
  • Speedup multi-segment HNSW graph search for nested kNN queries: https://github.com/apache/lucene/pull/13121
  • Add a MemorySegment Vector scorer, for scoring without copying on-heap vectors: https://github.com/apache/lucene/pull/13339

#109219

Synthetic _source improvements

There are multiple improvements to synthetic _source functionality:

  • Synthetic _source is now supported for all field types including nested and object. object fields are supported with enabled set to false.
  • Synthetic _source can be enabled together with ignore_malformed and ignore_above parameters for all field types that support them.
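
To make the combinations above concrete, here is a small sketch of a mapping that enables synthetic _source alongside a nested field, a disabled object field, and fields using ignore_above and ignore_malformed. All field names are hypothetical.

```python
import json

mapping = {
    "mappings": {
        # Enable synthetic _source for the index.
        "_source": {"mode": "synthetic"},
        "properties": {
            # Parameters that can be combined with synthetic _source.
            "tag": {"type": "keyword", "ignore_above": 64},
            "bytes": {"type": "long", "ignore_malformed": True},
            # nested and object fields, including object with enabled: false.
            "comments": {
                "type": "nested",
                "properties": {"text": {"type": "text"}},
            },
            "raw": {"type": "object", "enabled": False},
        },
    }
}

print(json.dumps(mapping, indent=2))
```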

#109501

Index sorting on indexes with nested fields

Index sorting is now supported for indexes with mappings containing nested objects. The index sort spec (as specified by index.sort.field) still can’t contain any nested fields.
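
The constraint can be sketched as follows: an index body that sorts on a top-level field while also mapping a nested field, plus a hypothetical helper that checks the sort spec against the mapping. The helper is an illustration of the rule only, not how Elasticsearch validates it, and all field names are assumed.

```python
def nested_paths(properties, prefix=""):
    """Return the full paths of all fields mapped as nested."""
    paths = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "nested":
            paths.append(path)
        paths.extend(nested_paths(spec.get("properties", {}), path + "."))
    return paths


def sort_spec_is_allowed(index_body):
    """True if no index.sort.field entry is a nested field or lies under one."""
    sort_fields = index_body["settings"]["index.sort.field"]
    if isinstance(sort_fields, str):
        sort_fields = [sort_fields]
    nested = nested_paths(index_body["mappings"]["properties"])
    return not any(
        field == path or field.startswith(path + ".")
        for field in sort_fields
        for path in nested
    )


index_body = {
    "settings": {
        "index.sort.field": ["timestamp"],  # top-level field: allowed
        "index.sort.order": ["desc"],
    },
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "comments": {
                "type": "nested",
                "properties": {"created": {"type": "date"}},
            },
        }
    },
}

print(sort_spec_is_allowed(index_body))
```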

#110251