What’s new in 8.15

Coming in 8.15.

Here are the highlights of what’s new and improved in Elasticsearch 8.15!

Stored fields are now compressed with ZStandard instead of LZ4/DEFLATE

Stored fields are now compressed by splitting documents into blocks, which are then compressed independently with ZStandard. index.codec: default (the default) uses blocks of at most 14kB or 128 documents compressed at level 0, while index.codec: best_compression uses blocks of at most 240kB or 2048 documents compressed at level 3. On most datasets that we tested against, this yielded storage improvements on the order of 10%, slightly faster indexing, and similar retrieval latencies.
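For context, the codec is an index-level setting chosen when the index is created. Below is a minimal sketch of opting into the higher-compression tier; the index name my-index is a placeholder:

```console
PUT /my-index
{
  "settings": {
    "index.codec": "best_compression"
  }
}
```

With no codec specified, the index uses index.codec: default and the level-0 ZStandard blocks described above.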

#103374

Query phase KNN now supports query_vector_builder

It is now possible to pass model_text and model_id within a knn query in the [query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-knn-query.html) to convert a text query into a dense vector and run the nearest neighbor query on it, instead of requiring the dense vector to be passed directly (via the query_vector parameter). As with the [top-level knn search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html) (executed in the DFS phase), it is possible to supply a query_vector_builder object containing a text_embedding object with model_text (the text query to be converted into a dense vector) and model_id (the identifier of a deployed model responsible for transforming the text query into a dense vector). Note that an embedding model with the referenced model_id needs to be [deployed on an ML node](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-deploy-models.html) in the cluster.
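For illustration, here is a minimal sketch of such a query; the index name, vector field, and model_id are placeholders for your own index mapping and deployed model:

```console
POST /my-index/_search
{
  "query": {
    "knn": {
      "field": "my_vector",
      "query_vector_builder": {
        "text_embedding": {
          "model_id": "my-text-embedding-model",
          "model_text": "pizza restaurants near the beach"
        }
      },
      "num_candidates": 50
    }
  }
}
```

Elasticsearch runs model_text through the referenced model and uses the resulting dense vector exactly as if it had been supplied via query_vector.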

#106068

A SIMD (Neon) optimised vector distance function for merging int8 Scalar Quantized vectors has been added

An optimised int8 vector distance implementation for aarch64 has been added. This implementation is currently only used during merging. The vector distance implementation outperforms Lucene’s Panama Vector implementation for binary comparisons by approximately 5x (depending on the number of dimensions). It does so by means of SIMD (Neon) intrinsics compiled into a separate native library and linked via Panama’s FFI. Comparisons are performed on off-heap mmap’ed vector data. Macro benchmarks on SO_Dense_Vector with scalar quantization enabled show a significant improvement in merge times: approximately 3 times faster.

#106133