Elasticsearch highlights
editElasticsearch highlights
editThis list summarizes the most important enhancements in Elasticsearch 7.7. For the complete list, go to Elasticsearch release highlights.
Fixed index corruption on shrunk indices
editApplying deletes or updates on an index after it had been shrunk would likely
corrupt the index. We advise users of Elasticsearch 6.x who opt in for soft
deletes on some of their indices and all users of Elasticsearch 7.x to upgrade
to 7.7 as soon as possible to no longer be subject to this corruption bug. In
case upgrading in the near future is not an option, we recommend to completely
stop using _shrink
on read-write indices and to do a force-merge right after
shrinking on read-only indices, which significantly reduces the likeliness of
being affected by this bug in case deletes or updates get applied by mistake.
This bug is fixed as of Elasticsearch 7.7.0. Low-level details can be found on the
corresponding issue.
Significant reduction of heap usage of segments
editThis release of Elasticsearch significantly reduces the amount of heap memory that is needed to keep Lucene segments open. In addition to helping with cluster stability, this helps reduce costs by storing much more data per node before hitting memory limits.
Transforms – now in GA!
editIn 7.7, we move transforms from beta to general availability.
Transforms enable you to pivot existing Elasticsearch indices using group-by and aggregations into a destination feature index, which provides opportunities for new insights and analytics. For example, you can use transforms to pivot your data into entity-centric indices that summarize the behavior of users or sessions or other entities in your data.
Transforms now include support for cross-cluster search. Allowing you to create your destination feature index on a separate cluster from the source indices.
Aggregation support has been expanded within transforms to include support for multi-value (percentiles) and filter aggregations. We also optimized the performance of the date histogram aggregations.
Introducing multiclass classification
editClassification using multiple classes is now available in data frame analytics. Classification is a supervised machine learning technique which has been already available as a binary process in the previous release. Multiclass classification works well with up to 30 distinct categories.
Feature importance at inference time
editFeature importance now can be calculated at inference time. This value provides further insight into the results of a classification or regression job and therefore helps interpret these results.
Finer memory control for bucket aggregations
editWhile building buckets, aggregations will now periodically check the
real-memory circuit breaker before continuing to allocate more buckets. This
allows better responsivity to memory pressure and avoids OutOfMemory
situations due to allocating more buckets than the node can handle.
A new way of searching: asynchronously
editYou can now submit long-running searches using
the new _async_search
API. The new API accepts the
same parameters and request body as the Search API.
However, instead of blocking and returning the final response only when it’s
entirely finished, you can retrieve results from an async search as they become
available.
The request takes a parameter, wait_for_completion
, which controls how long
the server will wait until it sends back a response. The first response
contains among others a search unique ID, a response version, an indication if
this response is partial or not, plus the usual metadata (shards involved,
number of hits etc) and potentially results. If the response is not complete
and final, the client can continue polling for results, issuing a new request
using the provided search ID. If new results are available, the returned
version is incremented and the new batch of results are returned. This can
continue until all the results are fetched.
Unless deleted earlier by the user, the asynchronous searches are kept alive
for a given interval. This defaults to 5 days and can be controlled by another
request parameter, keep_alive
.
Password protection for the keystore
editElasticsearch uses a custom on-disk keystore for secure settings such as passwords and SSL certificates. Up until now, this prevented users with command-line access from viewing secure files by listing commands, but nothing prevented such users from changing values in the keystore, or removing values from it. Furthermore, the values were only obfuscated by a hash; no user-specific secret protected the secure settings.
This new feature changes all of that by adding password-protection to the keystore. This is not be a breaking change: if a keystore has no password, there won’t be any new prompts. A user must choose to password-protect their keystore in order to benefit from the new behavior.
A new aggregation: top_metrics
editThe new top_metrics
aggregation "selects" a metric from a document according
to a criteria on a given, different field. That criteria is currently the
largest or smallest "sort" value. It is fairly similar to top_hits
in spirit,
but because it is more limited, top_metrics
uses less memory and
is often faster.
Query speed-up for sorted queries on time-based indices
editWe’ve optimized sorted, top-documents-only queries run on time-based indices. The optimization stems from the fact that the ranges of (document) timestamps in the shards don’t overlap. It is implemented by rewriting the shard search requests based on the partial results already available from other shards, if it can be determined that the query will not yield any result from the current shard; i.e. we know in advance that the bottom entry of the (sorted) result set after a partial merge is better than the values contained in this current shard.
A new aggregation: boxplot
editThe interquartile range (IQR) is a common robust measure of statistical dispersion. Compared to the standard deviation, the IQR is less sensitive to outliers in the data, with a breakdown point of 0.25. Along with the median, it is often used in creating a box plot, a simple yet common way to summarize data and identify potential outliers.
The new boxplot
aggregation calculates the min, max, and medium as well as the first and third
quartiles of a given data set.
AArch64 support
editElasticsearch now provides AArch64 packaging, including bundling an AArch64 JDK distribution. There are some restrictions in place, namely no machine learning support and depending on underlying page sizes, class data sharing is disabled.