Elasticsearch 5.4.0 released
Today we are pleased to announce the release of Elasticsearch 5.4.0, based on Lucene 6.5.0. This is the latest stable release.
You can read about all the changes in the release notes, but a few of them are worth highlighting below.
When you perform a search request, the node that receives the request becomes the coordinating node. It is in charge of forwarding the shard-level requests to the appropriate data nodes, collecting the results, and merging them into a single result set. Memory use on the coordinating node grows with the number of shards involved. Previously, we added a soft limit of 1,000 shards to try to prevent coordinating nodes from using too much memory.
However, it is quite easy to reach the 1,000-shard limit, especially since the recent release of Cross Cluster Search. As of 5.4.0, top-N search results and aggregations are reduced in batches of 512 shard results. This puts an upper limit on the amount of memory used on the coordinating node, and it has allowed us to make the shard soft limit unlimited by default.
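The following minimal Python sketch shows why reducing in batches bounds memory. It is a simplified model and not Elasticsearch's implementation. We assume each shard returns its local top-N hits as `(score, doc_id)` pairs. The names `reduce_top_n` and `Hit` are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

# A hit is a (score, doc_id) pair; each shard sends back its local top-N hits.
Hit = Tuple[float, str]


def reduce_top_n(shard_results: Iterable[List[Hit]], n: int,
                 batch_size: int = 512) -> List[Hit]:
    """Fold per-shard top-N lists into a global top-N, one batch at a time.

    At most `batch_size` shard responses are buffered before they are merged
    into the running top-N, so peak memory is bounded by roughly
    batch_size * n + n hits, no matter how many shards take part.
    """
    top: List[Hit] = []            # running global top-N (at most n hits)
    buffer: List[List[Hit]] = []   # shard responses not yet reduced

    def flush() -> None:
        nonlocal top
        candidates = top + [hit for result in buffer for hit in result]
        top = heapq.nlargest(n, candidates)  # sorted, highest score first
        buffer.clear()

    for result in shard_results:
        buffer.append(result)
        if len(buffer) >= batch_size:
            flush()
    if buffer:
        flush()
    return top
```

Each flush keeps only the best `n` hits seen so far. The final result is therefore the same as merging every shard response in one go. The batch size trades a little extra merge work for a fixed memory ceiling.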
There has been much work recently on improving Lucene's handling of graph token streams. In a graph token stream, analysis of text, whether of a document at index time or of a query at search time, produces multiple overlapping paths or interpretations for the tokens. Multi-word synonyms produce such graphs, and they have long been buggy when used with proximity queries such as phrase queries.
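To make "multiple overlapping paths" concrete, here is a small Python sketch that models a token graph and enumerates its paths. We assume each token carries a start position and a position length (the number of positions it spans). We also assume a synonym rule `ny, new york` applied to the text "ny city". `Token` and `graph_paths` are hypothetical names, not Lucene APIs.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    term: str
    position: int         # start node of the token's arc in the graph
    position_length: int  # number of positions the token spans


def graph_paths(tokens: List[Token]) -> List[List[str]]:
    """Enumerate every path from position 0 to the final position."""
    arcs: Dict[int, List[Token]] = {}
    for tok in tokens:
        arcs.setdefault(tok.position, []).append(tok)
    end = max(t.position + t.position_length for t in tokens)

    def walk(pos: int) -> List[List[str]]:
        if pos == end:
            return [[]]
        paths = []
        for tok in arcs.get(pos, []):
            for rest in walk(tok.position + tok.position_length):
                paths.append([tok.term] + rest)
        return paths

    return walk(0)


# "ny" spans two positions so that it lines up with "new york".
tokens = [Token("ny", 0, 2), Token("new", 0, 1),
          Token("york", 1, 1), Token("city", 2, 1)]
print(graph_paths(tokens))
```

Each path is one interpretation of the text: "ny city" and "new york city". A phrase query built from the graph has to respect both paths. A flat stream that stacks "york" on top of "city" cannot represent them.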
Thanks to the recent addition of the synonym_graph token filter, as well as improvements to Lucene's query parsers that translate the token graph into separate queries, such analysis chains are finally handled correctly at search time. Since 5.2, we've also added the word_delimiter_graph token filter, and graph-enabled the shingle, cjk, ngram, and common_grams token filters, as well as the kuromoji_tokenizer. There is also the flatten_graph token filter, which needs to be used as the final token filter at index time to convert a graph into a form that can be indexed.
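Here is a minimal sketch of a create-index request body, written as a Python dict, that applies synonym_graph only at search time. The filter, analyzer, type, and field names (`my_synonyms`, `my_search_analyzer`, `doc`, `city`) are hypothetical. We assume the dict would be sent as JSON when creating the index.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {                  # hypothetical filter name
                    "type": "synonym_graph",
                    "synonyms": ["ny, new york"],
                }
            },
            "analyzer": {
                "my_search_analyzer": {           # hypothetical analyzer name
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "doc": {                                  # 5.x indices still use a mapping type
            "properties": {
                "city": {
                    "type": "text",
                    "analyzer": "standard",                 # index time: no synonyms
                    "search_analyzer": "my_search_analyzer",  # search time: graph
                }
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

Keeping synonyms out of the index-time analyzer avoids the need to flatten the graph. If you do expand synonyms at index time, flatten_graph would go last in that analyzer's filter list.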
This release ships with a number of query optimizations. The commonly used range query has to visit every document that matches the range, even when other parts of the request would rule most of them out, so it can become a bottleneck in query execution. The new range query automatically chooses the more efficient of two execution modes, based on the other queries in the search request. See Better Query Planning for Range Queries in Elasticsearch for more info.
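The toy Python model below illustrates the two modes. "Index" mode collects every document in the range and then intersects it with the other query. "Doc values" mode checks only the documents that the other query already matched, looking up each one's value. As far as we know, Lucene's `IndexOrDocValuesQuery` is the building block behind this change. The cost rule used here is our own simplification, not Lucene's actual heuristic, and all function names are hypothetical.

```python
import bisect
from typing import Dict, List, Set, Tuple

# Stand-ins for two per-field structures (assumptions of this toy model):
#   points:     (value, doc_id) pairs sorted by value, like a points index
#   doc_values: doc_id -> value, for cheap per-document lookups
Points = List[Tuple[int, int]]


def range_bounds(points: Points, lo: int, hi: int) -> Tuple[int, int]:
    """Slice of `points` whose values fall in [lo, hi] (doc ids are >= 0)."""
    start = bisect.bisect_left(points, (lo, -1))
    end = bisect.bisect_right(points, (hi, float("inf")))
    return start, end


def range_by_index(points: Points, lo: int, hi: int) -> Set[int]:
    """Index mode: materialise every document that matches the range."""
    start, end = range_bounds(points, lo, hi)
    return {doc for _, doc in points[start:end]}


def range_by_doc_values(doc_values: Dict[int, int], candidates: Set[int],
                        lo: int, hi: int) -> Set[int]:
    """Doc-values mode: only check documents the other query already matched."""
    return {d for d in candidates if d in doc_values and lo <= doc_values[d] <= hi}


def filtered_range(points: Points, doc_values: Dict[int, int],
                   lead: Set[int], lo: int, hi: int) -> Tuple[str, Set[int]]:
    """Pick the cheaper mode for `lead AND value in [lo, hi]`.

    Simplified cost rule (our assumption): if the other query matches fewer
    documents than the range does, verify those few via doc values;
    otherwise collect the range from the index and intersect.
    """
    start, end = range_bounds(points, lo, hi)
    if len(lead) < end - start:
        return "doc_values", range_by_doc_values(doc_values, lead, lo, hi)
    return "index", lead & range_by_index(points, lo, hi)
```

Both modes return the same documents. Only the amount of work differs, which is why the choice can be made automatically.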
On top of that, some nested queries have received a speed boost, as we are now cleverer about which filters need to be applied to a particular nested query. For instance, if the field being queried only exists in nested documents, then we no longer need a filter to exclude the parent documents.
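The following Python sketch shows why the parent-exclusion filter is only needed when the parent can also contain the queried field. It uses a toy block-join layout in which each parent document is stored right after its nested children. The `_parent` marker and all function names are assumptions of this model. In Elasticsearch the decision comes from the mapping, not from scanning documents.

```python
from typing import List, Set


def owning_parent(docs: List[dict], i: int) -> int:
    """The parent of doc i is the first parent at or after position i."""
    while not docs[i].get("_parent"):
        i += 1
    return i


def nested_match(docs: List[dict], field: str, value: str,
                 exclude_parents: bool) -> Set[int]:
    """Parents having a nested doc with field == value."""
    hits = [i for i, d in enumerate(docs) if d.get(field) == value]
    if exclude_parents:
        # The extra filter: drop parent docs from the inner query's matches.
        hits = [i for i in hits if not docs[i].get("_parent")]
    return {owning_parent(docs, i) for i in hits}


def needs_parent_filter(docs: List[dict], field: str) -> bool:
    """Only needed if some parent doc can contain the queried field
    (this toy scans documents; a real engine would consult the mapping)."""
    return any(field in d for d in docs if d.get("_parent"))
```

When a field such as `comments.author` lives only in nested documents, no parent can match the inner query, so applying the filter changes nothing and only costs time.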
And finally, large terms queries were slower to parse because of the keyword normalizers added in 5.2. In this release, only fields that have a custom normalizer are normalised.
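Below is a minimal Python sketch with two parts. The first is a dict showing how a custom normalizer is declared on a keyword field. The second is a toy model of the parse-time shortcut, which skips per-term work for fields without a normalizer. The names `my_normalizer`, `doc`, `tag`, `sku`, and `parse_terms` are hypothetical. The Python function merely stands in for the real normalizer.

```python
from typing import Callable, Dict, List

# How a custom normalizer is declared and attached to a keyword field.
settings = {
    "analysis": {
        "normalizer": {
            "my_normalizer": {"type": "custom", "char_filter": [], "filter": ["lowercase"]}
        }
    }
}
mappings = {"doc": {"properties": {
    "tag": {"type": "keyword", "normalizer": "my_normalizer"},
    "sku": {"type": "keyword"},  # no normalizer
}}}

Normalizer = Callable[[str], str]


def parse_terms(field: str, terms: List[str],
                normalizers: Dict[str, Normalizer]) -> List[str]:
    """Toy model: normalise terms only if the field has a custom normalizer."""
    normalizer = normalizers.get(field)
    if normalizer is None:
        return list(terms)  # no per-term analysis at all
    return [normalizer(t) for t in terms]


# Python stand-in for my_normalizer, keyed by field name.
normalizers: Dict[str, Normalizer] = {"tag": str.lower}
```

With thousands of terms against a plain keyword field such as `sku`, the fast path returns the terms untouched instead of running each one through an analysis chain.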
Some other changes worth mentioning are:
- We’ve tweaked the default Netty receive predictor size to 64 kB to balance throughput with garbage collection and heap allocation.
- Date-range queries in the percolator can now use now, which will be calculated at execution time.
- The unified highlighter gained support for fragment_size.
- We're slowly migrating sensitive settings (like S3 and EC2 passwords) to the secure settings keystore, instead of storing them in the plain-text elasticsearch.yml file.
- The new single-node discovery type disables bootstrap checks, which makes it easier for Docker users to run tests against Elasticsearch with the TransportClient.
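What it means for the percolator to evaluate `now` at execution time can be shown with a toy Python model. The stored query keeps the date-math string unresolved, and `now` is resolved each time a document is percolated. This is not the percolator's implementation. `resolve_date_math` and `percolate` are hypothetical, and only a tiny subset of date math is handled.

```python
from datetime import datetime, timedelta


def resolve_date_math(expr: str, now: datetime) -> datetime:
    """Resolve a tiny subset of date math: "now", "now-<N>d" and "now+<N>d"."""
    if expr == "now":
        return now
    if len(expr) > 5 and expr.startswith("now") and expr[3] in "+-" and expr.endswith("d"):
        days = int(expr[4:-1])
        return now + timedelta(days=days) if expr[3] == "+" else now - timedelta(days=days)
    raise ValueError(f"unsupported date math: {expr!r}")


# A registered percolator query is stored as data, with "now-1d" unresolved.
stored_query = {"range": {"timestamp": {"gte": "now-1d"}}}


def percolate(query: dict, doc: dict, now: datetime) -> bool:
    """Does `doc` match the stored range query when percolated at `now`?"""
    field, bounds = next(iter(query["range"].items()))
    lower = resolve_date_math(bounds["gte"], now)  # evaluated per execution
    return doc[field] >= lower
```

The same stored query matches a given document on one day and not two days later, because the lower bound moves with the clock. If `now` were resolved when the query was registered, the bound would stay frozen.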
Please download Elasticsearch 5.4.0, try it out, and let us know what you think on Twitter (@elastic) or in our forum. You can report any problems on the GitHub issues page.