Elasticsearch 5.2.0 released
Today we are pleased to announce the release of Elasticsearch 5.2.0, the latest stable release, with numeric and date range fields, the cluster-allocation-explain API, keyword normalizers, and partitionable terms aggregations. It is already available for deployment on Elastic Cloud, our Elasticsearch-as-a-service platform.
Latest stable release in 5.x:
Full details of the changes in this release are available in the release notes listed above, but there are a few important changes which are worth singling out:
While numeric and date fields index a discrete number or point in time, numeric and date range fields allow you to index numeric and date ranges, such as Friday 27 January 2017, between 6pm and 8pm. Range queries can be used to search for indexed ranges which overlap the query range, are completely contained within it, completely contain it, or do not overlap it at all. This allows you to answer questions like "What entertainment is available on Thursday evening?", a query which was previously very difficult to construct.
See Numeric and Date Ranges in Elasticsearch: Just Another Brick in the Wall for more information.
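As a rough illustration of the date range case, the following Python sketch builds the JSON bodies you might send to Elasticsearch. The type name event, the field name time_frame, the date format, and the example values are all hypothetical. The date_range field type and the relation values (intersects, within, contains) follow the range field documentation as we understand it, so check them against your version. Nothing is sent over the network.

```python
import json

# Hypothetical mapping: one date_range field on a made-up "event" type.
mapping = {
    "mappings": {
        "event": {
            "properties": {
                "time_frame": {
                    "type": "date_range",
                    "format": "yyyy-MM-dd HH:mm",
                }
            }
        }
    }
}

# A document covering Friday 27 January 2017, 6pm to 8pm.
document = {
    "time_frame": {"gte": "2017-01-27 18:00", "lte": "2017-01-27 20:00"}
}


def range_query(field, start, end, relation="intersects"):
    """Build a range query body; relation is intersects, within, or contains."""
    if relation not in ("intersects", "within", "contains"):
        raise ValueError("unsupported relation: %s" % relation)
    return {
        "query": {
            "range": {field: {"gte": start, "lte": end, "relation": relation}}
        }
    }


# Events that overlap the evening window at all.
overlapping = range_query("time_frame", "2017-01-27 17:00", "2017-01-27 23:00")
# Events that fit entirely inside the evening window.
contained = range_query(
    "time_frame", "2017-01-27 17:00", "2017-01-27 23:00", relation="within"
)

print(json.dumps(overlapping, indent=2))
```

The mapping would go in the body of an index creation request, the document in an index request, and each query in a search request.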
PagerDuty wakes you up at 3AM with a red cluster. Which API do you reach for to figure out why shards aren’t being allocated? Previously, you had to consult several APIs to put together the complete picture. Now, the answer is simple: the cluster allocation explain API. This API can tell you whether a shard has failed to allocate or is just waiting its turn, and why allocation failed, whether it be a corrupt shard, full disks, or bad settings. It can also tell you why a shard is assigned to a particular node, perhaps when you have tried to force relocation.
See RED Elasticsearch Cluster? Panic no longer for more information.
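For a sense of what the call looks like, here is a minimal Python sketch that prepares a request to the _cluster/allocation/explain endpoint for one specific shard. The node address and index name are assumptions, and the request is only built, not sent, so the sketch runs without a cluster.

```python
import json
import urllib.request


def build_explain_request(base_url, index, shard, primary):
    """Prepare (but do not send) a cluster allocation explain request."""
    body = {"index": index, "shard": shard, "primary": primary}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/allocation/explain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


# Hypothetical node address and index name.
request = build_explain_request("http://localhost:9200", "my_index", 0, True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually call the API against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The body names the index, the shard number, and whether you are asking about the primary or a replica.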
In 5.0.0, we separated the string field type into text (for analyzed full text) and keyword (for not-analyzed string identifiers). Fields of type text can be analyzed into individual tokens for full text search, while keywords support doc_values, used for search, aggregations, sorting, and in scripts. However, there are times when you need some of the power of the analysis chain to normalise keywords, such as lower-casing email addresses or zip codes.
This release brings normalizers to keyword fields. Normalizers are similar to analyzers except that they may only emit a single token. As a consequence, they do not have a tokenizer and only accept a subset of the available char filters and token filters. Only the filters that work on a per-character basis are allowed. For instance a lowercasing filter would be allowed, but not a stemming filter, which needs to look at the keyword as a whole.
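A minimal sketch of the index settings this implies, written as a Python dict that could be serialised and sent as the body of an index creation request. The normalizer name lowercase_normalizer, the type name user, and the field name email are hypothetical. The overall layout (a custom normalizer under analysis, referenced from the keyword field's mapping) follows the normalizer documentation as we understand it.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                # Hypothetical normalizer name; note there is no tokenizer.
                "lowercase_normalizer": {
                    "type": "custom",
                    "char_filter": [],
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "user": {
            "properties": {
                "email": {
                    "type": "keyword",
                    "normalizer": "lowercase_normalizer",
                }
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With a mapping like this, an email address should be lower-cased before it is indexed as a single keyword token.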
The terms aggregation returns the top 10 terms by default. We are often asked "But how do I return ALL terms?". The answer up until now was "You don’t". It was simply too memory intensive to collect all the terms of a high cardinality field from all the nodes in the cluster and to reduce them to a single result set. It also defeated the purpose of the terms agg, which was designed to return the top-N counts from huge datasets at speed.
However, as we’ve seen many times over, users surprise us with the problems that they solve with Elasticsearch. A frequent request was to be able to return all terms, even if the response is not instantaneous. You can now break your terms down into partitions: the more unique terms you have, the more partitions you will need. For instance, you could choose to use 20 partitions and run 20 search requests, each one requesting a single partition.
See Filtering Values with Partitions for more info.
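The 20-partition example could be sketched in Python as follows. The field name account_id, the aggregation name all_terms, and the size of 10000 are assumptions chosen for illustration. The include block with partition and num_partitions follows the terms aggregation documentation as we understand it. The sketch only builds the search bodies and does not send them.

```python
import json


def partitioned_terms_requests(field, num_partitions, size):
    """Yield one search body per partition of the field's terms."""
    for partition in range(num_partitions):
        yield {
            "size": 0,  # only the aggregation is wanted, not the hits
            "aggs": {
                "all_terms": {
                    "terms": {
                        "field": field,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        # size must be large enough to hold every term
                        # that falls into a single partition
                        "size": size,
                    }
                }
            },
        }


requests = list(partitioned_terms_requests("account_id", 20, 10000))
print(len(requests))
print(json.dumps(requests[0], indent=2))
```

Each body would be sent as a separate search request, and the client combines the buckets from all partitions to obtain the full list of terms.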
Please download Elasticsearch 5.2.0, try it out, and let us know what you think on Twitter (@elastic) or in our forum. You can report any problems on the GitHub issues page.