This Week in Elasticsearch and Apache Lucene - 2017-07-10
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Defence against full disks
Elasticsearch already had a first line of defence in place to prevent nodes from running out of disk space. Once a node reaches a certain disk utilization, the cluster tries to move shards away from nodes that are low on disk space, to prevent catastrophic situations where nodes fail with no space left on their devices. However, moving shards around is not always possible, and even when it is, indexing might fill up disks faster than shards can be relocated.
Starting with Elasticsearch 6.0, nodes will stop accepting write requests to indices that have one or more shards allocated on a node that is running low on disk space. This should give clusters a safer failure mode than the current behavior, where resources may be exhausted before the problem is detected. Users will need to manually set indices back to read/write once they have provisioned more disk space.
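The decision described above can be sketched as follows. This is a simplified model, not Elasticsearch's implementation: the 95% threshold, the Node class, and both helper functions are hypothetical and only illustrate the rule that one full node is enough to block writes to every index with a shard on it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical threshold: fraction of disk used at or above which a node is
# treated as too full to accept more writes. Not an Elasticsearch default.
FLOOD_THRESHOLD = 0.95


@dataclass
class Node:
    name: str
    disk_used_fraction: float
    shards: List[Tuple[str, int]] = field(default_factory=list)  # (index, shard id)


def indices_to_block(nodes, threshold=FLOOD_THRESHOLD):
    """Return the names of indices that should stop accepting writes.

    An index is blocked as soon as any one of its shards (primary or replica)
    is allocated on a node whose disk usage is at or above the threshold.
    """
    blocked = set()
    for node in nodes:
        if node.disk_used_fraction >= threshold:
            blocked.update(index for index, _shard_id in node.shards)
    return blocked


def accept_write(index, blocked):
    """Writes to a blocked index are rejected until an operator clears the block."""
    return index not in blocked


nodes = [
    Node("node-1", 0.97, [("logs", 0), ("metrics", 1)]),
    Node("node-2", 0.40, [("logs", 1), ("users", 0)]),
]
blocked = indices_to_block(nodes)
print(sorted(blocked))  # ['logs', 'metrics']
```

Note that logs is blocked even though one of its shards sits on a healthy node, while users keeps accepting writes.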
Sequence Numbers
The last 6.0 blockers for sequence numbers are closed! All sequence-number PRs slated for 6.0 have been merged. We now have:
- fast operation-based recovery,
- a custom translog retention policy to make fast recovery more likely,
- cleanup of old transaction logs on idle indices, and
- a primary/replica sync on primary promotions.
We also have the infrastructure we need to start developing the cross data centre replication (xDCR) X-Pack feature. We will continue to use the new infrastructure to tackle more complex correctness problems: rolling back unneeded operations on replicas and using sequence numbers for optimistic locking.
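To illustrate why translog retention makes fast recovery more likely, here is a minimal sketch of the choice between operation-based and file-based recovery. It assumes a simplified model in which the replica reports its local checkpoint (the highest sequence number below which it has processed every operation) and the primary knows which operations its translog still retains; plan_recovery is a hypothetical helper, not Elasticsearch code.

```python
def plan_recovery(replica_local_checkpoint, retained_seq_nos, max_seq_no):
    """Choose between operation-based and file-based recovery (simplified model).

    replica_local_checkpoint: highest sequence number such that the replica has
        processed every operation up to and including it (-1 if none).
    retained_seq_nos: sequence numbers of the operations the primary still keeps
        in its translog; a retention policy decides how many survive.
    max_seq_no: highest sequence number the primary has assigned so far.
    """
    needed = range(replica_local_checkpoint + 1, max_seq_no + 1)
    retained = set(retained_seq_nos)
    if all(seq_no in retained for seq_no in needed):
        # Fast path: replay only the operations the replica is missing.
        return ("ops", list(needed))
    # Slow path: some needed operations are gone, so copy segment files instead.
    return ("files", [])


# The primary retains operations 5..10; a replica that stopped at 7 can catch up
# by replaying 8, 9 and 10, while one that stopped at 2 needs a file copy.
print(plan_recovery(7, range(5, 11), 10))  # ('ops', [8, 9, 10])
print(plan_recovery(2, range(5, 11), 10))  # ('files', [])
```

The longer the primary retains operations, the further behind a replica can fall and still take the fast path.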
Aggregation rewriting
Aggregations now have a rewrite phase, similar to the query rewrite phase. This gives any aggregation the opportunity to rewrite itself into a simpler or more generic form, which increases the chance that the aggregation can be cached. The first aggregations to implement this are the filter and filters aggregations: they now rewrite their filters, which means that requests using them can be cached (as long as the underlying filters are cacheable).
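A toy model of the idea, assuming made-up filter classes (Term, Terms, Bool) and a hypothetical FilterAgg: each filter rewrites itself toward a canonical form, rewriting repeats until nothing changes, and the fully rewritten aggregation serves as the cache key. It omits the cacheability check mentioned above and is not Elasticsearch's rewrite API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    field: str
    value: str

    def rewrite(self):
        return self  # already as simple as it gets


@dataclass(frozen=True)
class Terms:
    field: str
    values: Tuple[str, ...]

    def rewrite(self):
        unique = tuple(sorted(set(self.values)))
        if len(unique) == 1:
            # A terms filter with a single value is just a term filter.
            return Term(self.field, unique[0])
        # Sorting and de-duplicating makes equivalent filters compare equal.
        return Terms(self.field, unique)


@dataclass(frozen=True)
class Bool:
    must: Tuple[object, ...]

    def rewrite(self):
        clauses = tuple(clause.rewrite() for clause in self.must)
        if len(clauses) == 1:
            # A bool filter with a single clause is just that clause.
            return clauses[0]
        return Bool(clauses)


def rewrite_fully(query, max_rounds=16):
    """Apply rewrite() until the query stops changing (or max_rounds is hit)."""
    for _ in range(max_rounds):
        rewritten = query.rewrite()
        if rewritten == query:
            return rewritten
        query = rewritten
    return query


@dataclass(frozen=True)
class FilterAgg:
    name: str
    filter: object

    def rewrite(self):
        return FilterAgg(self.name, rewrite_fully(self.filter))


cache = {}


def run_agg(agg, compute):
    """Look up the result by the rewritten aggregation; compute it only on a miss."""
    key = agg.rewrite()  # frozen dataclasses are hashable, so this can be a dict key
    if key not in cache:
        cache[key] = compute(key)
    return cache[key]


# Two requests that look different but rewrite to the same filter share one cache entry.
a = FilterAgg("by_status", Bool((Terms("status", ("active",)),)))
b = FilterAgg("by_status", Term("status", "active"))
assert a.rewrite() == b.rewrite()
```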
Java High Level REST Client docs
The docs for the upcoming Java High Level REST Client are in the works. You can get a preview here: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/5.x. The plan is to add the missing search docs and a dedicated page with examples of how to migrate from the Java API to the REST client.
Cluster alerts in monitoring now cacheable
Cluster alerts (which use Watcher internally) were problematic for users with many clusters, as they would run into the soft limit on the number of script compilations per minute. (Mustache templates, used by Watcher, are treated as scripts internally.) We have now removed the custom per-cluster search inputs, so all the watches in cluster monitoring use the same searches and it no longer matters how many different clusters are being monitored. As a result, there is no need to tweak the compilations-per-minute setting. This problem was uncovered by an SDH ticket from a customer who monitored several clusters.
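The following sketch shows why sharing the same searches helps. It is a toy model with a hypothetical ScriptService class, a made-up per-minute limit, and an illustrative {{cluster_uuid}} placeholder: compiled templates are cached by their source, so a template shared by every cluster compiles once, while templates that embed each cluster's id compile once per cluster and eventually hit the limit.

```python
import time


class ScriptService:
    """Toy compiled-template cache guarded by a compilations-per-minute limit.

    Hypothetical: the class, the limit, and the one-minute window are
    illustrative, not the actual Elasticsearch implementation.
    """

    def __init__(self, max_compilations_per_minute):
        self.max_per_minute = max_compilations_per_minute
        self.cache = {}
        self.window_start = time.monotonic()
        self.compilations_in_window = 0

    def compile(self, source):
        if source in self.cache:
            return self.cache[source]  # cache hit: does not count as a compilation
        now = time.monotonic()
        if now - self.window_start >= 60:
            # Start a new one-minute window.
            self.window_start = now
            self.compilations_in_window = 0
        if self.compilations_in_window >= self.max_per_minute:
            raise RuntimeError("too many script compilations in the last minute")
        self.compilations_in_window += 1
        compiled = ("compiled", source)  # stand-in for a real compiled template
        self.cache[source] = compiled
        return compiled


def watch_template(cluster_uuid, shared):
    if shared:
        # One template for every cluster; the id is supplied as a parameter later.
        return '{"query": {"term": {"cluster_uuid": "{{cluster_uuid}}"}}}'
    # Per-cluster template: the id is baked into the source, so every cluster differs.
    return '{"query": {"term": {"cluster_uuid": "%s"}}}' % cluster_uuid
```

With a limit of 5, monitoring ten clusters with the shared template triggers a single compilation, whereas the per-cluster variant fails on the sixth cluster.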
More compact _ids
We had a longstanding issue open about storing ids in binary form in the index. Elasticsearch accepts arbitrary strings as document identifiers, and up to now we used UTF-8 encoding when indexing and storing them. However, it is very common for ids to be strings that represent a base64-encoded byte[] (e.g. autogenerated ids) or a number (e.g. auto-increment ids coming from an external database whose content is replicated to Elasticsearch). In those cases, the UTF-8 representation is respectively 33% and 2.4x larger than the original binary representation, so there was room for improvement.
We just changed the internal representation of ids to try to detect when the id might be a base64-encoded byte[] or a stringified number, and use a more efficient representation in those cases. This encoding makes base64-encoded ids about 32% smaller and numeric ids about 2x smaller compared to today, very close to the size of the original binary representation of those ids.
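A minimal sketch of this kind of encoding, assuming one hypothetical marker byte in front of each encoded id so that it can be decoded unambiguously. Numeric ids are packed two digits per byte, canonical URL-safe base64 ids are decoded to their raw bytes, and everything else falls back to UTF-8. The marker values and the nibble layout are illustrative only; Elasticsearch's actual on-disk format may differ in its details.

```python
import base64
import string

# Hypothetical one-byte markers telling the decoder which encoding was used.
NUMERIC = 0xFE
BASE64 = 0xFD
UTF8 = 0xFF

_B64_ALPHABET = set(string.ascii_letters + string.digits + "-_")


def _b64_decode(s):
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def _is_canonical_url_base64(s):
    """True if s is unpadded URL-safe base64 that decodes and re-encodes losslessly."""
    if not s or len(s) % 4 == 1 or not set(s) <= _B64_ALPHABET:
        return False
    return base64.urlsafe_b64encode(_b64_decode(s)).rstrip(b"=").decode("ascii") == s


def encode_id(doc_id):
    if doc_id and all(c in string.digits for c in doc_id):
        digits = [int(c) for c in doc_id]
        if len(digits) % 2:
            digits.append(0xF)  # pad odd-length ids with a filler nibble
        packed = bytes((digits[i] << 4) | digits[i + 1] for i in range(0, len(digits), 2))
        return bytes([NUMERIC]) + packed
    if _is_canonical_url_base64(doc_id):
        return bytes([BASE64]) + _b64_decode(doc_id)
    return bytes([UTF8]) + doc_id.encode("utf-8")


def decode_id(data):
    marker, body = data[0], data[1:]
    if marker == NUMERIC:
        nibbles = []
        for b in body:
            nibbles.extend((b >> 4, b & 0x0F))
        return "".join(str(n) for n in nibbles if n != 0xF)  # drop the filler nibble
    if marker == BASE64:
        return base64.urlsafe_b64encode(body).rstrip(b"=").decode("ascii")
    return body.decode("utf-8")
```

With this scheme the 8-character id 12345678 takes 5 bytes instead of 8, and a 20-character base64 id takes 16 bytes instead of 20.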
Beware that these savings will not automatically translate into significant reductions in index size, given that Lucene performs prefix compression of those ids in the terms dictionary, and LZ4 (or DEFLATE if using index.codec: best_compression) compression of those ids in stored fields. However, more compact ids mean that Lucene has fewer bytes to compare when sorting values at flush time and when merging terms dictionaries at merge time, and fewer bytes to process when compressing data in general. This change also opens the door to interesting follow-ups, such as reordering the bytes of autogenerated ids and dropping the sortability of ids (which we do not leverage) in favour of a byte order that makes prefix compression and indexing more efficient.
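The following simplified sketch shows why prefix compression absorbs much of the difference. It encodes each term of a sorted list as the length of the prefix it shares with the previous term plus the remaining suffix; Lucene's real terms dictionary is considerably more elaborate, and prefix_compress is a hypothetical helper.

```python
def prefix_compress(sorted_terms):
    """Encode each term as (shared prefix length with the previous term, suffix)."""
    out = []
    prev = b""
    for term in sorted_terms:
        shared = 0
        limit = min(len(prev), len(term))
        while shared < limit and prev[shared] == term[shared]:
            shared += 1
        out.append((shared, term[shared:]))
        prev = term
    return out


def compressed_suffix_bytes(sorted_terms):
    """Total number of suffix bytes that would actually be written."""
    return sum(len(suffix) for _, suffix in prefix_compress(sorted_terms))


# Ten sequential numeric ids stored as UTF-8: 70 raw bytes in total.
ids = [str(n).encode("ascii") for n in range(1000000, 1000010)]
print(compressed_suffix_bytes(ids))  # 16
```

Only 16 suffix bytes (plus the prefix lengths) are stored for these ten ids instead of 70, so shaving a few bytes off each id changes the stored total far less than the raw sizes suggest.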
Changes in 5.x:
- The format parameter of date fields is no longer updatable, as changing it could prevent already indexed docs from being reindexed.
- Not-analyzed string fields upgraded from 2.x no longer return fielddata:false in their mapping, as this interfered with reindexing in 5.x.
- The join field (the replacement for parent-child) no longer adds the parent type and id to each search hit, as this information is already available in the source and these fields interfered with reindexing.
- Cross-cluster search now validates the cluster name whenever it updates its list of seed nodes, to make sure that the cluster name has not changed.
- Upgraded to Netty 4.1.13.Final.
Changes in master:
- BREAKING: A request to a valid REST endpoint with an unsupported HTTP method now returns a 405 METHOD NOT ALLOWED status, and an OPTIONS request to any REST endpoint responds with the list of allowed HTTP methods. This required refactoring PathTrie and RestController to use a single PathTrie for all endpoints.
- BREAKING: The deprecated "created" and "found" response keys in the index, delete, and bulk APIs have been removed in favour of the "result" key.
- BREAKING: The deprecated fielddata_fields option has been removed from the search request.
- BREAKING: QueryParseContext has been removed, as it had become a simple wrapper around XContentParser.
- BREAKING: The deprecated IdsQueryBuilder constructor that accepted types has been removed.
- The index.mapping.single_type setting now defaults to true and can no longer be set in 6.0.
- The _analyze API now supports normalizers.
- Fixed snapshots to S3 that sometimes failed with a security exception when a stream was closed during snapshotting.
- The search API no longer silently ignores negative size values.
- The transport.profiles.* settings have been heavily refactored, and validation has been added for them.
- Added a framework for cross-validating mutually dependent settings. This makes it possible, for example, to validate that the disk threshold settings are correctly set.
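A toy router in the spirit of the first change above, assuming a single trie whose nodes record handlers per HTTP method: a path that matches with an unregistered method yields 405 together with the allowed methods, and OPTIONS lists them. RouteTrie and dispatch are hypothetical Python stand-ins, not the Java PathTrie and RestController classes, and the matcher does not backtrack.

```python
class RouteTrie:
    """Toy path trie: each node maps a path segment to a child node.

    A "{param}" segment matches any value. Each node stores the handlers
    registered for that exact path, keyed by HTTP method.
    """

    def __init__(self):
        self.children = {}
        self.handlers = {}

    def insert(self, path, method, handler):
        node = self
        for segment in path.strip("/").split("/"):
            if segment.startswith("{") and segment.endswith("}"):
                segment = "*"  # all named parameters share one wildcard child
            node = node.children.setdefault(segment, RouteTrie())
        node.handlers[method] = handler

    def find(self, path):
        node = self
        for segment in path.strip("/").split("/"):
            if segment in node.children:
                node = node.children[segment]
            elif "*" in node.children:
                node = node.children["*"]
            else:
                return None
        return node


def dispatch(trie, method, path):
    """Return (status, headers, body) for a request."""
    node = trie.find(path)
    if node is None or not node.handlers:
        return 404, {}, None
    allowed = ",".join(sorted(node.handlers))
    if method == "OPTIONS":
        return 200, {"Allow": allowed}, None  # list what this endpoint supports
    if method not in node.handlers:
        return 405, {"Allow": allowed}, None  # endpoint exists, method does not
    return 200, {}, node.handlers[method]()


trie = RouteTrie()
trie.insert("/{index}/_search", "GET", lambda: "search")
trie.insert("/{index}/_search", "POST", lambda: "search")
```

Keeping every method for a path on the same trie node is what makes both answers cheap: the router knows the full set of allowed methods as soon as the path matches.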
Apache Lucene
Lucene 7.0 feature freeze
There has been a surge of changes targeting 7.0, so the feature freeze was delayed until July 10th.
Better version compatibility checks
Now that Lucene stores the version that was used to first create the index, it uses this version to check that you are not attempting to read an index that is too old. This was impossible before, since Lucene only recorded the versions used to write segments and commit points, so merging could make Lucene think an index was recent when in fact it was not. As of Lucene 8.0, this leniency will be gone.
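A sketch of the difference between the old and new checks, assuming a simplified model where an index records the major version that created it as well as the major version that wrote each segment, and where a release can read indices from its own and the previous major version. IndexInfo and both helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexInfo:
    created_major: int         # major version that first created the index
    segment_majors: List[int]  # major version that wrote each current segment


def can_open_lenient(info, current_major):
    """Old-style check: only looks at segments, which merging may have rewritten."""
    return all(major >= current_major - 1 for major in info.segment_majors)


def can_open_strict(info, current_major):
    """New-style check: the creation version is recorded once and never changes."""
    return info.created_major >= current_major - 1


# Created by version 5, then fully merged while running version 7.
merged_old_index = IndexInfo(created_major=5, segment_majors=[7])
print(can_open_lenient(merged_old_index, 8))  # True: merging hid the index's age
print(can_open_strict(merged_old_index, 8))   # False: the creation version gives it away
```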
Other changes
- The new concurrent delete/update improvements had a race condition.
- The addition of hooks to run wildcard terms in phrases is proving controversial, since it would encourage use of the dangerous SpanMultiTermQueryWrapper.
- SpanMultiTermQueryWrapper is a dangerous query which might expand to an arbitrary number of terms. We are looking into adding protection against this.
- Can we improve IndexOrDocValuesQuery to be more aware of the difference in cost between points and doc values?
- We keep forgetting that we already have a query that matches documents that have a value for a given doc-value field, so we decided to rename it so that it better reflects what it does.
- The fact that compiling scripts gets exponentially slower over time triggered a discussion about whether we should cache the result of compilations.
- Factory methods of doc values queries have been renamed to make sure their name includes "slow".
- Can we add a sort field that allows sorting child docs by a value of their parent document?
- Should we use a common interface when multiple analysis factories set the same option?
- You can now group using the new ValuesSources API.
- IndexWriter sometimes gets the total number of docs in the index wrong.
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!