What’s new in 7.10
Here are the highlights of what’s new and improved in Elasticsearch 7.10! For detailed information about this release, see the Release notes and Breaking changes.
Indexing speed improvement
Elasticsearch 7.10 improves indexing speed by up to 20%. We’ve reduced the coordination
needed to add entries to the transaction log.
This reduction allows for more concurrency and increases the transaction
log buffer size from 8KB to 1MB. However, performance gains are lower for
full-text search and other analysis-intensive use cases. The heavier the
indexing chain, the lower the gains, so indexing chains that involve many
fields, ingest pipelines or full-text indexing will see lower gains.
More space-efficient indices
Elasticsearch 7.10 depends on Apache Lucene 8.7, which introduces higher compression of
stored fields, the part of the index that notably stores the _source. On the
various data sets that we benchmark against, we noticed space reductions between
0% and 10%. This change especially helps on data sets that have lots of redundant
data across documents. This is typically the case for documents produced by our
Observability solutions, which repeat metadata about the host that produced the
data on every document.
Elasticsearch offers the ability to configure the index.codec setting to tell
Elasticsearch how aggressively to compress stored fields. Both supported values,
default and best_compression, will get better compression with this change.
Data tiers
7.10 introduces the concept of formalized data tiers within Elasticsearch. Data tiers are a simple, integrated approach that gives users control over optimizing for cost, performance, and breadth/depth of data. Prior to this formalization, many users configured their own tier topology using custom node attributes as well as using ILM to manage the lifecycle and location of data within a cluster.
With this formalization, data tiers (content, hot, warm, and cold) can be explicitly configured using node roles, and indices can be configured to be allocated within a specific tier using index-level data tier allocation filtering. ILM will make use of these tiers to automatically migrate data between nodes as an index goes through the phases of its lifecycle.
Newly created indices abstracted by a data stream will
be allocated to the data_hot tier automatically, while standalone indices will
be allocated to the data_content tier automatically. Nodes with the
pre-existing data role are considered to be part of all tiers.
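The tier membership rules above can be sketched in a few lines of Python. This is only an illustrative model, not Elasticsearch code: the function names are hypothetical, and the data_warm and data_cold role names are assumed to follow the same pattern as the data_hot and data_content roles named in the text.

```python
# Node roles for the four tiers named in the text (content, hot, warm, cold).
# data_warm and data_cold are assumed to follow the data_hot/data_content pattern.
TIER_ROLES = ("data_content", "data_hot", "data_warm", "data_cold")


def tiers_for_node(roles):
    """Return the set of tiers a node belongs to, given its node roles."""
    roles = set(roles)
    if "data" in roles:
        # The pre-existing generic data role counts as part of all tiers.
        return set(TIER_ROLES)
    return roles & set(TIER_ROLES)


def default_tier(backs_data_stream):
    """Tier a newly created index is allocated to automatically."""
    return "data_hot" if backs_data_stream else "data_content"
```

For example, tiers_for_node(["master", "data_hot"]) contains only data_hot, while a node that still uses the generic data role is eligible for every tier.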
AUC ROC evaluation metrics for classification analysis
Area under the curve of receiver operating characteristic (AUC ROC) is an evaluation metric that has been available for outlier detection since 7.3 and is now available for classification analysis. AUC ROC represents the performance of the classification process at different predicted probability thresholds. To create the curve, the true positive rate for a specific class is compared against the false positive rate, which is computed from all the other classes combined, at the different threshold levels.
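To make the metric concrete, here is a minimal Python sketch of a one-versus-rest ROC curve and its area, computed with the trapezoidal rule. It is a textbook illustration and makes no claim about how Elasticsearch computes the value internally; the function names and the sample data are hypothetical.

```python
def roc_points(y_true, scores, positive_class):
    """One-vs-rest ROC points (false positive rate, true positive rate).

    scores[i] is the predicted probability that document i belongs to
    positive_class; every other class is treated as negative.
    """
    labels = [1 if y == positive_class else 0 for y in y_true]
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sweep the threshold from the highest score down to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Documents with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

A value of 1.0 means every document of the class is scored above every document of the other classes, and 0.5 corresponds to random ranking.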
Custom feature processors in data frame analytics
Feature processors enable you to extract and process features from document fields. You can use these features in model training and model deployment. Custom feature processors provide a mechanism to create features that can be used at search and ingest time without taking up space in the index. This process more tightly couples feature generation with the resulting model. The result is simplified model management, as both the features and the model can easily follow the same life cycle.
Points in time (PITs) for search
In 7.10, we’re introducing points in time (PITs), a lightweight way to preserve index state over searches. PITs improve end-user experience by making UIs more reactive.
By default, a search request waits for complete results before returning a response. For example, a search that retrieves top hits and aggregations returns a response only after both top hits and aggregations are computed. However, aggregations are usually slower and more expensive to compute than top hits. Instead of sending a combined request, you can send two separate requests: one for top hits and another one for aggregations. With separate search requests, a UI can display top hits as soon as they’re available and display aggregation data after the slower aggregation request completes. You can use a PIT to ensure both search requests run on the same data and index state.
To use a PIT in a search, you must first explicitly create the PIT using the new
open PIT API. PITs get automatically garbage-collected after keep_alive
if no follow-up request extends their duration.
POST /my-index-000001/_pit?keep_alive=1m
The API returns a PIT ID you can use in search requests. You can also
configure how long to extend your PIT’s lifespan using the search request’s
keep_alive parameter.
POST /_search
{
  "size": 100,
  "query": {
    "match" : { "title" : "elasticsearch" }
  },
  "pit": {
    "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  }
}
PITs automatically close when their keep_alive period ends. You can
also manually close PITs you no longer need using the
close PIT API. Closing a PIT releases the
resources needed to maintain the PIT’s index state.
DELETE /_pit
{
  "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
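The split-request pattern described earlier (one request for top hits, one for aggregations, both on the same PIT) can be sketched as a small Python helper that only builds the two request bodies. The helper name, the author field, and the by_author aggregation are hypothetical; sending the bodies to POST /_search is left to whichever HTTP client you use.

```python
def split_search(pit_id, query, aggs, size=100, keep_alive="1m"):
    """Build two search bodies that run against the same PIT.

    The first body fetches top hits only; the second sets size to 0 so that
    only the (usually slower) aggregations are computed.
    """
    hits_body = {
        "size": size,
        "query": query,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
    }
    aggs_body = {
        "size": 0,
        "query": query,
        "aggs": aggs,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
    }
    return hits_body, aggs_body


# Hypothetical usage: the PIT ID placeholder stands for the value
# returned by the open PIT API.
hits_body, aggs_body = split_search(
    pit_id="<PIT ID returned by the open PIT API>",
    query={"match": {"title": "elasticsearch"}},
    aggs={"by_author": {"terms": {"field": "author"}}},
)
```

Because both bodies carry the same PIT ID, a UI can render the hits from the first response right away and fill in the aggregation data later, knowing both were computed on the same index state.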
For more information about using PITs in search, see
Paginate search results with search_after
or the PIT API documentation.
Request-level circuit breakers on coordinating nodes
You can now use a coordinating node to account for memory used to perform partial and final reduce of aggregations in the request circuit breaker. The search coordinator adds the memory that it uses to save and reduce the results of shard aggregations to the request circuit breaker. Before any partial or final reduce, the memory needed to reduce the aggregations is estimated, and a CircuitBreakingException is thrown if it exceeds the maximum memory allowed in this breaker.
This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced. This estimation can be completely off for some aggregations but it is corrected with the real size after the reduce completes. If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations and replace the estimation with the serialized size of the newly reduced result.
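The accounting described above can be modeled with a toy byte counter. This Python sketch is a simplified illustration under stated assumptions, not the Elasticsearch implementation: RequestBreaker, CircuitBreakingError, and reduce_aggs are hypothetical names, and the sizes of the buffered shard results are assumed to be in the breaker already.

```python
class CircuitBreakingError(Exception):
    """Stand-in for Elasticsearch's CircuitBreakingException."""


class RequestBreaker:
    """Toy byte counter with a hard limit (hypothetical, for illustration)."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def add(self, n_bytes):
        if self.used + n_bytes > self.limit:
            raise CircuitBreakingError(
                f"would use {self.used + n_bytes} bytes, limit is {self.limit}"
            )
        self.used += n_bytes

    def release(self, n_bytes):
        self.used -= n_bytes


def reduce_aggs(breaker, source_sizes, reduced_size):
    """Account for one partial or final reduce.

    source_sizes: serialized sizes of the buffered shard aggregations,
                  assumed to be counted in the breaker already.
    reduced_size: serialized size of the reduced result, which is only
                  known once the reduce has completed.
    """
    total = sum(source_sizes)
    estimate = (3 * total) // 2  # roughly 1.5 times the serialized size
    breaker.add(estimate)  # raises before the reduce if over the limit

    # ... the actual reduce would run here ...

    # On success: drop the source aggregations and the estimate, then
    # account for the real size of the newly reduced result.
    breaker.release(total + estimate)
    breaker.add(reduced_size)
```

With a 1000-byte limit and three buffered 100-byte results, the estimate of 450 bytes fits, and after the reduce the breaker holds only the size of the reduced result. With a 600-byte limit, the same reduce is rejected before it starts.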
EQL: Case-sensitivity and the : operator
In 7.10, we made most EQL operators and functions case-sensitive by default.
We’ve also added :, a new case-insensitive equal operator. Designed for
security use cases, the : operator lets you search for strings in
Windows event logs and other event data containing a mix of letter cases.
GET /my-index-000001/_eql/search
{
  "query": """
    process where process.executable : "c:\\\\windows\\\\system32\\\\cmd.exe"
  """
}
For more information, see the EQL syntax documentation.
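As a rough intuition for the difference, the following Python sketch contrasts a case-sensitive comparison with a case-insensitive one. The two functions are hypothetical stand-ins that approximate the behavior with simple lowercasing; they are not how EQL evaluates these operators.

```python
def eql_equals(left, right):
    """Approximates a case-sensitive equality check on strings."""
    return left == right


def eql_colon(left, right):
    """Approximates the case-insensitive : operator on strings."""
    return left.lower() == right.lower()


# A mixed-case path as it might appear in a Windows event log.
logged_path = r"C:\Windows\System32\cmd.exe"
wanted_path = r"c:\windows\system32\cmd.exe"

case_sensitive_match = eql_equals(logged_path, wanted_path)   # False
case_insensitive_match = eql_colon(logged_path, wanted_path)  # True
```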
REST API access to system indices is deprecated
We are deprecating REST API access to system indices. Most REST API requests that attempt to access system indices will return the following deprecation warning:
this request accesses system indices: [.system_index_name], but in a future major version, direct access to system indices will be prevented by default
The following REST API endpoints access system indices as part of their implementation and will not return the deprecation warning:
- GET _cluster/health
- GET {index}/_recovery
- GET _cluster/allocation/explain
- GET _cluster/state
- POST _cluster/reroute
- GET {index}/_stats
- GET {index}/_segments
- GET {index}/_shard_stores
- GET _cat/[indices,aliases,health,recovery,shards,segments]
We are also adding a new metadata flag to track indices. Elasticsearch will automatically add this flag to any existing system indices during upgrade.
New thread pools for system indices
We’ve added two new thread pools for system indices: system_read and
system_write. These thread pools ensure system indices critical to the Elastic
Stack, such as those used by security or Kibana, remain responsive when
a cluster is under heavy query or indexing load.
system_read is a fixed thread pool used to manage resources for
read operations targeting system indices. Similarly, system_write is a
fixed thread pool used to manage resources for write operations targeting
system indices. Both have a maximum number of threads equal to 5
or half of the available processors, whichever is smaller.
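The sizing rule can be written as a one-line calculation. In this Python sketch the function name is hypothetical, and two details are assumptions that the text above does not state: that half of an odd processor count rounds up, and that the pool never drops below one thread.

```python
def system_pool_max_threads(available_processors):
    """Maximum threads for system_read or system_write under the stated rule."""
    # Assumption: half of an odd processor count rounds up.
    half = (available_processors + 1) // 2
    # 5 or half of the processors, whichever is smaller.
    # Assumption: at least one thread is always available.
    return max(1, min(5, half))
```

Under these assumptions, an 8-processor node gets up to 4 threads per pool, and any node with 10 or more processors is capped at 5.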