Elasticsearch version 7.9.0
Also see Breaking changes in 7.9.
Security updates
- A field disclosure flaw was found in Elasticsearch when running a scrolling search with field level security. If a user runs the same query that another, more privileged user recently ran, the scrolling search can leak fields that should be hidden. This could result in an attacker gaining additional permissions against a restricted index. All versions of Elasticsearch before 7.9.0 and 6.8.12 are affected by this flaw. You must upgrade to Elasticsearch version 7.9.0 or 6.8.12 to obtain the fix. CVE-2020-7019
Known issues
- Upgrading to 7.9.0 from an earlier version results in incorrect mappings on the machine learning annotations index, and possibly also on the machine learning config index. This causes some pages in the machine learning UI to display incorrectly and may prevent machine learning jobs from being created or updated. If you learn of this issue before upgrading, the best way to avoid it is to manually update the mappings on these indices in your old Elasticsearch version before upgrading to 7.9.0. If you learn of it after upgrading, reindexing is required to recover. Full details of the mitigations are in Upgrade to 7.9.0 causes incorrect mappings.
- Lucene 8.6.0, on which Elasticsearch 7.9.0 is based, contains a memory leak. This memory leak manifests in Elasticsearch when a single document is updated repeatedly with a forced refresh. The cluster state storage layer in Elasticsearch is based on Lucene and does use single-document updates with forced refreshes, meaning that this memory leak manifests in Elasticsearch under normal conditions. It also manifests when user-controlled workloads update a single document in an index repeatedly with a forced refresh. In both cases, the memory leak is around 500 bytes per update, so it does take some time for the leak to show any meaningful impact on the system. Symptoms of this memory leak are the size of the used heap slowly rising over time, requests eventually being rejected by the real memory circuit breaker, and potentially out-of-memory errors. A workaround is to restart any nodes exhibiting these symptoms. We are actively working with the Lucene community to release a fix in Lucene 8.6.2 to deliver in Elasticsearch 7.9.1 that will address this memory leak.
- SQL: If a WHERE clause contains at least two relational operators joined by AND, of which one is a comparison (<=, <, >=, >) and another one is an inequality (!=, <>), both against literals or foldable expressions, the inequality will be ignored. The workaround is to substitute the inequality with a NOT IN operator. We have fixed this issue in Elasticsearch 7.10.1 and later versions. For more details, see #65488.
- Snapshot and restore: If an index is deleted while the cluster is concurrently taking more than one snapshot then there is a risk that one of the snapshots may never complete and also that some shard data may be lost from the repository, causing future restore operations to fail. To mitigate this problem, prevent concurrent snapshot operations by setting snapshot.max_concurrent_operations: 1. This issue is fixed in Elasticsearch versions 7.13.1 and later. For more details, see #73456.
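The figures and workarounds in the known issues above can be made concrete with a minimal sketch. It estimates how many updates the Lucene leak needs to consume a given amount of heap, rewrites an affected SQL predicate into the NOT IN form, and builds the request bodies for the SQL search API (POST _sql) and the cluster update settings API (PUT _cluster/settings). The index name, column names, and helper functions are hypothetical, and the sketch assumes that snapshot.max_concurrent_operations can be changed at runtime through the cluster settings API; it only builds payloads and does not contact a cluster.

```python
import json

# Approximate leak per update, as quoted in the Lucene 8.6.0 known issue.
LEAK_BYTES_PER_UPDATE = 500


def updates_to_leak(heap_bytes: int) -> int:
    """Number of forced-refresh single-document updates needed to leak heap_bytes."""
    return heap_bytes // LEAK_BYTES_PER_UPDATE


def not_in_predicate(column: str, literal: int) -> str:
    """Express `column != literal` in the NOT IN form suggested as the SQL workaround."""
    return f"{column} NOT IN ({literal})"


# Hypothetical index and column names, used only for illustration.
# This form is affected: a comparison and an inequality joined by AND.
affected_query = "SELECT * FROM my_index WHERE price > 10 AND status != 3"

# Same filter with the inequality replaced by NOT IN.
safe_query = (
    "SELECT * FROM my_index WHERE price > 10 AND "
    + not_in_predicate("status", 3)
)

# Request body for the SQL search API (POST _sql).
sql_body = json.dumps({"query": safe_query})

# Request body for the cluster update settings API (PUT _cluster/settings).
# Assumption: the setting is dynamic and can be applied as a persistent setting.
snapshot_body = json.dumps(
    {"persistent": {"snapshot.max_concurrent_operations": 1}}
)

if __name__ == "__main__":
    print("updates needed to leak 1 GiB:", updates_to_leak(1024 ** 3))
    print(sql_body)
    print(snapshot_body)
```

At roughly 500 bytes per update, leaking a gibibyte of heap takes on the order of two million updates, which is why the symptoms appear slowly.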
Breaking changes
- Script Cache
- Field capabilities API
- Snapshot restore throttling
-
- Restoring from a snapshot (which is a particular form of recovery) now properly takes recovery throttling into account (i.e. the indices.recovery.max_bytes_per_sec setting). The max_restore_bytes_per_sec setting now also defaults to unlimited, whereas previously it was set to 40mb, which is the default used for indices.recovery.max_bytes_per_sec. This means that clusters where the recovery and restore settings were not changed from the defaults will observe no behavioral change. #58658
- Thread pool write queue size
-
- The WRITE thread pool default queue size (thread_pool.write.queue_size) has been increased from 200 to 10000. A small queue size (200) caused issues when users wanted to send small indexing requests with a high client count. Additional memory-oriented back pressure has been introduced with the indexing_pressure.memory.limit setting, which limits the number of bytes that outstanding indexing requests may consume. #59263
- Dangling indices
-
- Automatically importing dangling indices is now deprecated, disabled by default, and will be removed in Elasticsearch 8.0. See the migration notes. #58176 #58898 (issue: #48366)
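The claim under snapshot restore throttling, that clusters on default settings see no behavioral change, can be checked with a small worked example. This is a simplified model, not the Elasticsearch implementation: it assumes each configured limit acts as an independent cap and that the effective restore rate is the lowest cap that applies. The function name and the `recovery_applies` flag are hypothetical.

```python
from typing import Optional

MB = 1024 * 1024


def effective_restore_rate(
    recovery_limit: Optional[int],
    restore_limit: Optional[int],
    recovery_applies: bool,
) -> Optional[int]:
    """Return the lowest applicable cap in bytes per second, or None if unlimited.

    recovery_limit models indices.recovery.max_bytes_per_sec,
    restore_limit models max_restore_bytes_per_sec (None means unlimited),
    recovery_applies says whether restores honor the recovery limit.
    """
    caps = []
    if restore_limit is not None:
        caps.append(restore_limit)
    if recovery_applies and recovery_limit is not None:
        caps.append(recovery_limit)
    return min(caps) if caps else None


# Before 7.9.0 with defaults: restore capped at 40mb, recovery limit not applied.
before = effective_restore_rate(40 * MB, 40 * MB, recovery_applies=False)

# From 7.9.0 with defaults: restore unlimited, recovery limit (40mb) applied.
after = effective_restore_rate(40 * MB, None, recovery_applies=True)

# A cluster that had raised only the restore limit is now held to the recovery limit.
tuned = effective_restore_rate(40 * MB, 100 * MB, recovery_applies=True)

if __name__ == "__main__":
    print(before, after, tuned)
```

Under this model the default case yields the same rate before and after the change, while a cluster that had raised only max_restore_bytes_per_sec would also need to raise indices.recovery.max_bytes_per_sec to keep its previous restore rate.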
Breaking Java changes
- Aggregations
- Features/Ingest
New features
- Aggregations
-
- Add moving percentiles pipeline aggregation #55441 (issue: #49452)
- Add normalize pipeline aggregation #56399 (issue: #51005)
- Add variable width histogram aggregation #42035 (issues: #9572, #50863)
- Add pipeline inference aggregation #58193
- Speed up time interval rounding around daylight savings time (DST) #56371 (issue: #55559)
- Geo
- Machine Learning
-
- Add update data frame analytics jobs API #58302 (issue: #45720)
- Introduce model_plot_config.annotations_enabled setting for anomaly detection jobs #57539 (issue: #55781)
- Report significant changes to anomaly detection models in annotations of the results #1247, #56342, #56417, #57144, #57278, #57539
- Mapping
- Search
Enhancements
- Aggregations
-
- Add support for numeric range keys #56452 (issue: #56402)
- Added standard deviation / variance sampling to extended stats #49782 (issue: #49554)
- Give significance lookups their own home #57903
- Increase search.max_buckets to 65,535 #57042 (issue: #51731)
- Optimize date_histograms across daylight savings time #55559
- Return clear error message if aggregation type is invalid #58255 (issue: #58146)
- Save memory on numeric significant terms when not top #56789 (issue: #55873)
- Save memory when auto_date_histogram is not on top #57304 (issue: #56487)
- Save memory when date_histogram is not on top #56921 (issues: #55873, #56487)
- Save memory when histogram agg is not on top #57277
- Save memory when numeric terms agg is not top #55873
- Save memory when parent and child are not on top #57892 (issue: #55873)
- Save memory when rare_terms is not on top #57948 (issue: #55873)
- Save memory when significant_text is not on top #58145 (issue: #55873)
- Save memory when string terms are not on top #57758
- Speed up reducing auto_date_histo with a time zone #57933 (issue: #56124)
- Speed up rounding in auto_date_histogram #56384 (issue: #55559)
- Allocation
-
- Account for remaining recovery in disk allocator #58029
- Analysis
- Authentication
- Authorization
- CCR
-
- Allow follower indices to override leader settings #58103
- CRUD
-
- Retry failed replication due to transient errors #55633
- Engine
- Features/Data streams
-
- Add support for snapshot and restore to data streams #57675 (issues: #53100, #57127)
- Data stream creation validation allows for prefixed indices #57750 (issue: #53100)
- Disallow deletion of composable template if in use by data stream #57957 (issue: #57004)
- Validate alias operations don’t target data streams #58327 (issue: #53100)
- Features/ILM+SLM
- Features/Indices APIs
-
- Add default composable templates for new indexing strategy #57629 (issue: #56709)
- Add index block api #58094
- Add new flag to check whether alias exists on remove #58100
- Add prefer_v2_templates parameter to reindex #56253 (issue: #53101)
- Add template simulation API for simulating template composition #56842 (issues: #53101, #55686, #56255, #56390)
- Features/Ingest
- Features/Java High Level REST Client
- Features/Java Low Level REST Client
- Infra/Circuit Breakers
- Infra/Core
-
- Introduce node.roles setting #54998
- Infra/Packaging
- Infra/Plugins
-
- Improved ExtensiblePlugin #58234
- Infra/Resiliency
- Machine Learning
-
- Accounting for model size when models are not cached. #58670
- Adds new for_export flag to GET _ml/inference API #57351
- Adds WKT geometry detection in find_file_structure #57014 (issue: #56967)
- Calculate cache misses for inference and return in stats #58252
- Delete auto-generated annotations when job is deleted. #58169 (issue: #57976)
- Delete auto-generated annotations when model snapshot is reverted #58240 (issue: #57982)
- Delete expired data by job #57337
- Introduce Annotation.event field #57144 (issue: #55781)
- Add support for larger forecasts in memory via max_model_memory setting #1238, #57254
- Don’t lose precision when saving model state #1274
- Parallelize the feature importance calculation for classification and regression over trees #1277
- Add an option to do categorization independently for each partition #1293, #1318, #1356, #57683
- Memory usage is reported during job initialization #1294
- More realistic memory estimation for classification and regression means that these analyses will require lower memory limits than before #1298
- Checkpoint state to allow efficient failover during coarse parameter search for classification and regression #1300
- Improve data access patterns to speed up classification and regression #1312
- Performance improvements for classification and regression, particularly running multithreaded #1317
- Improve runtime and memory usage training deep trees for classification and regression #1340
- Improvement in handling large inference model definitions #1349
- Add a peak_model_bytes field to model_size_stats #1389
- Mapping
- Network
- Recovery
- Reindex
- SQL
-
- Implement TIME_PARSE function for parsing strings into TIME values #55223 (issues: #54963, #55095)
- Implement TOP as an alternative to LIMIT #57428 (issue: #41195)
- Implement TRIM function #57518 (issue: #41195)
- Improve performance of LTRIM/RTRIM #57603 (issue: #57594)
- Make CASTing string to DATETIME more lenient #57451
- Redact credentials in connection exceptions #58650 (issue: #56474)
- Relax parsing of date/time escaped literals #58336 (issue: #58262)
- Add support for scalars within LIKE/RLIKE #56495 (issue: #55058)
- Search
-
- Add description to submit and get async search, as well as cancel tasks #57745
- Add matchBoolPrefix static method in query builders #58637 (issue: #58388)
- Add range query support to wildcard field #57881 (issue: #57816)
- Group docIds by segment in FetchPhase to better use LRU cache #57273
- Improve error handling when decoding async execution ids #56285
- Specify reason whenever async search gets cancelled #57761
- Use index sort range query when possible. #56657 (issue: #48665)
- Security
- Snapshot/Restore
-
- Deduplicate Index Metadata in BlobStore #50278 (issues: #45736, #46250, #49800)
- Default to zero replicas for searchable snapshots #57802 (issue: #50999)
- Enable fully concurrent snapshot operations #56911
- Support cloning of searchable snapshot indices #56595
- Track GET/LIST Azure Storage API calls #56773
- Track GET/LIST GoogleCloudStorage API calls #56585
- Track PUT/PUT_BLOCK operations on AzureBlobStore. #56936
- Track multipart/resumable uploads GCS API calls #56821
- Track upload requests on S3 repositories #56826
- Task Management
- Transform
Bug fixes
- Aggregations
- Allocation
- Authentication
-
- Map only specific type of OIDC Claims #58524
- Authorization
- Engine
- Features/ILM+SLM
- Features/Indices APIs
- Features/Ingest
- Geo
- Infra/Scripting
- Machine Learning
-
- Fix wire serialization for flush acknowledgements #58413
- Make waiting for renormalization optional for internally flushing job #58537 (issue: #58395)
- Tail the C++ logging pipe before connecting other pipes #56632 (issue: #56366)
- Fix numerical issues leading to blow up of the model plot bounds #1268
- Fix causes for inverted forecast confidence interval bounds #1369 (issue: #1357)
- Restrict growth of max matching string length for categories #1406
- Mapping
- SQL
- Search
- Snapshot/Restore
Upgrades
- Search
-
- Update to lucene snapshot e7c625430ed #57981