Elasticsearch version 8.16.0

Also see Breaking changes in 8.16.

Breaking changes

Analysis
  • Set lenient to true by default when using updateable synonyms #110901
Data streams
  • Update data stream lifecycle telemetry to track global retention #112451
ES|QL
  • ESQL: Entirely remove META FUNCTIONS #113967
Mapping
  • JDK locale database change #113975
Search
  • Adding breaking change entry for retrievers #115399

Bug fixes

Aggregations
  • Always check the parent breaker with zero bytes in PreallocatedCircuitBreakerService #115181
  • Force using the last centroid during merging #111644 (issue: #111065)
Authentication
  • Check for disabling own user in Put User API #112262 (issue: #90205)
  • Expose cluster-state role mappings in APIs #114951
Authorization
  • Fix DLS & FLS sometimes being enforced when it is disabled #111915 (issue: #94709)
  • Fix DLS using runtime fields and synthetic source #112341
CRUD
  • Don’t fail retention lease sync actions due to capacity constraints #109414 (issue: #105926)
Cluster Coordination
  • Ensure clean thread context in MasterService #114512
Data streams
  • Adding support for data streams with a match-all template #111311 (issue: #111204)
  • Exclude internal data streams from global retention #112100
  • Fix verbose get data stream API not requiring extra privileges #112973
  • OTel mappings: avoid metrics to be rejected when attributes are malformed #114856
  • Resolve pipelines from template on lazy rollover write #116031 (issue: #112781)
  • [apm-data] Apply lazy rollover on index template creation #116219 (issue: #116230)
  • [otel-data] Add more kubernetes aliases #115429
  • logs-apm.error-*: define log.level field as keyword #112440
Distributed
  • Handle InternalSendException inline for non-forking handlers #114375
EQL
  • Fix validation of TEXT fields with case insensitive comparison #111238 (issue: #111235)
ES|QL
  • ESQL: Add Values aggregation tests, fix ConstantBytesRefBlock memory handling #111367
  • ESQL: Align year diffing to the rest of the units in DATE_DIFF: chronological #113103 (issue: #112482)
  • ESQL: Disable pushdown of WHERE past STATS #115308 (issue: #115281)
  • ESQL: Fix CASE when conditions are multivalued #112401 (issue: #112359)
  • ESQL: Fix DEBUG log of filter #116086 (issue: #116055)
  • ESQL: Fix Double operations returning infinite #111064 (issue: #111026)
  • ESQL: Fix REVERSE with backspace character #115245 (issues: #114372, #115227, #115228)
  • ESQL: Fix a bug in VALUES agg #115952
  • ESQL: Fix a bug in MV_PERCENTILE #112218 (issues: #112193, #112180, #112187, #112188)
  • ESQL: Fix filtered grouping on ords #115312 (issue: #114897)
  • ESQL: Fix grammar changes around per agg filtering #114848
  • ESQL: Fix serialization during can_match #111779 (issues: #111701, #111726)
  • ESQL: Fix synthetic attribute pruning #111413 (issue: #105821)
  • ESQL: don’t lose the original casting error message #111968 (issue: #111967)
  • ESQL: fix for missing indices error message #111797 (issue: #111712)
  • ES|QL: Restrict sorting for _source and counter field types #114638 (issues: #114423, #111976)
  • ES|QL: better validation for GROK patterns #110574 (issue: #110533)
  • ES|QL: better validation for RLIKE patterns #112489 (issue: #112485)
  • ES|QL: better validation of GROK patterns #112200 (issue: #112111)
  • ES|QL: fix LIMIT pushdown past MV_EXPAND #115624 (issues: #102084, #102061)
  • Fix ST_CENTROID_AGG when no records are aggregated #114888 (issue: #106025)
  • Spatial search functions support multi-valued fields in compute engine #112063 (issues: #112102, #112505, #110830)
  • [ES|QL] Check expression resolved before checking its data type in ImplicitCasting #113314 (issue: #113242)
  • [ES|QL] Simplify patterns for subfields #111118
  • [ES|QL] Simplify syntax of named parameter for identifier and pattern #115061
  • [ES|QL] Skip validating remote cluster index names in parser #114271
  • [ES|QL] Use RangeQuery and String in BinaryComparison on datetime fields #110669 (issue: #107900)
  • [ES|QL] Verify aggregation filter’s type is boolean to avoid class_cast_exception #116274
  • [ES|QL] add tests for stats by constant #110593 (issue: #105383)
  • [ES|QL] make named parameter for identifier and pattern snapshot #114784
  • [ES|QL] validate mv_sort order #110021 (issue: #109910)
Geo
  • Fix cases of collections with one point #111193 (issue: #110982)
  • Try to simplify geometries that fail with TopologyException #115834
Health
  • Set replica_unassigned_buffer_time in constructor #112612
ILM+SLM
  • Make SnapshotLifecycleStats immutable so SnapshotLifecycleMetadata.EMPTY isn’t changed as side-effect #111215
Indices APIs
  • Align dot prefix validation with Serverless #116266
  • Revert "Add ResolvedExpression wrapper" #115317
Infra/Core
  • Fix max file size check to use getMaxFileSize #113723 (issue: #113705)
  • Guard blob store local directory creation with doPrivileged #115459
  • Handle BigInteger in xcontent copy #111937 (issue: #111812)
  • Report JVM stats for all memory pools (97046) #115117 (issue: #97046)
  • ByteArrayStreamInput: Return -1 when there are no more bytes to read #112214
Infra/Logging
  • Only emit product origin in deprecation log if present #111683 (issue: #81757)
Infra/Settings
  • GET _cluster/settings with include_defaults returns the expected fallback value if defined in elasticsearch.yml #110816 (issue: #110815)
Ingest Node
  • Fix IPinfo geolocation schema #115147
  • Fix getDatabaseType for unusual MMDBs #112888
License
  • Fix Start Trial API output acknowledgement header for features #111740 (issue: #111739)
  • Fix TokenService always appearing used in Feature Usage #112263 (issue: #61956)
  • Fix lingering license warning header in IP filter #115510 (issue: #114865)
Logs
  • Do not expand dots when storing objects in ignored source #113910
  • Fix ignore_above handling in synthetic source when index level setting is used #113570 (issue: #113538)
  • Fix synthetic source for flattened field when used with ignore_above #113499 (issue: #112044)
  • Prohibit changes to index mode, source, and sort settings during restore #115811
Machine Learning
  • Avoid ModelAssignment deadlock #109684
  • Avoid catch (Throwable t) in AmazonBedrockStreamingChatProcessor #115715
  • Allow for pytorch_inference results to include zero-dimensional tensors
  • Empty percentile results no longer throw no_such_element_exception in Anomaly Detection jobs #116015 (issue: #116013)
  • Fix NPE in Get Deployment Stats #115404
  • Fix bug in ML serverless autoscaling which prevented trained model updates from triggering a scale up #110734
  • Fix stream support for TaskType.ANY #115656
  • Fix parameter initialization for large forecasting models #2759
  • Forward bedrock connection errors to user #115868
  • Ignore unrecognized openai sse fields #114715
  • Prevent NPE if model assignment is removed while waiting to start #115430
  • Send mid-stream errors to users #114549
  • Temporarily return both modelId and inferenceId for GET /_inference until we migrate clients to only inferenceId #111490
  • Warn for model load failures if they have a status code <500 #113280
  • [Inference API] Remove unused Cohere rerank service settings fields in a BWC way #110427
  • [ML] Create Inference API will no longer return model_id and now only return inference_id #112508
Mapping
  • Fix MapperBuilderContext#isDataStream when used in dynamic mappers #110554
  • Fix synthetic source field names for multi-fields #112850
  • Retrieve the source for objects and arrays in a separate parsing phase #113027 (issue: #112374)
  • Two empty mappings now are created equally #107936 (issue: #107031)
Ranking
  • Fix MLTQuery handling of custom term frequencies #110846
  • Fix RRF validation for rank_constant < 1 #112058
  • Fix score count validation in reranker response #111212 (issue: #111202)
Search
  • Allow for queries on _tier to skip shards in the can_match phase #114990 (issue: #114910)
  • Allow out of range term queries for numeric types #112916
  • Do not exclude empty arrays or empty objects in source filtering #112250 (issue: #109668)
  • Fix synthetic source handling for bit type in dense_vector field #114407 (issue: #114402)
  • Improve DateTime error handling and add some bad date tests #112723 (issue: #112190)
  • Improve date expression/remote handling in index names #112405 (issue: #112243)
  • Make "too many clauses" throw IllegalArgumentException to avoid 500s #112678 (issue: #112177)
  • Make empty string searches be consistent with case (in)sensitivity #110833
  • Prevent flattening of ordered and unordered interval sources #114234
  • Remove needless forking to GENERIC in TransportMultiSearchAction #110796
  • Search/Mapping: KnnVectorQueryBuilder support for allowUnmappedFields #107047 (issue: #106846)
  • Span term query to convert to match no docs when unmapped field is targeted #113251
  • Speedup CanMatchPreFilterSearchPhase constructor #110860
  • Update BlobCacheBufferedIndexInput::readVLong to correctly handle negative long values #115594
  • [8.x] Limit the number of tasks that a single search can submit #115932
Security
  • Add ECK Role Mapping Cleanup #115823
  • Updated the transport CA name in Security Auto-Configuration. #106520 (issue: #106455)
TSDB
  • Implement parseBytesRef for TimeSeriesRoutingHashFieldType #113373 (issue: #112399)
Task Management
  • Improve handling of failure to create persistent task #114386
Transform
  • Allow task canceling of validate API calls #110951
  • Include reason when no nodes are found #112409 (issue: #112404)
Vector Search
  • Fix dim validation for bit element_type #114533
  • Support semantic_text in object fields #114601 (issue: #114401)
Watcher
  • Truncating watcher history if it is too large #111245 (issue: #94745)

Deprecations

Analysis
  • Deprecate dutch_kp and lovins stemmer as they are removed in Lucene 10 #113143
  • Deprecate edge_ngram side parameter #110829
CRUD
  • Deprecate dot-prefixed indices and composable template index patterns #112571
Search
  • Adding deprecation warnings for rrf using rank and sub_searches #114854
  • Deprecate legacy params from range query #113286

Enhancements

Aggregations
  • Account for DelayedBucket before reduction #113013
  • Add protection for OOM during aggregations partial reduction #110520
  • Deduplicate BucketOrder when deserializing #112707
  • Lower the memory footprint when creating DelayedBucket #112519
  • Reduce heap usage for AggregatorsReducer #112874
  • Remove reduce and reduceContext from DelayedBucket #112547
Allocation
  • Add link to flood-stage watermark exception message #111315
  • Always allow rebalancing by default #111015
Application
  • [Profiling] add container.id field to event index template #111969
Authorization
  • Add manage roles privilege #110633
  • Add privileges required for CDR misconfiguration features to work on AWS SecurityHub integration #112574
Codec
  • Remove zstd feature flag for index codec best compression #112665
  • [8.x] Remove zstd feature flag for index codec best compression #112857
Data streams
  • Add verbose flag retrieving maximum_timestamp for get data stream API #112303
  • Display effective retention in the relevant data stream APIs #112019
  • Expose global retention settings via data stream lifecycle API #112210
  • Ignore warning on yaml test put template #116201 (issue: #116158)
  • Make ecs@mappings work with OTel attributes #111600
Distributed
  • Add link to Max Shards Per Node exception message #110993
EQL
  • ESQL: Delay construction of warnings #114368
ES|QL
  • Add EXP ES|QL function #110879
  • Add CircuitBreaker to TDigest, Step 3: Connect with ESQL CB #113387
  • Add CircuitBreaker to TDigest, Step 4: Take into account shallow classes size #113613 (issue: #113916)
  • Collect and display execution metadata for ES|QL cross cluster searches #112595 (issue: #112402)
  • ESQL: Add support for multivalue fields in Arrow output #114774
  • ESQL: BUCKET: allow numerical spans as whole numbers #111874 (issues: #104646, #109340, #105375)
  • ESQL: Have BUCKET generate friendlier intervals #111879 (issue: #110916)
  • ESQL: Profile more timing information #111855
  • ESQL: Push down filters even in case of renames in Evals #114411
  • ESQL: Speed up CASE for some parameters #112295
  • ESQL: Speed up grouping by bytes #114021
  • ESQL: Support INLINESTATS grouped on expressions #111690
  • ESQL: Use less memory in listener #114358
  • ES|QL: Add support for cached strings in plan serialization #112929
  • ES|QL: add Telemetry API and track top functions #111226
  • Enhance SORT push-down to Lucene to cover references to fields and ST_DISTANCE function #112938 (issue: #109973)
  • Siem ea 9521 improve test #111552
  • Support multi-valued fields in compute engine for ST_DISTANCE #114836 (issue: #112910)
  • [ESQL] Add SPACE function #112350
  • [ESQL] Add finish() elapsed time to aggregation profiling times #113172 (issue: #112950)
  • [ESQL] Make query wrapped by SingleValueQuery cacheable #110116
  • [ES|QL] Add hypot function #114382
  • [ES|QL] Cast mixed numeric types to a common numeric type for Coalesce and In at Analyzer #111917 (issue: #111486)
  • [ES|QL] Combine Disjunctive CIDRMatch #111501 (issue: #105143)
  • [ES|QL] Create Range in PushFiltersToSource for qualified pushable filters on the same field #111437
  • [ES|QL] Name parameter with leading underscore #111950 (issue: #111821)
  • [ES|QL] Named parameter for field names and field name patterns #112905
  • [ES|QL] Validate index name in parser #112081
  • [ES|QL] add reverse function #113297
  • [ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations #109193
Experiences
  • Integrate IBM watsonx to Inference API for text embeddings #111770
Geo
  • Add support for spatial relationships in point field mapper #112126
  • Small performance improvement in h3 library #113385
  • Support docvalues only query in shape field #112199
Health
  • (API) Cluster Health report unassigned_primary_shards #112024
  • Do not treat replica as unassigned if primary recently created and unassigned time is below a threshold #112066
ILM+SLM
  • ILM: Add total_shards_per_node setting to searchable snapshot #112972 (issue: #112261)
  • PUT slm policy should only increase version if actually changed #111079
  • Preserve Step Info Across ILM Auto Retries #113187
  • Register SLM run before snapshotting to save stats #110216
  • SLM interval schedule followup - add back getFieldName style getters #112123
Infra/Core
  • Add nanos support to ZonedDateTime serialization #111689 (issue: #68292)
  • Extend logging for dropped warning headers #111624 (issue: #90527)
  • Give the kibana system user permission to read security entities #114363
Infra/Metrics
  • Add TaskManager to pluginServices #112687
Infra/REST API
  • Optimize the loop processing of URL decoding #110237 (issue: #110235)
Infra/Scripting
  • Expose HexFormat in Painless #112412
Infra/Settings
  • Improve exception message for bad environment variable placeholders in settings #114552 (issue: #110858)
  • Reprocess operator file settings when settings service starts, due to node restart or master node change #114295
Ingest Node
  • Add size_in_bytes to enrich cache stats #110578
  • Add support for templates when validating mappings in the simulate ingest API #111161
  • Adding index_template_substitutions to the simulate ingest API #114128
  • Adding component template substitutions to the simulate ingest API #113276
  • Adding mapping validation to the simulate ingest API #110606
  • Adds example plugin for custom ingest processor #112282 (issue: #111539)
  • Fix unnecessary mustache template evaluation #110986 (issue: #110191)
  • Listing all available databases in the _ingest/geoip/database API #113498
  • Make enrich cache based on memory usage #111412 (issue: #106081)
  • Tag redacted document in ingest metadata #113552
  • Verify Maxmind database types in the geoip processor #114527
Logs
  • Add validation for synthetic source mode in logs mode indices #110677
  • Store original source for keywords using a normalizer #112151
Machine Learning
  • Add Completion Inference API for Alibaba Cloud AI Search Model #112512
  • Add Streaming Inference spec #113812
  • Add chunking settings configuration to CohereService, AmazonBedrockService, and AzureOpenAiService #113897
  • Add chunking settings configuration to ElasticsearchService/ELSER #114429
  • Add custom rule parameters to force time shift #110974
  • Adding chunking settings to GoogleVertexAiService, AzureAiStudioService, and AlibabaCloudSearchService #113981
  • Adding chunking settings to MistralService, GoogleAiStudioService, and HuggingFaceService #113623
  • Adds a new Inference API for streaming responses back to the user. #113158
  • Allow users to force a detector to shift time series state by a specific amount #2695
  • Create StreamingHttpResultPublisher #112026
  • Create an ml node inference endpoint referencing an existing model #114750
  • Default inference endpoint for ELSER #113873
  • Default inference endpoint for the multilingual-e5-small model #114683
  • Dynamically get the number of allocations #114636
  • Enable OpenAI Streaming #113911
  • Filter empty task settings objects from the API response #114389
  • Migrate Inference to ChunkedToXContent #111655
  • Register Task while Streaming #112369
  • Server-Sent Events for Inference response #112565
  • Stream Anthropic Completion #114321
  • Stream Azure Completion #114464
  • Stream Bedrock Completion #114732
  • Stream Cohere Completion #114080
  • Stream Google Completion #114596
  • Stream OpenAI Completion #112677
  • Support sparse embedding models in the elasticsearch inference service #112270
  • Switch default chunking strategy to sentence #114453
  • Update the Pytorch library to version 2.3.1 #2688
  • Upgrade to AWS SDK v2 #114309 (issue: #110590)
  • Use the same chunking configurations for models in the Elasticsearch service #111336
  • Validate streaming HTTP Response #112481
  • Wait for allocation on scale up #114719
  • [Inference API] Add Alibaba Cloud AI Search Model support to Inference API #111181
  • [Inference API] Add Docs for AlibabaCloud AI Search Support for the Inference API #111181
  • [Inference API] Introduce Update API to change some aspects of existing inference endpoints #114457
  • [Inference API] Prevent inference endpoints from being deleted if they are referenced by semantic text #110399
  • [Inference API] alibabacloud ai search service support chunk infer to support semantic_text field #110399
Mapping
  • Add Field caps support for Semantic Text #111809
  • Add Lucene segment-level fields stats #111123
  • Add Search Inference ID To Semantic Text Mapping #113051
  • Add object param for keeping synthetic source #113690
  • Add support for multi-value dimensions #112645 (issue: #110387)
  • Allow dimension fields to have multiple values in standard and logsdb index mode #112345 (issues: #112232, #112239)
  • Allow fields with dots in sparse vector field mapper #111981 (issue: #109118)
  • Allow querying index_mode #110676
  • Configure keeping source in FieldMapper #112706
  • Control storing array source with index setting #112397
  • Introduce mode subobjects=auto for objects #110524
  • Update semantic_text field to support indexing numeric and boolean data types #111284
  • Use fallback synthetic source for copy_to and doc_values: false cases #112294 (issues: #110753, #110038, #109546)
Network
  • Add links to network disconnect troubleshooting #112330
Ranking
  • Add timeout and cancellation check to rescore phase #115048
Relevance
  • Add a query rules tester API call #114168
Search
  • Add more dense_vector details for cluster stats field stats #113607
  • Add range and regexp Intervals #111465
  • Adding support for allow_partial_search_results in PIT #111516
  • Allow incubating Panama Vector in simdvec, and add vectorized ipByteBin #112933
  • Avoid using concurrent collector manager in LuceneChangesSnapshot #113816
  • Bool query early termination should also consider must_not clauses #115031
  • Deduplicate Kuromoji User Dictionary #112768
  • Multi term intervals: increase max_expansions #112826 (issue: #110491)
  • Search coordinator uses event.ingested in cluster state to do rewrites #111523
  • Update cluster stats for retrievers #114109
Security
  • (logger) change from error to warn for short circuiting user #112895
  • Add asset criticality indices for kibana_system_user #113588
  • Add tier preference to security index settings allowlist #111818
  • [Service Account] Add AutoOps account #111316
Snapshot/Restore
  • Add max_multipart_parts setting to S3 repository #113989
  • Add support for Azure Managed Identity #111344
  • Add telemetry for repository usage #112133
  • Add workaround for missing shard gen blob #112337
  • Clean up dangling S3 multipart uploads #111955 (issues: #101169, #44971)
  • Execute shard snapshot tasks in shard-id order #111576 (issue: #108739)
  • Include account name in Azure settings exceptions #111274
  • Introduce repository integrity verification API #112348 (issue: #52622)
Stats
  • Track search and fetch failure stats #113988
TSDB
  • Add support for boolean dimensions #111457 (issue: #111338)
  • Stop iterating over all fields to extract @timestamp value #110603 (issue: #92297)
  • Support booleans in routing path #111445
Vector Search
  • Dense vector field types updatable for int4 #110928
  • Use native scalar scorer for int8_flat index #111071

New features

Data streams
  • Introduce global retention in data stream lifecycle. #111972
  • X-pack/plugin/otel: introduce x-pack-otel plugin #111091
ES|QL
  • Add ESQL match function #113374
  • ESQL: Add MV_PSERIES_WEIGHTED_SUM for score calculations used by security solution #109017
  • ESQL: Add async ID and is_running headers to ESQL async query #111840
  • ESQL: Add boolean support to Max and Min aggs #110527
  • ESQL: Add boolean support to TOP aggregation #110718
  • ESQL: Added mv_percentile function #111749 (issue: #111591)
  • ESQL: INLINESTATS #109583 (issue: #107589)
  • ESQL: Introduce per agg filter #113735
  • ESQL: Strings support for MAX and MIN aggregations #111544
  • ESQL: Support IP fields in MAX and MIN aggregations #110921
  • ESQL: TOP aggregation IP support #111105
  • ESQL: TOP support for strings #113183 (issue: #109849)
  • ESQL: mv_median_absolute_deviation function #112055 (issue: #111590)
  • Search in ES|QL: Add MATCH operator #110971
ILM+SLM
  • SLM Interval based scheduling #110847
Machine Learning
  • Inference autoscaling #109667
  • Telemetry for inference adaptive allocations #110630
Relevance
  • [Query rules] Add exclude query rule type #111420
Search
  • Async search: Add ID and "is running" http headers #112431 (issue: #109576)
  • Cross-cluster search telemetry #113825
Vector Search
  • Adding new bbq index types behind a feature flag #114439

Upgrades

Infra/Core
  • Upgrade xcontent to Jackson 2.17.0 #111948
  • Upgrade xcontent to Jackson 2.17.2 #112320
Snapshot/Restore
  • Upgrade Azure SDK #111225
  • Upgrade repository-azure dependencies #112277