7.3.0 release highlights
Voting-only master nodes

A new node.voting_only role has been introduced that allows nodes to participate in master elections even though they are not eligible to become the master. The benefit is that these nodes still help with high availability while requiring less CPU and heap than master nodes.

The node.voting_only role is only available with the default distribution of Elasticsearch.
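As a sketch (assuming the `node.voting_only` setting name used in 7.x), a dedicated voting-only node could be configured in elasticsearch.yml like this:

```yaml
# elasticsearch.yml for a dedicated voting-only node (illustrative)
node.master: true        # master-eligible, so it can take part in elections
node.voting_only: true   # but it can never become the elected master itself
node.data: false         # dedicated role: holds no data
node.ingest: false       # and runs no ingest pipelines
```

Because such a node can cast a vote but never wins an election, it can act as a lightweight tiebreaker in a cluster with two full master nodes.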
Reloading of search-time synonyms

A new analyzer reload API makes it possible to reload the definition of search-time analyzers and their associated resources. A common use case for this API is reloading search-time synonyms. In earlier versions of Elasticsearch, users could force synonyms to be reloaded by closing the index and then opening it again. With this new API, synonyms can be updated without closing the index.

The analyzer reload API is only available with the default distribution of Elasticsearch.
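As an illustrative sketch (index name hypothetical), reloading the search analyzers of an index is a single request:

```console
POST /my-index/_reload_search_analyzers
```

Only analyzers that were marked as updateable (and are therefore search-time only) pick up the changed resources, such as an edited synonyms file.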
New flattened field type

A new flattened field type has been added, which can index arbitrary JSON objects into a single field. This helps avoid hitting issues due to a large number of fields in mappings, at the cost of more limited search functionality.

The flattened field type is only available with the default distribution of Elasticsearch.
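As an illustrative sketch (index and field names hypothetical), an entire JSON object can be mapped as one flattened field:

```console
PUT my-index
{
  "mappings": {
    "properties": {
      "labels": { "type": "flattened" }
    }
  }
}

PUT my-index/_doc/1
{
  "labels": {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"]
  }
}
```

The whole `labels` object counts as a single field in the mapping, however many keys documents contain; its leaf values are indexed as keywords, so exact matches work but numeric range queries and analysis do not.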
Functions on vector fields

Painless now supports computing the cosine similarity and the dot product of a query vector and the values of either a sparse_vector or dense_vector field.

These functions are only available with the default distribution of Elasticsearch.
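For example, a hedged sketch of scoring documents by cosine similarity against a dense_vector field (index, field, and vector values hypothetical):

```console
GET my-index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, doc['my_vector']) + 1.0",
        "params": { "query_vector": [0.5, 10.0, 6.0] }
      }
    }
  }
}
```

Adding 1.0 to the similarity keeps the script score non-negative, since cosine similarity can range from -1 to 1 while Elasticsearch scores must not be negative.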
Prefix and wildcard support for intervals

Intervals queries now support matching by prefix or wildcard.
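As an illustrative sketch (field and terms hypothetical), a prefix rule can be combined with other interval rules:

```console
GET my-index/_search
{
  "query": {
    "intervals": {
      "my_text": {
        "all_of": {
          "ordered": true,
          "intervals": [
            { "match": { "query": "error" } },
            { "prefix": { "prefix": "connect" } }
          ]
        }
      }
    }
  }
}
```

This matches documents where "error" is followed by any term starting with "connect", such as "connecting" or "connection".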
Rare terms aggregation

A new Rare Terms aggregation makes it possible to find the least frequent values in a field. It is intended to replace the "order" : { "_count" : "asc" } option of the terms aggregation.
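As an illustrative sketch (index and field names hypothetical), finding values that occur in at most one document:

```console
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "rarest_genres": {
      "rare_terms": {
        "field": "genre",
        "max_doc_count": 1
      }
    }
  }
}
```

Unlike sorting a terms aggregation by ascending count, which can be badly inaccurate on sharded indices, rare_terms is designed specifically for the long tail of infrequent values.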
Aliases are replicated via cross-cluster replication

Read aliases are now replicated via cross-cluster replication. Note that write aliases are still not replicated, since they only make sense for indices that are being written to, and follower indices do not receive direct writes.
SQL supports frozen indices

Elasticsearch SQL now supports querying frozen indices via the new FROZEN keyword.
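For example, a sketch of querying a frozen index through the SQL endpoint (index and column names hypothetical):

```console
POST _sql?format=txt
{
  "query": "SELECT author, name FROM FROZEN archive ORDER BY name LIMIT 5"
}
```

Without the FROZEN keyword, frozen indices are skipped, so including it is an explicit opt-in to the slower, memory-light search path.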
Fixed memory leak when using templates in document-level security

Document-level security used an unbounded cache for the set of visible documents. This could lead to a memory leak when a templated query was used as a role query. The cache now evicts entries based on memory usage and is limited to 50MB.
More memory-efficient aggregations on keyword fields

Terms aggregations generally need to build global ordinals in order to run. Unfortunately, this operation became more memory-intensive in 6.0 due to the move to doc-value iterators, which improved the handling of sparse fields. The memory pressure of building global ordinals is now back to a level similar to that of pre-6.0 releases.
Data frames: transform and pivot your streaming data

[beta] This functionality is in beta and is subject to change. The design and code are less mature than official GA features and are being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.

Transforms are a new core feature in Elasticsearch that enable you to transform an existing index into a secondary, summarized index. Transforms enable you to pivot your data and create entity-centric indices that summarize the behavior of an entity, organizing the data into an analysis-friendly format.

Transforms were originally available in 7.2. As of 7.3, they can run either as a single batch transform or continuously, incorporating new data as it is ingested.

Data frames enable new possibilities for machine learning analysis (such as outlier detection), but they can also be useful for other types of visualizations and custom types of analysis.
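As a hedged sketch (endpoint as named in 7.x, index and field names hypothetical), a continuous pivot transform that summarizes orders per customer might look like:

```console
PUT _data_frame/transforms/orders-by-customer
{
  "source": { "index": "ecommerce" },
  "dest": { "index": "ecommerce-customer-summary" },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } }
    },
    "aggregations": {
      "total_spent": { "sum": { "field": "taxful_total_price" } }
    }
  },
  "sync": {
    "time": { "field": "order_date" }
  }
}
```

Omitting the "sync" section would make this a single batch transform; with it, the transform keeps checking for documents newer than its checkpoint and updates the entity-centric destination index continuously.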
Discover your most unusual data using outlier detection

The goal of outlier detection is to find the most unusual data points in an index. We analyse the numerical fields of each data point (a document in the index) and annotate it with a measure of how unusual it is.

We use unsupervised outlier detection, which means there is no need to provide a training data set to teach outlier detection to recognize outliers. In practice, this is achieved by using an ensemble of distance-based and density-based techniques to identify the data points that are most different from the bulk of the data in the index. Each analysed data point is assigned an outlier score, which captures how different it is from the other data points in the index.

In addition to the new outlier detection functionality, we are introducing the evaluate data frame analytics API, which enables you to compute a range of performance metrics such as confusion matrices, precision, recall, the receiver operating characteristic (ROC) curve, and the area under the ROC curve. If you are running outlier detection on a source index that has already been labeled to indicate which points are truly outliers and which are normal, you can use the evaluate data frame analytics API to assess the performance of the outlier detection analytics on your dataset.
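As a hedged sketch (endpoint and index names assumed, defaults left implicit), creating an outlier detection analytics job that writes scored copies of the documents to a destination index might look like:

```console
PUT _ml/data_frame/analytics/my-outlier-job
{
  "source": { "index": "my-data" },
  "dest": { "index": "my-data-outliers" },
  "analysis": { "outlier_detection": {} }
}
```

Each document in the destination index is annotated with its outlier score, which can then be compared against known labels using the evaluate data frame analytics API.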