Elasticsearch API
http://api.example.com
Elasticsearch provides REST APIs that are used by the UI components and can be called directly to configure and access Elasticsearch features.
Documentation source and versions
This documentation is derived from the main branch of the elasticsearch-specification repository. It is provided under the Attribution-NonCommercial-NoDerivatives 4.0 International license.
This documentation contains work-in-progress information for future Elastic Stack releases.
Last update on Apr 16, 2025.
This API is provided under the Apache 2.0 license.
Create a behavioral analytics collection
Deprecated
Technical preview
Path parameters
- name (string, Required): The name of the analytics collection to be created or updated.
curl \
--request PUT 'http://api.example.com/_application/analytics/{name}' \
--header "Authorization: $API_KEY"
{
"acknowledged": true,
"name": "string"
}
Get aliases
Get the cluster's index aliases, including filter and routing information. This API does not return data stream aliases.
IMPORTANT: CAT APIs are only intended for human consumption using the command line or the Kibana console. They are not intended for use by applications. For application consumption, use the aliases API.
Path parameters
- name (string | array[string], Required): A comma-separated list of aliases to retrieve. Supports wildcards (*). To retrieve all aliases, omit this parameter or use * or _all.
Query parameters
- h (string | array[string]): List of columns to appear in the response. Supports simple wildcards.
- s (string | array[string]): List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
- expand_wildcards (string | array[string]): The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden.
- master_timeout (string): The period to wait for a connection to the master node. If the master node is not available before the timeout expires, the request fails and returns an error. To indicate that the request should never time out, set it to -1.
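For example, to show only the alias and index columns, sorted by alias, the h and s parameters can be combined as follows (a sketch; v=true adds column headers):
curl \
--request GET 'http://api.example.com/_cat/aliases?v=true&h=alias,index&s=alias' \
--header "Authorization: $API_KEY"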
curl \
--request GET 'http://api.example.com/_cat/aliases/{name}' \
--header "Authorization: $API_KEY"
[
{
"alias": "alias1",
"index": "test1",
"filter": "-",
"routing.index": "-",
"routing.search": "-",
"is_write_index": "true"
},
{
"alias": "alias1",
"index": "test1",
"filter": "*",
"routing.index": "-",
"routing.search": "-",
"is_write_index": "true"
},
{
"alias": "alias3",
"index": "test1",
"filter": "-",
"routing.index": "1",
"routing.search": "1",
"is_write_index": "true"
},
{
"alias": "alias4",
"index": "test1",
"filter": "-",
"routing.index": "2",
"routing.search": "1,2",
"is_write_index": "true"
}
]
Get a document count
Get quick access to a document count for a data stream, an index, or an entire cluster. The document count only includes live documents, not deleted documents which have not yet been removed by the merge process.
IMPORTANT: CAT APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the count API.
curl \
--request GET 'http://api.example.com/_cat/count' \
--header "Authorization: $API_KEY"
[
{
"epoch": "1475868259",
"timestamp": "15:24:20",
"count": "120"
}
]
[
{
"epoch": "1475868259",
"timestamp": "15:24:20",
"count": "121"
}
]
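The two responses above show the count changing between successive calls. To scope the count to a single data stream or index rather than the whole cluster, append its name to the path (a sketch; the index name is a placeholder):
curl \
--request GET 'http://api.example.com/_cat/count/my-index-000001' \
--header "Authorization: $API_KEY"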
Get datafeeds
Added in 7.7.0
Get configuration and usage information about datafeeds.
This API returns a maximum of 10,000 datafeeds.
If the Elasticsearch security features are enabled, you must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get datafeed statistics API.
Path parameters
- datafeed_id (string, Required): A numerical character string that uniquely identifies the datafeed.
Query parameters
- allow_no_match (boolean): Specifies what to do when the request:
  - Contains wildcard expressions and there are no datafeeds that match.
  - Contains the _all string or no identifiers and there are no matches.
  - Contains wildcard expressions and there are only partial matches.
  If true, the API returns an empty datafeeds array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.
- h (string | array[string]): Comma-separated list of column names to display.
- s (string | array[string]): Comma-separated list of column names or column aliases used to sort the response.
- time (string): The unit used to display time values. Values are nanos, micros, ms, s, m, h, or d.
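For example, the h and s parameters can restrict and order the output to the columns shown in the example response below (a sketch):
curl \
--request GET 'http://api.example.com/_cat/ml/datafeeds?h=id,state,buckets.count,search.count&s=id' \
--header "Authorization: $API_KEY"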
curl \
--request GET 'http://api.example.com/_cat/ml/datafeeds/{datafeed_id}' \
--header "Authorization: $API_KEY"
[
{
"id": "datafeed-high_sum_total_sales",
"state": "stopped",
"buckets.count": "743",
"search.count": "7"
},
{
"id": "datafeed-low_request_rate",
"state": "stopped",
"buckets.count": "1457",
"search.count": "3"
},
{
"id": "datafeed-response_code_rates",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
},
{
"id": "datafeed-url_scanning",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
}
]
Get index template information
Added in 5.2.0
Get information about the index templates in a cluster. You can use index templates to apply index settings and field mappings to new indices at creation. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the get index template API.
Path parameters
- name (string, Required): The name of the template to return. Accepts wildcard expressions. If omitted, all templates are returned.
Query parameters
- h (string | array[string]): List of columns to appear in the response. Supports simple wildcards.
- s (string | array[string]): List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
- local (boolean): If true, the request computes the list of selected nodes from the local cluster state. If false, the list of selected nodes is computed from the cluster state of the master node. In both cases the coordinating node sends requests for further information to each selected node.
- master_timeout (string): Period to wait for a connection to the master node.
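For example, to list all templates whose names start with my-template, sorted by name (a sketch; the pattern is a placeholder):
curl \
--request GET 'http://api.example.com/_cat/templates/my-template-*?s=name' \
--header "Authorization: $API_KEY"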
curl \
--request GET 'http://api.example.com/_cat/templates/{name}' \
--header "Authorization: $API_KEY"
[
{
"name": "my-template-0",
"index_patterns": "[te*]",
"order": "500",
"version": null,
"composed_of": "[]"
},
{
"name": "my-template-1",
"index_patterns": "[tea*]",
"order": "501",
"version": null,
"composed_of": "[]"
},
{
"name": "my-template-2",
"index_patterns": "[teak*]",
"order": "502",
"version": "7",
"composed_of": "[]"
}
]
Update voting configuration exclusions
Added in 7.0.0
Update the cluster voting config exclusions by node IDs or node names. By default, if there are more than three master-eligible nodes in the cluster and you remove fewer than half of the master-eligible nodes in the cluster at once, the voting configuration automatically shrinks. If you want to shrink the voting configuration to contain fewer than three nodes or to remove half or more of the master-eligible nodes in the cluster at once, use this API to remove departing nodes from the voting configuration manually. The API adds an entry for each specified node to the cluster’s voting configuration exclusions list. It then waits until the cluster has reconfigured its voting configuration to exclude the specified nodes.
Clusters should have no voting configuration exclusions in normal operation.
Once the excluded nodes have stopped, clear the voting configuration exclusions with DELETE /_cluster/voting_config_exclusions. This API waits for the nodes to be fully removed from the cluster before it returns. If your cluster has voting configuration exclusions for nodes that you no longer intend to remove, use DELETE /_cluster/voting_config_exclusions?wait_for_removal=false to clear the voting configuration exclusions without waiting for the nodes to leave the cluster.
A response to POST /_cluster/voting_config_exclusions with an HTTP status code of 200 OK guarantees that the node has been removed from the voting configuration and will not be reinstated until the voting configuration exclusions are cleared by calling DELETE /_cluster/voting_config_exclusions. If the call to POST /_cluster/voting_config_exclusions fails or returns a response with an HTTP status code other than 200 OK, the node may not have been removed from the voting configuration. In that case, you may safely retry the call.
NOTE: Voting exclusions are required only when you remove at least half of the master-eligible nodes from a cluster in a short time period. They are not required when removing master-ineligible nodes or when removing fewer than half of the master-eligible nodes.
Query parameters
- node_names (string | array[string]): A comma-separated list of the names of the nodes to exclude from the voting configuration. If specified, you may not also specify node_ids.
- node_ids (string | array[string]): A comma-separated list of the persistent IDs of the nodes to exclude from the voting configuration. If specified, you may not also specify node_names.
- master_timeout (string): Period to wait for a connection to the master node.
- timeout (string): When adding a voting configuration exclusion, the API waits for the specified nodes to be excluded from the voting configuration before returning. If the timeout expires before the appropriate condition is satisfied, the request fails and returns an error.
curl \
--request POST 'http://api.example.com/_cluster/voting_config_exclusions' \
--header "Authorization: $API_KEY"
Activate the connector draft filter
Technical preview
Activates the valid draft filtering for a connector.
Path parameters
- connector_id (string, Required): The unique identifier of the connector to be updated.
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}/_filtering/_activate' \
--header "Authorization: $API_KEY"
{
"result": "created"
}
Get data stream lifecycles
Added in 8.11.0
Get the data stream lifecycle configuration of one or more data streams.
Path parameters
- name (string | array[string], Required): Comma-separated list of data streams to limit the request. Supports wildcards (*). To target all data streams, omit this parameter or use * or _all.
Query parameters
- expand_wildcards (string | array[string]): Type of data stream that wildcard patterns can match. Supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.
- include_defaults (boolean): If true, return all default settings in the response.
- master_timeout (string): Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request GET 'http://api.example.com/_data_stream/{name}/_lifecycle' \
--header "Authorization: $API_KEY"
{
"data_streams": [
{
"name": "my-data-stream-1",
"lifecycle": {
"enabled": true,
"data_retention": "7d"
}
},
{
"name": "my-data-stream-2",
"lifecycle": {
"enabled": true,
"data_retention": "7d"
}
}
]
}
Reindex documents
Added in 2.3.0
Copy documents from a source to a destination. You can copy all documents to the destination index or reindex a subset of the documents. The source can be any existing index, alias, or data stream. The destination must differ from the source. For example, you cannot reindex a data stream into itself.
IMPORTANT: Reindex requires _source to be enabled for all documents in the source.
The destination should be configured as wanted before calling the reindex API. Reindex does not copy the settings from the source or its associated template. Mappings, shard counts, and replicas, for example, must be configured ahead of time.
If the Elasticsearch security features are enabled, you must have the following security privileges:
- The read index privilege for the source data stream, index, or alias.
- The write index privilege for the destination data stream, index, or index alias.
- To automatically create a data stream or index with a reindex API request, you must have the auto_configure, create_index, or manage index privilege for the destination data stream, index, or alias.
- If reindexing from a remote cluster, the source.remote.user must have the monitor cluster privilege and the read index privilege for the source data stream, index, or alias.
If reindexing from a remote cluster, you must explicitly allow the remote host in the reindex.remote.whitelist setting. Automatic data stream creation requires a matching index template with data stream enabled.
The dest element can be configured like the index API to control optimistic concurrency control. Omitting version_type or setting it to internal causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.
Setting version_type to external causes Elasticsearch to preserve the version from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.
Setting op_type to create causes the reindex API to create only missing documents in the destination. All existing documents will cause a version conflict.
IMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an op_type of create. A reindex can only add new documents to a destination data stream. It cannot update existing documents in a destination data stream.
By default, version conflicts abort the reindex process. To continue reindexing if there are conflicts, set the conflicts request body property to proceed. In this case, the response includes a count of the version conflicts that were encountered. Note that the handling of other error types is unaffected by the conflicts property. Additionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than max_docs until it has successfully indexed max_docs documents into the target or it has gone through every document in the source query.
NOTE: The reindex API makes no effort to handle ID collisions. The last document written will "win" but the order isn't usually predictable so it is not a good idea to rely on this behavior. Instead, make sure that IDs are unique by using a script.
Running reindex asynchronously
If the request contains wait_for_completion=false, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at _tasks/<task_id>.
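For example, the task document can be fetched with the tasks API (a sketch; the task ID is a placeholder):
curl \
--request GET 'http://api.example.com/_tasks/r1A2WoRbTwKZ516z6NEs5A:36619' \
--header "Authorization: $API_KEY"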
Reindex from multiple sources
If you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources. That way you can resume the process if there are any errors by removing the partially completed source and starting over. It also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.
For example, you can use a bash script like this:
# Reindex several indices one at a time; each destination gets a -reindexed suffix.
for index in i1 i2 i3 i4 i5; do
  curl -H 'Content-Type: application/json' -XPOST localhost:9200/_reindex?pretty -d'{
    "source": {
      "index": "'$index'"
    },
    "dest": {
      "index": "'$index'-reindexed"
    }
  }'
done
Throttling
Set requests_per_second to any positive decimal number (1.4, 6, 1000, for example) to throttle the rate at which reindex issues batches of index operations. Requests are throttled by padding each batch with a wait time. To turn off throttling, set requests_per_second to -1.
The throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes the padding into account. The padding time is the difference between the batch size divided by the requests_per_second and the time spent writing. By default the batch size is 1000, so if requests_per_second is set to 500:
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set. This is "bursty" instead of "smooth".
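Continuing the arithmetic above, a reindex throttled to 500 sub-requests per second can be requested with a query parameter (a sketch; the index names are placeholders):
curl \
--request POST 'http://api.example.com/_reindex?requests_per_second=500' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}'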
Slicing
Reindex supports sliced scroll to parallelize the reindexing process. This parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.
NOTE: Reindexing from remote clusters does not support manual or automatic slicing.
You can slice a reindex request manually by providing a slice ID and total number of slices to each request. You can also let reindex automatically parallelize by using sliced scroll to slice on _id. The slices parameter specifies the number of slices to use.
Adding slices to the reindex request just automates the manual process, creating sub-requests, which means it has some quirks:
- You can see these requests in the tasks API. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with slices only contains the status of completed slices.
- These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with slices will rethrottle the unfinished sub-requests proportionally.
- Canceling the request with slices will cancel each sub-request.
- Due to the nature of slices, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- Parameters like requests_per_second and max_docs on a request with slices are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using max_docs with slices might not result in exactly max_docs documents being reindexed.
- Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.
If slicing automatically, setting slices to auto will choose a reasonable number for most indices.
If slicing manually or otherwise tuning automatic slicing, use the following guidelines.
Query performance is most efficient when the number of slices is equal to the number of shards in the index. If that number is large (for example, 500), choose a lower number, as too many slices will hurt performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
Indexing performance scales linearly across available resources with the number of slices.
Whether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.
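For example, automatic slicing can be requested by passing slices=auto as a query parameter (a sketch; the index names are placeholders):
curl \
--request POST 'http://api.example.com/_reindex?slices=auto' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}'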
Modify documents during reindexing
Like _update_by_query, reindex operations support a script that modifies the document. Unlike _update_by_query, the script is allowed to modify the document's metadata. Just as in _update_by_query, you can set ctx.op to change the operation that is run on the destination.
For example, set ctx.op to noop if your script decides that the document doesn't have to be indexed in the destination. This "no operation" will be reported in the noop counter in the response body. Set ctx.op to delete if your script decides that the document must be deleted from the destination. The deletion will be reported in the deleted counter in the response body. Setting ctx.op to anything else will return an error, as will setting any other field in ctx.
Think of the possibilities! Just be careful; you are able to change:
- _id
- _index
- _version
- _routing
Setting _version to null or clearing it from the ctx map is just like not sending the version in an indexing request. It will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.
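As a hedged illustration of the ctx.op behavior described above (the migrated field is a made-up example), a script can skip documents rather than copy them:
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.migrated == true) {ctx.op = 'noop'}"
  }
}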
Reindex from remote
Reindex supports reindexing from a remote Elasticsearch cluster.
The host parameter must contain a scheme, host, port, and optional path. The username and password parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication. Be sure to use HTTPS when using basic authentication or the password will be sent in plain text. There are a range of settings available to configure the behavior of the HTTPS connection.
When using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.
Remote hosts must be explicitly allowed with the reindex.remote.whitelist setting. It can be set to a comma-delimited list of allowed remote host and port combinations. Scheme is ignored; only the host and port are used. For example:
reindex.remote.whitelist: ["otherhost:9200", "another:9200", "127.0.10.*:9200", "localhost:*"]
The list of allowed hosts must be configured on any nodes that will coordinate the reindex. This feature should work with remote clusters of any version of Elasticsearch. This should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.
WARNING: Elasticsearch does not support forward compatibility across major versions. For example, you cannot reindex from a 7.x cluster into a 6.x cluster.
To enable queries sent to older versions of Elasticsearch, the query parameter is sent directly to the remote host without validation or modification.
NOTE: Reindexing from remote clusters does not support manual or automatic slicing.
Reindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb. If the remote index includes very large documents you'll need to use a smaller batch size. It is also possible to set the socket read timeout on the remote connection with the socket_timeout field and the connection timeout with the connect_timeout field. Both default to 30 seconds.
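For example, both timeouts can be raised in the source.remote object of the request body (a sketch; the host and index names are placeholders):
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}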
Configuring SSL parameters
Reindex from remote supports configurable SSL settings. These must be specified in the elasticsearch.yml file, with the exception of the secure settings, which you add in the Elasticsearch keystore. It is not possible to configure SSL in the body of the reindex request.
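A minimal sketch of such settings in elasticsearch.yml, assuming a PEM-based client certificate setup (the paths are placeholders; check the reindex SSL settings reference for the authoritative list):
reindex.ssl.certificate_authorities: ["/path/to/ca.crt"]
reindex.ssl.certificate: /path/to/client.crt
reindex.ssl.key: /path/to/client.key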
Query parameters
- refresh (boolean): If true, the request refreshes affected shards to make this operation visible to search.
- requests_per_second (number): The throttle for this request in sub-requests per second. By default, there is no throttle.
- scroll (string): The period of time that a consistent view of the index should be maintained for scrolled search.
- slices (number | string): The number of slices this task should be divided into. It defaults to one slice, which means the task isn't sliced into subtasks. Reindex supports sliced scroll to parallelize the reindexing process. This parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts. NOTE: Reindexing from remote clusters does not support manual or automatic slicing. If set to auto, Elasticsearch chooses the number of slices to use. This setting will use one slice per shard, up to a certain limit. If there are multiple sources, it will choose the number of slices based on the index or backing index with the smallest number of shards.
- timeout (string): The period each indexing operation waits for automatic index creation, dynamic mapping updates, and active shards. By default, Elasticsearch waits for at least one minute before failing. The actual wait time could be longer, particularly when multiple waits occur.
- wait_for_active_shards (number | string): The number of shard copies that must be active before proceeding with the operation. Set it to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default value is one, which means it waits for each primary shard to be active.
- wait_for_completion (boolean): If true, the request blocks until the operation is complete.
- require_alias (boolean): If true, the destination must be an index alias.
Body
Required
- conflicts (string): Values are abort or proceed.
- dest (object, Required)
- max_docs (number): The maximum number of documents to reindex. By default, all documents are reindexed. If it is a value less than or equal to scroll_size, a scroll will not be used to retrieve the results for the operation. If conflicts is set to proceed, the reindex operation could attempt to reindex more documents from the source than max_docs until it has successfully indexed max_docs documents into the target or it has gone through every document in the source query.
- script (object)
- size (number)
- source (object, Required)
curl \
--request POST 'http://api.example.com/_reindex' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "source": {
    "index": ["my-index-000001", "my-index-000002"]
  },
  "dest": {
    "index": "my-new-index-000002"
  }
}'
{
"source": {
"index": ["my-index-000001", "my-index-000002"]
},
"dest": {
"index": "my-new-index-000002"
}
}
{
"source": {
"index": "metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"source": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
}
}
{
"max_docs": 10,
"source": {
"index": "my-index-000001",
"query": {
"function_score" : {
"random_score" : {},
"min_score" : 0.9
}
}
},
"dest": {
"index": "my-new-index-000001"
}
}
{
"source": {
"index": "my-index-000001"
},
"dest": {
"index": "my-new-index-000001",
"version_type": "external"
},
"script": {
"source": "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
"lang": "painless"
}
}
{
"source": {
"remote": {
"host": "http://otherhost:9200",
"username": "user",
"password": "pass"
},
"index": "my-index-000001",
"query": {
"match": {
"test": "data"
}
}
},
"dest": {
"index": "my-new-index-000001"
}
}
{
"source": {
"index": "my-index-000001",
"slice": {
"id": 0,
"max": 2
}
},
"dest": {
"index": "my-new-index-000001"
}
}
{
"source": {
"index": "my-index-000001"
},
"dest": {
"index": "my-new-index-000001"
}
}
{
"source": {
"index": "source",
"query": {
"match": {
"company": "cat"
}
}
},
"dest": {
"index": "dest",
"routing": "=cat"
}
}
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "some_ingest_pipeline"
}
}
{
"source": {
"index": "my-index-000001",
"query": {
"term": {
"user.id": "kimchy"
}
}
},
"dest": {
"index": "my-new-index-000001"
}
}
{
"max_docs": 1,
"source": {
"index": "my-index-000001"
},
"dest": {
"index": "my-new-index-000001"
}
}
{
"source": {
"index": "my-index-000001",
"_source": ["user.id", "_doc"]
},
"dest": {
"index": "my-new-index-000001"
}
}
{
"source": {
"index": "my-index-000001"
},
"dest": {
"index": "my-new-index-000001"
},
"script": {
"source": "ctx._source.tag = ctx._source.remove(\"flag\")"
}
}
{
"batches": 42.0,
"created": 42.0,
"deleted": 42.0,
"failures": [
{
"cause": {
"type": "string",
"reason": "string",
"stack_trace": "string",
"caused_by": {},
"root_cause": [
{}
],
"suppressed": [
{}
]
},
"id": "string",
"index": "string",
"status": 42.0
}
],
"noops": 42.0,
"retries": {
"bulk": 42.0,
"search": 42.0
},
"requests_per_second": 42.0,
"slice_id": 42.0,
"": 42.0,
"timed_out": true,
"total": 42.0,
"updated": 42.0,
"version_conflicts": 42.0
}
Update documents
Added in 2.4.0
Updates documents that match the specified query. If no query is specified, performs an update on every document in the data stream or index without modifying the source, which is useful for picking up mapping changes.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:
- read
- index or write
You can specify the query criteria in the request URI or the request body using the same syntax as the search API.
When you submit an update by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and updates matching documents using internal versioning.
When the versions match, the document is updated and the version number is incremented.
If a document changes between the time that the snapshot is taken and the update operation is processed, it results in a version conflict and the operation fails.
You can opt to count version conflicts instead of halting and returning by setting conflicts to proceed. Note that if you opt to count version conflicts, the operation could attempt to update more documents from the source than max_docs until it has successfully updated max_docs documents or it has gone through every document in the source query.
NOTE: Documents with a version equal to 0 cannot be updated using update by query because internal versioning does not support 0 as a valid version number.
While processing an update by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents. A bulk update request is performed for each batch of matching documents. Any query or update failures cause the update by query request to fail and the failures are shown in the response. Any update requests that completed successfully still stick; they are not rolled back.
Throttling update requests
To control the rate at which update by query issues batches of update operations, you can set requests_per_second to any positive decimal number. This pads each batch with a wait time to throttle the rate. Set requests_per_second to -1 to turn off throttling.
Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the requests_per_second and the time spent writing. By default the batch size is 1000, so if requests_per_second is set to 500:
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".
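A running update by query can also be rethrottled without restarting it by using the _rethrottle endpoint with the task ID returned by the tasks API (a sketch; the task ID is a placeholder):
curl \
--request POST 'http://api.example.com/_update_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1' \
--header "Authorization: $API_KEY"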
Slicing
Update by query supports sliced scroll to parallelize the update process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.
Setting slices to auto chooses a reasonable number for most data streams and indices. This setting will use one slice per shard, up to a certain limit. If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards.
Adding slices to _update_by_query just automates the manual process of creating sub-requests, which means it has some quirks:
- You can see these requests in the tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with slices only contains the status of completed slices.
- These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with slices will rethrottle the unfinished sub-requests proportionally.
- Canceling the request with slices will cancel each sub-request.
- Due to the nature of slices, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- Parameters like requests_per_second and max_docs on a request with slices are distributed proportionally to each sub-request. Combine that with the point above about distribution being uneven and you should conclude that using max_docs with slices might not result in exactly max_docs documents being updated.
- Each sub-request gets a slightly different snapshot of the source data stream or index, though these are all taken at approximately the same time.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind that:
- Query performance is most efficient when the number of slices is equal to the number of shards in the index or backing index. If that number is large (for example, 500), choose a lower number as too many slices hurts performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
- Update performance scales linearly across available resources with the number of slices.
Whether query or update performance dominates the runtime depends on the documents being reindexed and cluster resources.
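For example, an update by query with automatic slicing can be requested as follows (a sketch; the index name is a placeholder):
curl \
--request POST 'http://api.example.com/my-index-000001/_update_by_query?refresh&slices=auto' \
--header "Authorization: $API_KEY"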
Update the document source
Update by query supports scripts to update the document source.
As with the update API, you can set ctx.op to change the operation that is performed.
Set ctx.op = "noop" if your script decides that it doesn't have to make any changes. The update by query operation skips updating the document and increments the noop counter.
Set ctx.op = "delete" if your script decides that the document should be deleted. The update by query operation deletes the document and increments the deleted counter.
Update by query supports only index, noop, and delete. Setting ctx.op to anything else is an error. Setting any other field in ctx is an error.
This API enables you to only modify the source of matching documents; you cannot move them.
Path parameters
- index (string | array[string], Required): A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (*). To search all data streams or indices, omit this parameter or use * or _all.
Query parameters
- allow_no_indices (boolean): If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.
- analyzer (string): The analyzer to use for the query string. This parameter can be used only when the q query string parameter is specified.
- analyze_wildcard (boolean): If true, wildcard and prefix queries are analyzed. This parameter can be used only when the q query string parameter is specified.
- conflicts (string): The preferred behavior when update by query hits version conflicts. Values are abort or proceed.
- default_operator (string): The default operator for query string query: AND or OR. This parameter can be used only when the q query string parameter is specified. Values are and, AND, or, or OR.
- df (string): The field to use as the default where no field prefix is given in the query string. This parameter can be used only when the q query string parameter is specified.
- expand_wildcards (string | array[string]): The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.
- from (number): Skips the specified number of documents.
- lenient (boolean): If true, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when the q query string parameter is specified.
- max_docs (number): The maximum number of documents to process. It defaults to all documents. When set to a value less than or equal to scroll_size, a scroll will not be used to retrieve the results for the operation.
- pipeline (string): The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, setting the value to _none disables the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.
- preference (string): The node or shard the operation should be performed on. It is random by default.
- q (string): A query in the Lucene query string syntax.
- refresh (boolean): If true, Elasticsearch refreshes affected shards to make the operation visible to search after the request completes. This is different than the update API's refresh parameter, which causes just the shard that received the request to be refreshed.
- request_cache (boolean): If true, the request cache is used for this request. It defaults to the index-level setting.
- requests_per_second (number): The throttle for this request in sub-requests per second.
- routing (string): A custom value used to route operations to a specific shard.
- scroll (string): The period to retain the search context for scrolling.
- scroll_size (number): The size of the scroll request that powers the operation.
- search_timeout (string): An explicit timeout for each search request. By default, there is no timeout.
- search_type (string): The type of the search operation. Values are query_then_fetch or dfs_query_then_fetch.
- slices (number | string): The number of slices this task should be divided into.
- sort (array[string]): A comma-separated list of <field>:<direction> pairs.
- stats (array[string]): The specific tag of the request for logging and statistical purposes.
- terminate_after (number): The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting. IMPORTANT: Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers.
- timeout (string): The period each update request waits for the following operations: dynamic mapping updates and waiting for active shards. By default, it is one minute. This guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur.
- version (boolean): If true, returns the document version as part of a hit.
- version_type (boolean): Whether the document should increment the version number (internal) on hit or not (reindex).
- wait_for_active_shards (number | string): The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The timeout parameter controls how long each write request waits for unavailable shards to become available. Both work exactly the way they work in the bulk API.
- wait_for_completion (boolean): If true, the request blocks until the operation is complete. If false, Elasticsearch performs some preflight checks, launches the request, and returns a task ID that you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at .tasks/task/${taskId}.
curl \
--request POST 'http://api.example.com/{index}/_update_by_query' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}'
{
"query": {
"term": {
"user.id": "kimchy"
}
}
}
{
"script": {
"source": "ctx._source.count++",
"lang": "painless"
},
"query": {
"term": {
"user.id": "kimchy"
}
}
}
{
"slice": {
"id": 0,
"max": 2
},
"script": {
"source": "ctx._source['extra'] = 'test'"
}
}
{
"script": {
"source": "ctx._source['extra'] = 'test'"
}
}
{
"batches": 42.0,
"failures": [
{
"cause": {
"type": "string",
"reason": "string",
"stack_trace": "string",
"caused_by": {},
"root_cause": [
{}
],
"suppressed": [
{}
]
},
"id": "string",
"index": "string",
"status": 42.0
}
],
"noops": 42.0,
"deleted": 42.0,
"requests_per_second": 42.0,
"retries": {
"bulk": 42.0,
"search": 42.0
},
"": 42.0,
"timed_out": true,
"total": 42.0,
"updated": 42.0,
"version_conflicts": 42.0,
"throttled": "string",
"throttled_until": "string"
}
Import a dangling index
Added in 7.9.0
If Elasticsearch encounters index data that is absent from the current cluster state, those indices are considered to be dangling.
For example, this can happen if you delete more than cluster.indices.tombstones.size indices while an Elasticsearch node is offline.
Path parameters
- index_uuid (string, Required): The UUID of the index to import. Use the get dangling indices API to locate the UUID.
Query parameters
- accept_data_loss (boolean, Required): This parameter must be set to true to import a dangling index. Because Elasticsearch cannot know where the dangling index data came from or determine which shard copies are fresh and which are stale, it cannot guarantee that the imported data represents the latest state of the index when it was last in the cluster.
- master_timeout (string): Specify the timeout for the connection to the master node.
- timeout (string): Explicit operation timeout.
curl \
--request POST 'http://api.example.com/_dangling/{index_uuid}?accept_data_loss=true' \
--header "Authorization: $API_KEY"
{
"acknowledged": true
}
Get aliases
Retrieves information for one or more data stream or index aliases.
Path parameters
- index (string | array[string], Required): Comma-separated list of data streams or indices used to limit the request. Supports wildcards (*). To target all data streams and indices, omit this parameter or use * or _all.
- name (string | array[string], Required): Comma-separated list of aliases to retrieve. Supports wildcards (*). To retrieve all aliases, omit this parameter or use * or _all.
Query parameters
- allow_no_indices (boolean): If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices.
- expand_wildcards (string | array[string]): Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.
- master_timeout (string): Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request GET 'http://api.example.com/{index}/_alias/{name}' \
--header "Authorization: $API_KEY"
{
"additionalProperty1": {
"aliases": {
"additionalProperty1": {
"filter": {},
"index_routing": "string",
"is_write_index": true,
"routing": "string",
"search_routing": "string",
"is_hidden": true
},
"additionalProperty2": {
"filter": {},
"index_routing": "string",
"is_write_index": true,
"routing": "string",
"search_routing": "string",
"is_hidden": true
}
}
},
"additionalProperty2": {
"aliases": {
"additionalProperty1": {
"filter": {},
"index_routing": "string",
"is_write_index": true,
"routing": "string",
"search_routing": "string",
"is_hidden": true
},
"additionalProperty2": {
"filter": {},
"index_routing": "string",
"is_write_index": true,
"routing": "string",
"search_routing": "string",
"is_hidden": true
}
}
}
}
Create or update an index template
Index templates define settings, mappings, and aliases that can be applied automatically to new indices. Elasticsearch applies templates to new indices based on an index pattern that matches the index name.
IMPORTANT: This documentation is about legacy index templates, which are deprecated and will be replaced by the composable templates introduced in Elasticsearch 7.8.
Composable templates always take precedence over legacy templates. If no composable template matches a new index, matching legacy templates are applied according to their order.
Index templates are only applied during index creation. Changes to index templates do not affect existing indices. Settings and mappings specified in create index API requests override any settings or mappings specified in an index template.
You can use C-style /* */ block comments in index templates. You can include comments anywhere in the request body, except before the opening curly bracket.
Indices matching multiple templates
Multiple index templates can potentially match an index. In this case, both the settings and mappings are merged into the final configuration of the index. The order of the merging can be controlled using the order parameter, with lower orders being applied first, and higher orders overriding them. NOTE: Multiple matching templates with the same order value will result in a non-deterministic merging order.
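As a hedged sketch of this merging, consider two legacy templates that could be PUT to _template/template_low and _template/template_high (the names, patterns, and values are illustrative). An index matching both patterns would be created with number_of_shards: 2, because the order 1 template is merged later and overrides the order 0 template:
{
  "index_patterns": ["te*"],
  "order": 0,
  "settings": {
    "number_of_shards": 1
  }
}
{
  "index_patterns": ["te*"],
  "order": 1,
  "settings": {
    "number_of_shards": 2
  }
}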
Path parameters
- name (string, Required): The name of the template.
Query parameters
- create (boolean): If true, this request cannot replace or update existing index templates.
- master_timeout (string): Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
- order (number): Order in which Elasticsearch applies this template if an index matches multiple templates. Templates with lower 'order' values are merged first. Templates with higher 'order' values are merged later, overriding templates with lower values.
- cause (string): User-defined reason for creating/updating the index template.
Body
Required
- aliases (object): Aliases for the index.
- index_patterns (string | array[string]): Array of wildcard expressions used to match the names of indices during creation.
- mappings (object)
- order (number): Order in which Elasticsearch applies this template if an index matches multiple templates. Templates with lower 'order' values are merged first. Templates with higher 'order' values are merged later, overriding templates with lower values.
- settings (object)
- version (number)
curl \
--request POST 'http://api.example.com/_template/{name}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "index_patterns": ["te*", "bar*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "host_name": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "EEE MMM dd HH:mm:ss Z yyyy"
      }
    }
  }
}'
{
  "index_patterns": [
    "te*",
    "bar*"
  ],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "host_name": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "EEE MMM dd HH:mm:ss Z yyyy"
      }
    }
  }
}
{
"index_patterns": [
"te*"
],
"settings": {
"number_of_shards": 1
},
"aliases": {
"alias1": {},
"alias2": {
"filter": {
"term": {
"user.id": "kimchy"
}
},
"routing": "shard-1"
},
"{index}-alias": {}
}
}
{
"acknowledged": true
}
Create an Azure AI studio inference endpoint
Added in 8.14.0
Create an inference endpoint to perform an inference task with the azureaistudio service.
When you create an inference endpoint, the associated machine learning model is automatically deployed if it is not already running. After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count".
Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
Path parameters
- task_type (string, Required): The type of the inference task that the model will perform. Values are completion or text_embedding.
- azureaistudio_inference_id (string, Required): The unique identifier of the inference endpoint.
Body
- chunking_settings (object)
- service (string, Required): Value is azureaistudio.
- service_settings (object, Required)
- task_settings (object)
curl \
--request PUT 'http://api.example.com/_inference/{task_type}/{azureaistudio_inference_id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "Azure-AI-Studio-API-key",
    "target": "Target-Uri",
    "provider": "openai",
    "endpoint_type": "token"
  }
}'
{
"service": "azureaistudio",
"service_settings": {
"api_key": "Azure-AI-Studio-API-key",
"target": "Target-Uri",
"provider": "openai",
"endpoint_type": "token"
}
}
{
"service": "azureaistudio",
"service_settings": {
"api_key": "Azure-AI-Studio-API-key",
"target": "Target-URI",
"provider": "databricks",
"endpoint_type": "realtime"
}
}
{
"chunking_settings": {
"max_chunk_size": 42.0,
"overlap": 42.0,
"sentence_overlap": 42.0,
"strategy": "string"
},
"service": "string",
"service_settings": {},
"task_settings": {},
"inference_id": "string",
"task_type": "sparse_embedding"
}
Create an ELSER inference endpoint
Deprecated
Added in 8.11.0
Create an inference endpoint to perform an inference task with the elser service. You can also deploy ELSER by using the Elasticsearch inference integration.
Your Elasticsearch deployment contains a preconfigured ELSER inference endpoint; you only need to create an endpoint using the API if you want to customize the settings.
The API request will automatically download and deploy the ELSER model if it isn't already downloaded.
You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If using the Python client, you can set the timeout parameter to a higher value.
After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count".
Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
Path parameters
- task_type (string, Required): The type of the inference task that the model will perform. Value is sparse_embedding.
- elser_inference_id (string, Required): The unique identifier of the inference endpoint.
Body
- chunking_settings (object)
- service (string, Required): Value is elser.
- service_settings (object, Required)
curl \
--request PUT 'http://api.example.com/_inference/{task_type}/{elser_inference_id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}'
{
"service": "elser",
"service_settings": {
"num_allocations": 1,
"num_threads": 1
}
}
{
"service": "elser",
"service_settings": {
"adaptive_allocations": {
"enabled": true,
"min_number_of_allocations": 3,
"max_number_of_allocations": 10
},
"num_threads": 1
}
}
{
"inference_id": "my-elser-model",
"task_type": "sparse_embedding",
"service": "elser",
"service_settings": {
"num_allocations": 1,
"num_threads": 1
},
"task_settings": {}
}
Create or update an IP geolocation database configuration
Added in 8.15.0
Path parameters
- id (string, Required): The database configuration identifier.
Query parameters
- master_timeout (string): The period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. A value of -1 indicates that the request should never time out.
- timeout (string): The period to wait for a response from all relevant nodes in the cluster after updating the cluster metadata. If no response is received before the timeout expires, the cluster metadata update still applies but the response indicates that it was not completely acknowledged. A value of -1 indicates that the request should never time out.
Body
Required
The configuration necessary to identify which IP geolocation provider to use to download a database, as well as any provider-specific configuration necessary for such downloading. At present, the only supported providers are maxmind and ipinfo, and the maxmind provider requires that an account_id (string) is configured. A provider (either maxmind or ipinfo) must be specified. The web and local providers can be returned as read-only configurations.
curl \
--request PUT 'http://api.example.com/_ingest/ip_location/database/{id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"name":"string","maxmind":{"account_id":"string"},"ipinfo":{}}'
{
"name": "string",
"maxmind": {
"account_id": "string"
},
"ipinfo": {}
}
{
"acknowledged": true
}
Get GeoIP database configurations
Added in 8.15.0
Get information about one or more IP geolocation database configurations.
curl \
--request GET 'http://api.example.com/_ingest/geoip/database' \
--header "Authorization: $API_KEY"
{
"databases": [
{
"id": "string",
"version": 42.0,
"": 42.0,
"database": {
"name": "string",
"maxmind": {
"account_id": "string"
},
"ipinfo": {}
}
}
]
}
Set upgrade_mode for ML indices
Added in 6.7.0
Sets a cluster-wide upgrade_mode setting that prepares machine learning indices for an upgrade. When upgrading your cluster, in some circumstances you must restart your nodes and reindex your machine learning indices. In those circumstances, there must be no machine learning jobs running. You can close the machine learning jobs, do the upgrade, then open all the jobs again. Alternatively, you can use this API to temporarily halt tasks associated with the jobs and datafeeds and prevent new jobs from opening. You can also use this API during upgrades that do not require you to reindex your machine learning indices, though stopping jobs is not a requirement in that case. You can see the current value for the upgrade_mode setting by using the get machine learning info API.
curl \
--request POST 'http://api.example.com/_ml/set_upgrade_mode' \
--header "Authorization: $API_KEY"
{
"acknowledged": true
}
Delete anomaly jobs from a calendar
Added in 6.2.0
Path parameters
- calendar_id (string, Required): A string that uniquely identifies a calendar.
- job_id (string | array[string], Required): An identifier for the anomaly detection jobs. It can be a job identifier, a group name, or a comma-separated list of jobs or groups.
curl \
--request DELETE 'http://api.example.com/_ml/calendars/{calendar_id}/jobs/{job_id}' \
--header "Authorization: $API_KEY"
{
"calendar_id": "planned-outages",
"job_ids": []
}
Delete expired ML data
Added in 5.4.0
Delete all job results, model snapshots and forecast data that have exceeded their retention days period. Machine learning state documents that are not associated with any job are also deleted.
You can limit the request to a single or set of anomaly detection jobs by using a job identifier, a group name, a comma-separated list of jobs, or a wildcard expression. You can delete expired data for all anomaly detection jobs by using _all, by specifying * as the <job_id>, or by omitting the <job_id>.
Path parameters
- job_id (string, Required): Identifier for an anomaly detection job. It can be a job identifier, a group name, or a wildcard expression.
Query parameters
- requests_per_second (number): The desired requests per second for the deletion processes. The default behavior is no throttling.
- timeout (string): How long the underlying delete processes can run until they are canceled.
Body
- requests_per_second (number): The desired requests per second for the deletion processes. The default behavior is no throttling.
- timeout (string): A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.
curl \
--request DELETE 'http://api.example.com/_ml/_delete_expired_data/{job_id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"requests_per_second":42.0,"timeout":"string"}'
{
"requests_per_second": 42.0,
"timeout": "string"
}
{
"deleted": true
}
Reset an anomaly detection job
Added in 7.14.0
All model state and results are deleted. The job is ready to start over as if it had just been created. It is not currently possible to reset multiple jobs using wildcards or a comma separated list.
Path parameters
-
job_id
string Required The ID of the job to reset.
Query parameters
-
wait_for_completion
boolean If true, the request waits until the operation is complete before returning.
-
delete_user_annotations
boolean Specifies whether annotations that have been added by the user should be deleted along with any auto-generated annotations when the job is reset.
curl \
--request POST 'http://api.example.com/_ml/anomaly_detectors/{job_id}/_reset' \
--header "Authorization: $API_KEY"
{
"acknowledged": true
}
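A sketch of an asynchronous reset, assuming a hypothetical job ID; with wait_for_completion=false the API should return a task you can poll rather than the acknowledgement above:
curl \
--request POST 'http://api.example.com/_ml/anomaly_detectors/my-job/_reset?wait_for_completion=false' \
--header "Authorization: $API_KEY"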
Update a datafeed
Added in 6.4.0
You must stop and start the datafeed for the changes to be applied. When Elasticsearch security features are enabled, your datafeed remembers which roles the user who updated it had at the time of the update and runs the query using those same roles. If you provide secondary authorization headers, those credentials are used instead.
Path parameters
-
datafeed_id
string Required A numerical character string that uniquely identifies the datafeed. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.
Query parameters
-
allow_no_indices
boolean If true, wildcard indices expressions that resolve into no concrete indices are ignored. This includes the _all string or when no indices are specified.
-
expand_wildcards
string | array[string] Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values. Valid values are:
all: Match any data stream or index, including hidden ones.
closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
none: Wildcard patterns are not accepted.
open: Match open, non-hidden indices. Also matches any non-hidden data stream.
-
ignore_throttled
boolean Deprecated If true, concrete, expanded, or aliased indices are ignored when frozen.
Body
Required
-
aggregations
object If set, the datafeed performs aggregation searches. Support for aggregations is limited and should be used only with low cardinality data.
-
chunking_config
object -
delayed_data_check_config
object -
frequency
string A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.
-
indices
array[string] An array of index names. Wildcards are supported. If any of the indices are in remote clusters, the machine learning nodes must have the remote_cluster_client role.
-
indices_options
object -
job_id
string -
max_empty_searches
number If a real-time datafeed has never seen any data (including during any initial training period), it automatically stops and closes the associated job after this many real-time searches return no documents. In other words, it stops after frequency times max_empty_searches of real-time operation. If not set, a datafeed with no end time that sees no data remains started until it is explicitly stopped. By default, it is not set.
-
query
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
query_delay
string A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.
-
runtime_mappings
object -
script_fields
object Specifies scripts that evaluate custom expressions and return script fields to the datafeed. The detector configuration objects in a job can contain functions that use these script fields.
-
scroll_size
number The size parameter that is used in Elasticsearch searches when the datafeed does not use aggregations. The maximum value is the value of index.max_result_window.
curl \
--request POST 'http://api.example.com/_ml/datafeeds/{datafeed_id}/_update' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"aggregations":{},"chunking_config":{"mode":"auto","time_span":"string"},"delayed_data_check_config":{"check_window":"string","enabled":true},"frequency":"string","indices":["string"],"indices_options":{"allow_no_indices":true,"expand_wildcards":"string","ignore_unavailable":true,"ignore_throttled":true},"job_id":"string","max_empty_searches":42.0,"query":{},"query_delay":"string","runtime_mappings":{"additionalProperty1":{"fields":{"additionalProperty1":{"type":"boolean"},"additionalProperty2":{"type":"boolean"}},"fetch_fields":[{"field":"string","format":"string"}],"format":"string","input_field":"string","target_field":"string","target_index":"string","script":{"":"painless","id":"string","params":{"additionalProperty1":{},"additionalProperty2":{}},"options":{"additionalProperty1":"string","additionalProperty2":"string"}},"type":"boolean"},"additionalProperty2":{"fields":{"additionalProperty1":{"type":"boolean"},"additionalProperty2":{"type":"boolean"}},"fetch_fields":[{"field":"string","format":"string"}],"format":"string","input_field":"string","target_field":"string","target_index":"string","script":{"":"painless","id":"string","params":{"additionalProperty1":{},"additionalProperty2":{}},"options":{"additionalProperty1":"string","additionalProperty2":"string"}},"type":"boolean"}},"script_fields":{"additionalProperty1":{"script":{"":"painless","id":"string","params":{"additionalProperty1":{},"additionalProperty2":{}},"options":{"additionalProperty1":"string","additionalProperty2":"string"}},"ignore_failure":true},"additionalProperty2":{"script":{"":"painless","id":"string","params":{"additionalProperty1":{},"additionalProperty2":{}},"options":{"additionalProperty1":"string","additionalProperty2":"string"}},"ignore_failure":true}},"scroll_size":42.0}'
{
"aggregations": {},
"chunking_config": {
"mode": "auto",
"time_span": "string"
},
"delayed_data_check_config": {
"check_window": "string",
"enabled": true
},
"frequency": "string",
"indices": [
"string"
],
"indices_options": {
"allow_no_indices": true,
"expand_wildcards": "string",
"ignore_unavailable": true,
"ignore_throttled": true
},
"job_id": "string",
"max_empty_searches": 42.0,
"query": {},
"query_delay": "string",
"runtime_mappings": {
"additionalProperty1": {
"fields": {
"additionalProperty1": {
"type": "boolean"
},
"additionalProperty2": {
"type": "boolean"
}
},
"fetch_fields": [
{
"field": "string",
"format": "string"
}
],
"format": "string",
"input_field": "string",
"target_field": "string",
"target_index": "string",
"script": {
"": "painless",
"id": "string",
"params": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"options": {
"additionalProperty1": "string",
"additionalProperty2": "string"
}
},
"type": "boolean"
},
"additionalProperty2": {
"fields": {
"additionalProperty1": {
"type": "boolean"
},
"additionalProperty2": {
"type": "boolean"
}
},
"fetch_fields": [
{
"field": "string",
"format": "string"
}
],
"format": "string",
"input_field": "string",
"target_field": "string",
"target_index": "string",
"script": {
"": "painless",
"id": "string",
"params": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"options": {
"additionalProperty1": "string",
"additionalProperty2": "string"
}
},
"type": "boolean"
}
},
"script_fields": {
"additionalProperty1": {
"script": {
"": "painless",
"id": "string",
"params": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"options": {
"additionalProperty1": "string",
"additionalProperty2": "string"
}
},
"ignore_failure": true
},
"additionalProperty2": {
"script": {
"": "painless",
"id": "string",
"params": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"options": {
"additionalProperty1": "string",
"additionalProperty2": "string"
}
},
"ignore_failure": true
}
},
"scroll_size": 42.0
}
{
"authorization": {
"api_key": {
"id": "string",
"name": "string"
},
"roles": [
"string"
],
"service_account": "string"
},
"aggregations": {},
"chunking_config": {
"mode": "auto",
"time_span": "string"
},
"delayed_data_check_config": {
"check_window": "string",
"enabled": true
},
"datafeed_id": "string",
"frequency": "string",
"indices": [
"string"
],
"indices_options": {
"allow_no_indices": true,
"expand_wildcards": "string",
"ignore_unavailable": true,
"ignore_throttled": true
},
"job_id": "string",
"max_empty_searches": 42.0,
"query": {},
"query_delay": "string",
"runtime_mappings": {
"additionalProperty1": {
"fields": {
"additionalProperty1": {
"type": "boolean"
},
"additionalProperty2": {
"type": "boolean"
}
},
"fetch_fields": [
{
"field": "string",
"format": "string"
}
],
"format": "string",
"input_field": "string",
"target_field": "string",
"target_index": "string",
"script": {
"": "painless",
"id": "string",
"params": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"options": {
"additionalProperty1": "string",
"additionalProperty2": "string"
}
},
"type": "boolean"
},
"additionalProperty2": {
"fields": {
"additionalProperty1": {
"type": "boolean"
},
"additionalProperty2": {
"type": "boolean"
}
},
"fetch_fields": [
{
"field": "string",
"format": "string"
}
],
"format": "string",
"input_field": "string",
"target_field": "string",
"target_index": "string",
"script": {
"": "painless",
"id": "string",
"params": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"options": {
"additionalProperty1": "string",
"additionalProperty2": "string"
}
},
"type": "boolean"
}
},
"script_fields": {
"additionalProperty1": {
"script": {
"": "painless",
"id": "string",
"params": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"options": {
"additionalProperty1": "string",
"additionalProperty2": "string"
}
},
"ignore_failure": true
},
"additionalProperty2": {
"script": {
"": "painless",
"id": "string",
"params": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"options": {
"additionalProperty1": "string",
"additionalProperty2": "string"
}
},
"ignore_failure": true
}
},
"scroll_size": 42.0
}
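A more realistic minimal update, assuming a hypothetical datafeed that only needs a longer query delay and a larger scroll size:
curl \
--request POST 'http://api.example.com/_ml/datafeeds/my-datafeed/_update' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"query_delay":"90s","scroll_size":1000}'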
Cancel a migration reindex operation
Technical preview
Cancel a migration reindex attempt for a data stream or index.
Path parameters
-
index
string | array[string] Required The index or data stream name
curl \
--request POST 'http://api.example.com/_migration/reindex/{index}/_cancel' \
--header "Authorization: $API_KEY"
{
"acknowledged": true
}
Get the rollup job capabilities
Deprecated
Technical preview
Get the capabilities of any rollup jobs that have been configured for a specific index or index pattern.
This API is useful because a rollup job is often configured to roll up only a subset of fields from the source index. Furthermore, only certain aggregations can be configured for various fields, leading to a limited subset of functionality depending on that configuration. This API enables you to inspect an index and determine:
- Does this index have associated rollup data somewhere in the cluster?
- If yes to the first question, what fields were rolled up, what aggregations can be performed, and where does the data live?
curl \
--request GET 'http://api.example.com/_rollup/data' \
--header "Authorization: $API_KEY"
{
"sensor-*" : {
"rollup_jobs" : [
{
"job_id" : "sensor",
"rollup_index" : "sensor_rollup",
"index_pattern" : "sensor-*",
"fields" : {
"node" : [
{
"agg" : "terms"
}
],
"temperature" : [
{
"agg" : "min"
},
{
"agg" : "max"
},
{
"agg" : "sum"
}
],
"timestamp" : [
{
"agg" : "date_histogram",
"time_zone" : "UTC",
"fixed_interval" : "1h",
"delay": "7d"
}
],
"voltage" : [
{
"agg" : "avg"
}
]
}
}
]
}
}
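To limit the check to a single index pattern rather than the whole cluster, the pattern can be supplied in the path; sensor-* matches the example above:
curl \
--request GET 'http://api.example.com/_rollup/data/sensor-*' \
--header "Authorization: $API_KEY"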
Get the rollup index capabilities
Deprecated
Technical preview
Get the rollup capabilities of all jobs inside of a rollup index. A single rollup index may store the data for multiple rollup jobs and may have a variety of capabilities depending on those jobs. This API enables you to determine:
- What jobs are stored in an index (or indices specified via a pattern)?
- What target indices were rolled up, what fields were used in those rollups, and what aggregations can be performed on each job?
Path parameters
-
index
string | array[string] Required Data stream or index to check for rollup capabilities. Wildcard (*) expressions are supported.
curl \
--request GET 'http://api.example.com/{index}/_rollup/data' \
--header "Authorization: $API_KEY"
{
"sensor_rollup" : {
"rollup_jobs" : [
{
"job_id" : "sensor",
"rollup_index" : "sensor_rollup",
"index_pattern" : "sensor-*",
"fields" : {
"node" : [
{
"agg" : "terms"
}
],
"temperature" : [
{
"agg" : "min"
},
{
"agg" : "max"
},
{
"agg" : "sum"
}
],
"timestamp" : [
{
"agg" : "date_histogram",
"time_zone" : "UTC",
"fixed_interval" : "1h",
"delay": "7d"
}
],
"voltage" : [
{
"agg" : "avg"
}
]
}
}
]
}
}
Search rolled-up data
Deprecated
Technical preview
The rollup search endpoint is needed because, internally, rolled-up documents use a different document structure than the original data. It rewrites standard Query DSL into a format that matches the rollup documents, then takes the response and rewrites it back to what a client would expect given the original query.
The request body supports a subset of features from the regular search API. The following functionality is not available:
- size: Because rollups work on pre-aggregated data, no search hits can be returned and so size must be set to zero or omitted entirely.
- highlighter, suggesters, post_filter, profile, explain: These are similarly disallowed.
Searching both historical rollup and non-rollup data
The rollup search API has the capability to search across both "live" non-rollup data and the aggregated rollup data. This is done by simply adding the live indices to the URI. For example:
GET sensor-1,sensor_rollup/_rollup_search
{
"size": 0,
"aggregations": {
"max_temperature": {
"max": {
"field": "temperature"
}
}
}
}
The rollup search endpoint does two things when the search runs:
- The original request is sent to the non-rollup index unaltered.
- A rewritten version of the original request is sent to the rollup index.
When the two responses are received, the endpoint rewrites the rollup response and merges the two together. During the merging process, if there is any overlap in buckets between the two responses, the buckets from the non-rollup index are used.
Path parameters
-
index
string | array[string] Required A comma-separated list of data streams and indices used to limit the request. This parameter has the following rules:
- At least one data stream, index, or wildcard expression must be specified. This target can include a rollup or non-rollup index. For data streams, the stream's backing indices can only serve as non-rollup indices. Omitting the parameter or using _all is not permitted.
- Multiple non-rollup indices may be specified.
- Only one rollup index may be specified. If more than one is supplied, an exception occurs.
- Wildcard expressions (*) may be used. If they match more than one rollup index, an exception occurs. However, you can use an expression to match multiple non-rollup indices or data streams.
Query parameters
-
rest_total_hits_as_int
boolean Indicates whether hits.total should be rendered as an integer or an object in the REST search response.
-
typed_keys
boolean Specifies whether aggregation and suggester names should be prefixed by their respective types in the response.
Body
Required
-
aggregations
object Specifies aggregations.
External documentation -
query
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
size
number Must be zero if set, as rollups work on pre-aggregated data.
curl \
--request GET 'http://api.example.com/{index}/_rollup_search' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"size\": 0,\n \"aggregations\": {\n \"max_temperature\": {\n \"max\": {\n \"field\": \"temperature\"\n }\n }\n }\n}"'
{
"size": 0,
"aggregations": {
"max_temperature": {
"max": {
"field": "temperature"
}
}
}
}
{
"took" : 102,
"timed_out" : false,
"terminated_early" : false,
"_shards" : {} ,
"hits" : {
"total" : {
"value": 0,
"relation": "eq"
},
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"max_temperature" : {
"value" : 202.0
}
}
}
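The console example above corresponds to the following curl invocation, which searches the live sensor-1 index and the sensor_rollup index together:
curl \
--request GET 'http://api.example.com/sensor-1,sensor_rollup/_rollup_search' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"size":0,"aggregations":{"max_temperature":{"max":{"field":"temperature"}}}}'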
Evaluate ranked search results
Added in 6.2.0
Evaluate the quality of ranked search results over a set of typical search queries.
Path parameters
-
index
string | array[string] Required A comma-separated list of data streams, indices, and index aliases used to limit the request. Wildcard (*) expressions are supported. To target all data streams and indices in a cluster, omit this parameter or use _all or *.
Query parameters
-
allow_no_indices
boolean If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.
-
expand_wildcards
string | array[string] Whether to expand wildcard expressions to concrete indices that are open, closed, or both.
-
search_type
string The search operation type.
curl \
--request GET 'http://api.example.com/{index}/_rank_eval' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"requests":[{"id":"string","request":{"query":{},"size":42.0},"ratings":[{"_id":"string","_index":"string","rating":42.0}],"template_id":"string","params":{"additionalProperty1":{},"additionalProperty2":{}}}],"metric":{"precision":{"k":42.0,"relevant_rating_threshold":42.0,"ignore_unlabeled":true},"recall":{"k":42.0,"relevant_rating_threshold":42.0},"mean_reciprocal_rank":{"k":42.0,"relevant_rating_threshold":42.0},"dcg":{"k":42.0,"normalize":true},"expected_reciprocal_rank":{"k":42.0,"maximum_relevance":42.0}}}'
{
"requests": [
{
"id": "string",
"request": {
"query": {},
"size": 42.0
},
"ratings": [
{
"_id": "string",
"_index": "string",
"rating": 42.0
}
],
"template_id": "string",
"params": {
"additionalProperty1": {},
"additionalProperty2": {}
}
}
],
"metric": {
"precision": {
"k": 42.0,
"relevant_rating_threshold": 42.0,
"ignore_unlabeled": true
},
"recall": {
"k": 42.0,
"relevant_rating_threshold": 42.0
},
"mean_reciprocal_rank": {
"k": 42.0,
"relevant_rating_threshold": 42.0
},
"dcg": {
"k": 42.0,
"normalize": true
},
"expected_reciprocal_rank": {
"k": 42.0,
"maximum_relevance": 42.0
}
}
}
{
"metric_score": 42.0,
"details": {
"additionalProperty1": {
"metric_score": 42.0,
"unrated_docs": [
{
"_id": "string",
"_index": "string"
}
],
"hits": [
{
"hit": {
"_id": "string",
"_index": "string",
"_score": 42.0
},
"rating": 42.0
}
],
"metric_details": {
"additionalProperty1": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"additionalProperty2": {
"additionalProperty1": {},
"additionalProperty2": {}
}
}
},
"additionalProperty2": {
"metric_score": 42.0,
"unrated_docs": [
{
"_id": "string",
"_index": "string"
}
],
"hits": [
{
"hit": {
"_id": "string",
"_index": "string",
"_score": 42.0
},
"rating": 42.0
}
],
"metric_details": {
"additionalProperty1": {
"additionalProperty1": {},
"additionalProperty2": {}
},
"additionalProperty2": {
"additionalProperty1": {},
"additionalProperty2": {}
}
}
}
},
"failures": {
"additionalProperty1": {},
"additionalProperty2": {}
}
}
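As a concrete sketch, the following evaluates precision at 10 for a single match query; the index, document ID, and query here are hypothetical:
curl \
--request GET 'http://api.example.com/my-index/_rank_eval' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"requests":[{"id":"query-1","request":{"query":{"match":{"title":"elasticsearch"}}},"ratings":[{"_index":"my-index","_id":"1","rating":1}]}],"metric":{"precision":{"k":10}}}'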
Clear the cache
Technical preview
Clear indices and data streams from the shared cache for partially mounted indices.
Query parameters
-
expand_wildcards
string | array[string] Whether to expand wildcard expressions to concrete indices that are open, closed, or both.
-
allow_no_indices
boolean Whether to ignore a wildcard indices expression that resolves into no concrete indices. (This includes the _all string or when no indices have been specified.)
curl \
--request POST 'http://api.example.com/_searchable_snapshots/cache/clear' \
--header "Authorization: $API_KEY"
{}
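The cache can also be cleared for specific indices or data streams by adding them to the path; the index name here is illustrative:
curl \
--request POST 'http://api.example.com/my-partial-index/_searchable_snapshots/cache/clear' \
--header "Authorization: $API_KEY"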
Create an API key
Added in 6.7.0
Create an API key for access without requiring basic authentication.
IMPORTANT: If the credential that is used to authenticate this request is an API key, the derived API key cannot have any privileges. If you specify privileges, the API returns an error.
A successful request returns a JSON structure that contains the API key, its unique id, and its name. If applicable, it also returns expiration information for the API key in milliseconds.
NOTE: By default, API keys never expire. You can specify expiration information when you create the API keys.
The API keys are created by the Elasticsearch API key service, which is automatically enabled. To configure or turn off the API key service, refer to API key service setting documentation.
Query parameters
-
refresh
string If true (the default), refresh the affected shards to make this operation visible to search; if wait_for, wait for a refresh to make this operation visible to search; if false, do nothing with refreshes. Values are true, false, or wait_for.
Body
Required
-
expiration
string A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.
-
name
string -
role_descriptors
object An array of role descriptors for this API key. When it is not specified or it is an empty array, the API key will have a point-in-time snapshot of the permissions of the authenticated user. If you supply role descriptors, the resultant permissions are an intersection of the API key's permissions and the authenticated user's permissions, thereby limiting the access scope for API keys. The structure of a role descriptor is the same as the request for the create role API. For more details, refer to the create or update roles API.
NOTE: Due to the way in which this permission intersection is calculated, it is not possible to create an API key that is a child of another API key, unless the derived key is created without any privileges. In this case, you must explicitly specify a role descriptor with no privileges. The derived API key can be used for authentication; it will not have authority to call Elasticsearch APIs.
External documentation -
metadata
object
curl \
--request PUT 'http://api.example.com/_security/api_key' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"name\": \"my-api-key\",\n \"expiration\": \"1d\", \n \"role_descriptors\": { \n \"role-a\": {\n \"cluster\": [\"all\"],\n \"indices\": [\n {\n \"names\": [\"index-a*\"],\n \"privileges\": [\"read\"]\n }\n ]\n },\n \"role-b\": {\n \"cluster\": [\"all\"],\n \"indices\": [\n {\n \"names\": [\"index-b*\"],\n \"privileges\": [\"all\"]\n }\n ]\n }\n },\n \"metadata\": {\n \"application\": \"my-application\",\n \"environment\": {\n \"level\": 1,\n \"trusted\": true,\n \"tags\": [\"dev\", \"staging\"]\n }\n }\n}"'
{
"name": "my-api-key",
"expiration": "1d",
"role_descriptors": {
"role-a": {
"cluster": ["all"],
"indices": [
{
"names": ["index-a*"],
"privileges": ["read"]
}
]
},
"role-b": {
"cluster": ["all"],
"indices": [
{
"names": ["index-b*"],
"privileges": ["all"]
}
]
}
},
"metadata": {
"application": "my-application",
"environment": {
"level": 1,
"trusted": true,
"tags": ["dev", "staging"]
}
}
}
{
"id": "VuaCfGcBCdbkQm-e5aOx",
"name": "my-api-key",
"expiration": 1544068612110,
"api_key": "ui2lp2axTNmsyakw9tvNnw",
"encoded": "VnVhQ2ZHY0JDZGJrUW0tZTVhT3g6dWkybHAyYXhUTm1zeWFrdzl0dk5udw=="
}
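The encoded value in the response is the base64 encoding of id:api_key, and can be used directly for authentication with the ApiKey scheme, for example:
curl \
--request GET 'http://api.example.com/_cluster/health' \
--header "Authorization: ApiKey VnVhQ2ZHY0JDZGJrUW0tZTVhT3g6dWkybHAyYXhUTm1zeWFrdzl0dk5udw=="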
Check user privileges
Added in 6.4.0
Determine whether the specified user has a specified list of privileges. All users can use this API, but only to determine their own privileges. To check the privileges of other users, you must use the run as feature.
Body
Required
-
application
array[object] -
cluster
array[string] A list of the cluster privileges that you want to check.
-
index
array[object]
curl \
--request POST 'http://api.example.com/_security/user/_has_privileges' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"cluster\": [ \"monitor\", \"manage\" ],\n \"index\" : [\n {\n \"names\": [ \"suppliers\", \"products\" ],\n \"privileges\": [ \"read\" ]\n },\n {\n \"names\": [ \"inventory\" ],\n \"privileges\" : [ \"read\", \"write\" ]\n }\n ],\n \"application\": [\n {\n \"application\": \"inventory_manager\",\n \"privileges\" : [ \"read\", \"data:write/inventory\" ],\n \"resources\" : [ \"product/1852563\" ]\n }\n ]\n}"'
{
"cluster": [ "monitor", "manage" ],
"index" : [
{
"names": [ "suppliers", "products" ],
"privileges": [ "read" ]
},
{
"names": [ "inventory" ],
"privileges" : [ "read", "write" ]
}
],
"application": [
{
"application": "inventory_manager",
"privileges" : [ "read", "data:write/inventory" ],
"resources" : [ "product/1852563" ]
}
]
}
{
"username": "rdeniro",
"has_all_requested" : false,
"cluster" : {
"monitor" : true,
"manage" : false
},
"index" : {
"suppliers" : {
"read" : true
},
"products" : {
"read" : true
},
"inventory" : {
"read" : true,
"write" : false
}
},
"application" : {
"inventory_manager" : {
"product/1852563" : {
"read": false,
"data:write/inventory": false
}
}
}
}
Logout of SAML
Added in 7.5.0
Submits a request to invalidate an access token and refresh token.
NOTE: This API is intended for use by custom web applications other than Kibana. If you are using Kibana, refer to the documentation for configuring SAML single-sign-on on the Elastic Stack.
This API invalidates the tokens that were generated for a user by the SAML authenticate API. If the SAML realm in Elasticsearch is configured accordingly and the SAML IdP supports this, the Elasticsearch response contains a URL to redirect the user to the IdP that contains a SAML logout request (starting an SP-initiated SAML Single Logout).
Body
Required
-
token
string Required The access token that was returned as a response to calling the SAML authenticate API. Alternatively, the most recent token that was received after refreshing the original one by using a refresh_token.
-
refresh_token
string The refresh token that was returned as a response to calling the SAML authenticate API. Alternatively, the most recent refresh token that was received after refreshing the original access token.
curl \
--request POST 'http://api.example.com/_security/saml/logout' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"token\" : \"46ToAxZVaXVVZTVKOVF5YU04ZFJVUDVSZlV3\",\n \"refresh_token\" : \"mJdXLtmvTUSpoLwMvdBt_w\"\n}"'
{
"token" : "46ToAxZVaXVVZTVKOVF5YU04ZFJVUDVSZlV3",
"refresh_token" : "mJdXLtmvTUSpoLwMvdBt_w"
}
{
"redirect" : "https://my-idp.org/logout/SAMLRequest=...."
}
Analyze a snapshot repository
Added in 7.12.0
Analyze the performance characteristics and any incorrect behavior found in a repository.
The response exposes implementation details of the analysis which may change from version to version. The response body format is therefore not considered stable and may be different in newer versions.
There are a large number of third-party storage systems available, not all of which are suitable for use as a snapshot repository by Elasticsearch. Some storage systems behave incorrectly, or perform poorly, especially when accessed concurrently by multiple clients as the nodes of an Elasticsearch cluster do. This API performs a collection of read and write operations on your repository which are designed to detect incorrect behaviour and to measure the performance characteristics of your storage system.
The default values for the parameters are deliberately low to reduce the impact of running an analysis inadvertently and to provide a sensible starting point for your investigations.
Run your first analysis with the default parameter values to check for simple problems.
If successful, run a sequence of increasingly large analyses until you encounter a failure or you reach a blob_count of at least 2000, a max_blob_size of at least 2gb, a max_total_data_size of at least 1tb, and a register_operation_count of at least 100.
Always specify a generous timeout, possibly 1h or longer, to allow time for each analysis to run to completion.
Perform the analyses using a multi-node cluster of a similar size to your production cluster so that it can detect any problems that only arise when the repository is accessed by many nodes at once.
If the analysis fails, Elasticsearch detected that your repository behaved unexpectedly. This usually means you are using a third-party storage system with an incorrect or incompatible implementation of the API it claims to support. If so, this storage system is not suitable for use as a snapshot repository. You will need to work with the supplier of your storage system to address the incompatibilities that Elasticsearch detects.
If the analysis is successful, the API returns details of the testing process, optionally including how long each operation took. You can use this information to determine the performance of your storage system. If any operation fails or returns an incorrect result, the API returns an error. If the API returns an error, it may not have removed all the data it wrote to the repository. The error will indicate the location of any leftover data and this path is also recorded in the Elasticsearch logs. You should verify that this location has been cleaned up correctly. If there is still leftover data at the specified location, you should manually remove it.
If the connection from your client to Elasticsearch is closed while the client is waiting for the result of the analysis, the test is cancelled. Some clients are configured to close their connection if no response is received within a certain timeout. An analysis takes a long time to complete so you might need to relax any such client-side timeouts. On cancellation the analysis attempts to clean up the data it was writing, but it may not be able to remove it all. The path to the leftover data is recorded in the Elasticsearch logs. You should verify that this location has been cleaned up correctly. If there is still leftover data at the specified location, you should manually remove it.
If the analysis is successful then it detected no incorrect behavior, but this does not mean that correct behavior is guaranteed. The analysis attempts to detect common bugs but it does not offer 100% coverage. Additionally, it does not test the following:
- Your repository must perform durable writes. Once a blob has been written it must remain in place until it is deleted, even after a power loss or similar disaster.
- Your repository must not suffer from silent data corruption. Once a blob has been written, its contents must remain unchanged until it is deliberately modified or deleted.
- Your repository must behave correctly even if connectivity from the cluster is disrupted. Reads and writes may fail in this case, but they must not return incorrect results.
IMPORTANT: An analysis writes a substantial amount of data to your repository and then reads it back again.
This consumes bandwidth on the network between the cluster and the repository, and storage space and I/O bandwidth on the repository itself.
You must ensure this load does not affect other users of these systems.
Analyses respect the repository settings max_snapshot_bytes_per_sec and max_restore_bytes_per_sec if available, and the cluster setting indices.recovery.max_bytes_per_sec, which you can use to limit the bandwidth they consume.
NOTE: This API is intended for exploratory use by humans. You should expect the request parameters and the response format to vary in future versions.
NOTE: Different versions of Elasticsearch may perform different checks for repository compatibility, with newer versions typically being stricter than older ones. A storage system that passes repository analysis with one version of Elasticsearch may fail with a different version. This indicates it behaves incorrectly in ways that the former version did not detect. You must work with the supplier of your storage system to address the incompatibilities detected by the repository analysis API in any version of Elasticsearch.
NOTE: This API may not work correctly in a mixed-version cluster.
Implementation details
NOTE: This section of documentation describes how the repository analysis API works in this version of Elasticsearch, but you should expect the implementation to vary between versions. The request parameters and response format depend on details of the implementation so may also be different in newer versions.
The analysis comprises a number of blob-level tasks, as set by the blob_count parameter, and a number of compare-and-exchange operations on linearizable registers, as set by the register_operation_count parameter.
These tasks are distributed over the data and master-eligible nodes in the cluster for execution.
For most blob-level tasks, the executing node first writes a blob to the repository and then instructs some of the other nodes in the cluster to attempt to read the data it just wrote.
The size of the blob is chosen randomly, according to the max_blob_size and max_total_data_size parameters.
If any of these reads fails then the repository does not implement the necessary read-after-write semantics that Elasticsearch requires.
For some blob-level tasks, the executing node will instruct some of its peers to attempt to read the data before the writing process completes. These reads are permitted to fail, but must not return partial data. If any read returns partial data then the repository does not implement the necessary atomicity semantics that Elasticsearch requires.
For some blob-level tasks, the executing node will overwrite the blob while its peers are reading it. In this case the data read may come from either the original or the overwritten blob, but the read operation must not return partial data or a mix of data from the two blobs. If any of these reads returns partial data or a mix of the two blobs then the repository does not implement the necessary atomicity semantics that Elasticsearch requires for overwrites.
The executing node will use a variety of different methods to write the blob. For instance, where applicable, it will use both single-part and multi-part uploads. Similarly, the reading nodes will use a variety of different methods to read the data back again. For instance they may read the entire blob from start to end or may read only a subset of the data.
For some blob-level tasks, the executing node will cancel the write before it is complete. In this case, it still instructs some of the other nodes in the cluster to attempt to read the blob but all of these reads must fail to find the blob.
Linearizable registers are special blobs that Elasticsearch manipulates using an atomic compare-and-exchange operation. This operation ensures correct and strongly-consistent behavior even when the blob is accessed by multiple nodes at the same time. The detailed implementation of the compare-and-exchange operation on linearizable registers varies by repository type. Repository analysis verifies that uncontended compare-and-exchange operations on a linearizable register blob always succeed. Repository analysis also verifies that contended operations either succeed or report the contention but do not return incorrect results. If an operation fails due to contention, Elasticsearch retries the operation until it succeeds. Most of the compare-and-exchange operations performed by repository analysis atomically increment a counter which is represented as an 8-byte blob. Some operations also verify the behavior on small blobs with sizes other than 8 bytes.
Path parameters
-
repository
string Required The name of the repository.
Query parameters
-
blob_count
number The total number of blobs to write to the repository during the test. For realistic experiments, you should set it to at least 2000.
-
concurrency
number The number of operations to run concurrently during the test.
-
detailed
boolean Indicates whether to return detailed results, including timing information for every operation performed during the analysis. If false, it returns only a summary of the analysis.
-
early_read_node_count
number The number of nodes on which to perform an early read operation while writing each blob. Early read operations are only rarely performed.
-
max_blob_size
number | string The maximum size of a blob to be written during the test. For realistic experiments, you should set it to at least 2gb.
-
max_total_data_size
number | string An upper limit on the total size of all the blobs written during the test. For realistic experiments, you should set it to at least 1tb.
-
rare_action_probability
number The probability of performing a rare action such as an early read, an overwrite, or an aborted write on each blob.
-
rarely_abort_writes
boolean Indicates whether to rarely cancel writes before they complete.
-
read_node_count
number The number of nodes on which to read a blob after writing.
-
register_operation_count
number The minimum number of linearizable register operations to perform in total. For realistic experiments, you should set it to at least 100.
-
seed
number The seed for the pseudo-random number generator used to generate the list of operations performed during the test. To repeat the same set of operations in multiple experiments, use the same seed in each experiment. Note that the operations are performed concurrently so might not always happen in the same order on each run.
-
timeout
string The period of time to wait for the test to complete. If no response is received before the timeout expires, the test is cancelled and returns an error.
curl \
--request POST 'http://api.example.com/_snapshot/{repository}/_analyze' \
--header "Authorization: $API_KEY"
{
"blob_count": 42.0,
"blob_path": "string",
"concurrency": 42.0,
"coordinating_node": {
"id": "string",
"name": "string"
},
"delete_elapsed": "string",
"": 42.0,
"details": {
"blob": {
"name": "string",
"overwritten": true,
"read_early": true,
"read_end": 42.0,
"read_start": 42.0,
"reads": {
"before_write_complete": true,
"elapsed": "string",
"": 42.0,
"first_byte_time": "string",
"found": true,
"node": {
"id": "string",
"name": "string"
},
"throttled": "string"
},
"": 42.0,
"size_bytes": 42.0
},
"overwrite_elapsed": "string",
"": 42.0,
"write_elapsed": "string",
"write_throttled": "string",
"writer_node": {
"id": "string",
"name": "string"
}
},
"early_read_node_count": 42.0,
"issues_detected": [
"string"
],
"listing_elapsed": "string",
"max_blob_size_bytes": 42.0,
"max_total_data_size_bytes": 42.0,
"rare_action_probability": 42.0,
"read_node_count": 42.0,
"repository": "string",
"seed": 42.0,
"summary": {
"read": {
"count": 42.0,
"max_wait": "string",
"": 42.0,
"total_elapsed": "string",
"total_size_bytes": 42.0,
"total_throttled": "string",
"total_wait": "string"
},
"write": {
"count": 42.0,
"total_elapsed": "string",
"": 42.0,
"total_size_bytes": 42.0,
"total_throttled": "string",
"total_throttled_nanos": 42.0
}
}
}
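A sketch of a follow-up analysis at the recommended scale, using the query parameters described above and a hypothetical repository name:
curl \
--request POST 'http://api.example.com/_snapshot/my_repository/_analyze?blob_count=2000&max_blob_size=2gb&max_total_data_size=1tb&register_operation_count=100&timeout=4h' \
--header "Authorization: $API_KEY"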
Get the snapshot status
Added in 7.8.0
Get a detailed description of the current state for each shard participating in the snapshot.
Note that this API should be used only to obtain detailed shard-level information for ongoing snapshots. If this detail is not needed or you want to obtain information about one or more existing snapshots, use the get snapshot API.
If you omit the <snapshot> request path parameter, the request retrieves information only for currently running snapshots. This usage is preferred.
If needed, you can specify <repository> and <snapshot> to retrieve information for specific snapshots, even if they're not currently running.
WARNING: Using the API to return the status of any snapshots other than currently running snapshots can be expensive. The API requires a read from the repository for each shard in each snapshot. For example, if you have 100 snapshots with 1,000 shards each, an API request that includes all snapshots will require 100,000 reads (100 snapshots x 1,000 shards).
Depending on the latency of your storage, such requests can take an extremely long time to return results. These requests can also tax machine resources and, when using cloud storage, incur high processing costs.
Path parameters
-
repository
string Required The snapshot repository name used to limit the request. It supports wildcards (*) if <snapshot> isn't specified.
Query parameters
-
master_timeout
string The period to wait for the master node. If the master node is not available before the timeout expires, the request fails and returns an error. To indicate that the request should never time out, set it to -1.
curl \
--request GET 'http://api.example.com/_snapshot/{repository}/_status' \
--header "Authorization: $API_KEY"
{
"snapshots" : [
{
"snapshot" : "snapshot_2",
"repository" : "my_repository",
"uuid" : "lNeQD1SvTQCqqJUMQSwmGg",
"state" : "SUCCESS",
"include_global_state" : false,
"shards_stats" : {
"initializing" : 0,
"started" : 0,
"finalizing" : 0,
"done" : 1,
"failed" : 0,
"total" : 1
},
"stats" : {
"incremental" : {
"file_count" : 3,
"size_in_bytes" : 5969
},
"total" : {
"file_count" : 4,
"size_in_bytes" : 6024
},
"start_time_in_millis" : 1594829326691,
"time_in_millis" : 205
},
"indices" : {
"index_1" : {
"shards_stats" : {
"initializing" : 0,
"started" : 0,
"finalizing" : 0,
"done" : 1,
"failed" : 0,
"total" : 1
},
"stats" : {
"incremental" : {
"file_count" : 3,
"size_in_bytes" : 5969
},
"total" : {
"file_count" : 4,
"size_in_bytes" : 6024
},
"start_time_in_millis" : 1594829326896,
"time_in_millis" : 0
},
"shards" : {
"0" : {
"stage" : "DONE",
"stats" : {
"incremental" : {
"file_count" : 3,
"size_in_bytes" : 5969
},
"total" : {
"file_count" : 4,
"size_in_bytes" : 6024
},
"start_time_in_millis" : 1594829326896,
"time_in_millis" : 0
}
}
}
}
}
}
]
}
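To check specific snapshots, whether or not they are running, both the repository and snapshot names can be given in the path; the names here match the example response:
curl \
--request GET 'http://api.example.com/_snapshot/my_repository/snapshot_2/_status' \
--header "Authorization: $API_KEY"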
Clear an SQL search cursor
Clear the cursor and free the resources associated with an SQL search.
curl \
--request POST 'http://api.example.com/_sql/close' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"cursor\": \"sDXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWYUpOYklQMHhRUEtld3RsNnFtYU1hQQ==:BAFmBGRhdGUBZgVsaWtlcwFzB21lc3NhZ2UBZgR1c2Vy9f///w8=\"\n}"'
{
"cursor": "sDXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWYUpOYklQMHhRUEtld3RsNnFtYU1hQQ==:BAFmBGRhdGUBZgVsaWtlcwFzB21lc3NhZ2UBZgR1c2Vy9f///w8="
}
{
"succeeded": true
}
Get a synonym set
Get a synonym set by its identifier.
Path parameters
-
id
string Required The synonyms set identifier to retrieve.
curl \
--request GET 'http://api.example.com/_synonyms/{id}' \
--header "Authorization: $API_KEY"
{
"count": 3,
"synonyms_set": [
{
"id": "test-1",
"synonyms": "hello, hi"
},
{
"id": "test-2",
"synonyms": "bye, goodbye"
},
{
"id": "test-3",
"synonyms": "test => check"
}
]
}
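For large sets, the response can be paged; a sketch assuming the from and size query parameters are supported by this endpoint:
curl \
--request GET 'http://api.example.com/_synonyms/my-synonyms-set?from=0&size=2' \
--header "Authorization: $API_KEY"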
Acknowledge a watch
Acknowledging a watch enables you to manually throttle the execution of the watch's actions.
The acknowledgement state of an action is stored in the status.actions.<id>.ack.state
structure.
IMPORTANT: If the specified watch is currently being executed, this API will return an error. The reason for this behavior is to prevent overwriting the watch status from a watch execution.
Acknowledging an action throttles further executions of that action until its ack.state is reset to awaits_successful_execution. This happens when the condition of the watch is not met (the condition evaluates to false).
curl \
--request PUT 'http://api.example.com/_watcher/watch/{watch_id}/_ack/{action_id}' \
--header "Authorization: $API_KEY"
{
"status": {
"state": {
"active": true,
"timestamp": "2015-05-26T18:04:27.723Z"
},
"last_checked": "2015-05-26T18:04:27.753Z",
"last_met_condition": "2015-05-26T18:04:27.763Z",
"actions": {
"test_index": {
"ack" : {
"timestamp": "2015-05-26T18:04:27.713Z",
"state": "acked"
},
"last_execution" : {
"timestamp": "2015-05-25T18:04:27.733Z",
"successful": true
},
"last_successful_execution" : {
"timestamp": "2015-05-25T18:04:27.773Z",
"successful": true
}
}
},
"execution_state": "executed",
"version": 2
}
}
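To acknowledge every action of a watch at once, omit the action identifier from the path; the watch ID here is illustrative:
curl \
--request PUT 'http://api.example.com/_watcher/watch/my_watch/_ack' \
--header "Authorization: $API_KEY"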
Activate a watch
A watch can be either active or inactive.
Path parameters
-
watch_id
string Required The watch identifier.
curl \
--request PUT 'http://api.example.com/_watcher/watch/{watch_id}/_activate' \
--header "Authorization: $API_KEY"
{
"status": {
"actions": {
"additionalProperty1": {
"ack": {
"state": "awaits_successful_execution",
"": "string"
},
"last_execution": {
"successful": true,
"": "string",
"reason": "string"
},
"last_successful_execution": {
"successful": true,
"": "string",
"reason": "string"
},
"last_throttle": {
"reason": "string",
"": "string"
}
},
"additionalProperty2": {
"ack": {
"state": "awaits_successful_execution",
"": "string"
},
"last_execution": {
"successful": true,
"": "string",
"reason": "string"
},
"last_successful_execution": {
"successful": true,
"": "string",
"reason": "string"
},
"last_throttle": {
"reason": "string",
"": "string"
}
}
},
"state": {
"active": true,
"": "string"
},
"version": 42.0
}
}