Index API
Adds a JSON document to the specified data stream or index and makes it searchable. If the target is an index and the document already exists, the request updates the document and increments its version.
You cannot use the index API to send update requests for existing documents to a data stream. See Update documents in a data stream by query and Update or delete documents in a backing index.
Request
PUT /<target>/_doc/<_id>
POST /<target>/_doc/
PUT /<target>/_create/<_id>
POST /<target>/_create/<_id>
You cannot add new documents to a data stream using the PUT /<target>/_doc/<_id> request format. To specify a document ID, use the PUT /<target>/_create/<_id> format instead. See Add documents to a data stream.
Prerequisites
- If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
  - To add or overwrite a document using the PUT /<target>/_doc/<_id> request format, you must have the create, index, or write index privilege.
  - To add a document using the POST /<target>/_doc/, PUT /<target>/_create/<_id>, or POST /<target>/_create/<_id> request formats, you must have the create_doc, create, index, or write index privilege.
  - To automatically create a data stream or index with an index API request, you must have the auto_configure, create_index, or manage index privilege.
- Automatic data stream creation requires a matching index template with data stream enabled. See Set up a data stream.
Path parameters
<target>
  (Required, string) Name of the data stream or index to target.
  If the target doesn’t exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. See Set up a data stream.
  If the target doesn’t exist and doesn’t match a data stream template, this request creates the index.
  You can check for existing targets using the resolve index API.
<_id>
  (Optional, string) Unique identifier for the document.
  This parameter is required for the following request formats:
  - PUT /<target>/_doc/<_id>
  - PUT /<target>/_create/<_id>
  - POST /<target>/_create/<_id>
  To automatically generate a document ID, use the POST /<target>/_doc/ request format and omit this parameter.
Query parameters
if_seq_no
  (Optional, integer) Only perform the operation if the document has this sequence number. See Optimistic concurrency control.
if_primary_term
  (Optional, integer) Only perform the operation if the document has this primary term. See Optimistic concurrency control.
op_type
  (Optional, enum) Set to create to only index the document if it does not already exist (put if absent). If a document with the specified _id already exists, the indexing operation will fail. Same as using the <index>/_create endpoint. Valid values: index, create. If a document ID is specified, it defaults to index. Otherwise, it defaults to create.
  If the request targets a data stream, an op_type of create is required. See Add documents to a data stream.
pipeline
  (Optional, string) ID of the pipeline to use to preprocess incoming documents.
refresh
  (Optional, enum) If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, the request waits for a refresh to make this operation visible to search. If false, the request does nothing with refreshes. Valid values: true, false, wait_for. Default: false.
routing
  (Optional, string) Custom value used to route operations to a specific shard.
timeout
  (Optional, time units) Period the request waits for the following operations:
  - Automatic index creation
  - Dynamic mapping updates
  - Waiting for active shards
  Defaults to 1m (one minute). This guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur.
version
  (Optional, integer) Explicit version number for concurrency control. The specified version must match the current version of the document for the request to succeed.
version_type
  (Optional, enum) Specific version type: external, external_gte.
wait_for_active_shards
  (Optional, string) The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shard copies in the index (number_of_replicas+1). Default: 1, the primary shard.
  See Active shards.
require_alias
  (Optional, Boolean) If true, the destination must be an index alias. Defaults to false.
Request body
<field>
  (Required, string) Request body contains the JSON source for the document data.
Response body
_shards
  Provides information about the replication process of the index operation.
_shards.total
  Indicates how many shard copies (primary and replica shards) the index operation should be executed on.
_shards.successful
  Indicates the number of shard copies the index operation succeeded on. When the index operation is successful, successful is at least 1.
  Replica shards might not all be started when an indexing operation returns successfully. By default, only the primary is required. Set wait_for_active_shards to change this default behavior. See Active shards.
_shards.failed
  An array that contains replication-related errors in the case an index operation failed on a replica shard. 0 indicates there were no failures.
_index
  The name of the index the document was added to.
_type
  The document type. Elasticsearch indices now support a single document type, _doc.
_id
  The unique identifier for the added document.
_version
  The document version. Incremented each time the document is updated.
_seq_no
  The sequence number assigned to the document for the indexing operation. Sequence numbers are used to ensure an older version of a document doesn’t overwrite a newer version. See Optimistic concurrency control.
_primary_term
  The primary term assigned to the document for the indexing operation. See Optimistic concurrency control.
result
  The result of the indexing operation, created or updated.
Description
You can index a new JSON document with the _doc or _create resource. Using _create guarantees that the document is only indexed if it does not already exist. To update an existing document, you must use the _doc resource.
Automatically create data streams and indices
If the request’s target doesn’t exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream. See Set up a data stream.
If the target doesn’t exist and doesn’t match a data stream template, the operation automatically creates the index and applies any matching index templates.
Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, see Avoid index pattern collisions.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed. For more information about field mapping, see mapping and the update mapping API.
Automatic index creation is controlled by the action.auto_create_index setting. This setting defaults to true, which allows any index to be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to false to disable automatic index creation entirely. Specify a comma-separated list of patterns you want to allow, or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behavior is to disallow.
The action.auto_create_index setting only affects the automatic creation of indices. It does not affect the creation of data streams.
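PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "my-index-000001,index10,-index1*,+ind*"
  }
}

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false"
  }
}

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "true"
  }
}

The first request allows auto-creation of indices called my-index-000001 or index10, blocks the creation of indices that match the pattern index1*, and allows creation of any other indices that match the ind* pattern. Patterns are matched in the order specified.
The second request disables automatic index creation entirely.
The third request allows automatic creation of any index. This is the default.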
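To make the pattern list easier to reason about, here is a minimal Python sketch of how such a value could be evaluated for a single index name. It is a toy model, not Elasticsearch code: the function name auto_create_allowed is hypothetical, and the sketch assumes entries are checked in order, the first matching entry decides, and an unmatched name is disallowed.

```python
import re


def _matches(pattern: str, name: str) -> bool:
    # Treat "*" as a wildcard and every other character literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def auto_create_allowed(setting: str, index_name: str) -> bool:
    """Toy evaluation of an action.auto_create_index value (hypothetical helper)."""
    value = setting.strip()
    if value == "true":
        return True
    if value == "false":
        return False
    # Pattern list: assume entries are checked in order and the first match wins.
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        allow = not entry.startswith("-")
        pattern = entry[1:] if entry[0] in "+-" else entry
        if _matches(pattern, index_name):
            return allow
    # With a list, anything that matched no entry is disallowed.
    return False


if __name__ == "__main__":
    setting = "my-index-000001,index10,-index1*,+ind*"
    for name in ("my-index-000001", "index10", "index11", "indigo", "logs"):
        print(name, auto_create_allowed(setting, name))
```

With the list from the first request, index11 is blocked by -index1* even though it would also match +ind*, because the blocking entry comes first.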
Put if absent
You can force a create operation by using the _create resource or setting the op_type parameter to create. In this case, the index operation fails if a document with the specified ID already exists in the index.
Create document IDs automatically
When using the POST /<target>/_doc/ request format, the op_type is automatically set to create and the index operation generates a unique ID for the document.

POST my-index-000001/_doc/
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

The API returns the following result:

{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  },
  "_index": "my-index-000001",
  "_type": "_doc",
  "_id": "W0tpsmIBdwcYyG50zbta",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "result": "created"
}
Optimistic concurrency control
Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409. See Optimistic concurrency control for more details.
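The conditional check can be pictured with a small in-memory model. The Python sketch below is an illustration only: ToyIndex and VersionConflict are hypothetical names, the store keeps a single primary term, and the toy insists that both parameters are supplied together.

```python
class VersionConflict(Exception):
    """Stands in for the 409 version conflict response."""


class ToyIndex:
    """In-memory stand-in for an index; not how Elasticsearch stores documents."""

    def __init__(self):
        self._docs = {}          # _id -> (source, seq_no, primary_term)
        self._next_seq_no = 0
        self._primary_term = 1   # assumption: the primary never changes in this toy

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if (if_seq_no is None) != (if_primary_term is None):
            raise ValueError("this toy expects if_seq_no and if_primary_term together")
        if if_seq_no is not None:
            current = self._docs.get(doc_id)
            # Proceed only if the last modification carries exactly these values.
            if current is None or (current[1], current[2]) != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        seq_no = self._next_seq_no
        self._next_seq_no += 1
        self._docs[doc_id] = (dict(source), seq_no, self._primary_term)
        return {"_id": doc_id, "_seq_no": seq_no, "_primary_term": self._primary_term}


if __name__ == "__main__":
    idx = ToyIndex()
    first = idx.index("1", {"counter": 1})
    # The writer passes back the values it last saw.
    idx.index("1", {"counter": 2},
              if_seq_no=first["_seq_no"], if_primary_term=first["_primary_term"])
    try:
        # A second writer still holding the old values is rejected.
        idx.index("1", {"counter": 3},
                  if_seq_no=first["_seq_no"], if_primary_term=first["_primary_term"])
    except VersionConflict:
        print("conflict")
```

A client that hits the conflict would typically re-read the document, take the new _seq_no and _primary_term, and retry.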
Routing
By default, shard placement (or routing) is controlled by using a hash of the document’s id value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter. For example:

POST my-index-000001/_doc?routing=kimchy
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

In this example, the document is routed to a shard based on the routing parameter provided: "kimchy".
When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
Data streams do not support custom routing. Instead, target the appropriate backing index for the stream.
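As a rough picture of hash-based routing, the Python sketch below maps a value to a shard number. It is an assumption-laden illustration: this page does not state the hash function or the exact formula Elasticsearch uses, so zlib.crc32 and the plain modulo are stand-ins, and pick_shard and shard_for are hypothetical names.

```python
import zlib
from typing import Optional


def pick_shard(routing_value: str, number_of_shards: int) -> int:
    """Map a routing value to a shard number (stand-in hash, not Elasticsearch's)."""
    if number_of_shards < 1:
        raise ValueError("number_of_shards must be at least 1")
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_shards


def shard_for(doc_id: str, number_of_shards: int, routing: Optional[str] = None) -> int:
    # By default the document ID is hashed; an explicit routing value replaces it.
    return pick_shard(routing if routing is not None else doc_id, number_of_shards)


if __name__ == "__main__":
    # Two different documents with the same routing value land on the same shard.
    print(shard_for("doc-a", 5, routing="kimchy"))
    print(shard_for("doc-b", 5, routing="kimchy"))
```

The point of the sketch is only that the same routing value always selects the same shard, regardless of the document ID.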
Distributed
The index operation is directed to the primary shard based on its route (see the Routing section above) and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (that is, wait_for_active_shards=1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, the wait_for_active_shards request parameter can be used.
Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1). Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose we have a cluster of three nodes, A, B, and C, and we create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If we attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down, and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all 3 nodes are up), then the indexing operation will require 3 active shard copies before proceeding, a requirement which should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if we set wait_for_active_shards to all (or to 4, which is the same), the indexing operation will not proceed as we do not have all 4 copies of each shard active in the index. The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation commences. Once the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the write operation’s response reveals the number of shard copies on which replication succeeded or failed.

{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  }
}
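The three-node example can be checked with a few lines of Python. This is a sketch of the arithmetic only, under the rules stated in this section; required_copies and can_proceed are hypothetical helper names, and the sketch rejects every value the text does not list as valid.

```python
def required_copies(wait_for_active_shards, number_of_replicas: int) -> int:
    """Resolve a wait_for_active_shards value to a number of shard copies."""
    total_copies = number_of_replicas + 1  # the primary plus its replicas
    if wait_for_active_shards == "all":
        return total_copies
    wanted = int(wait_for_active_shards)
    # The text lists "all" or a positive integer up to the total as valid.
    if wanted < 1 or wanted > total_copies:
        raise ValueError(f"expected 'all' or 1..{total_copies}, got {wanted}")
    return wanted


def can_proceed(wait_for_active_shards, number_of_replicas: int, active_copies: int) -> bool:
    # This check happens before the write starts; it says nothing about
    # whether replication succeeds afterwards.
    return active_copies >= required_copies(wait_for_active_shards, number_of_replicas)


if __name__ == "__main__":
    # Three nodes, number_of_replicas = 3: four copies configured, three can be active.
    print(can_proceed(1, 3, 3))      # default: only the primary is needed
    print(can_proceed(3, 3, 3))      # three active copies are enough
    print(can_proceed("all", 3, 3))  # needs four copies, so the write waits
```

In the last case the real operation would keep waiting until a fourth copy becomes active or the timeout expires.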
Refresh
Control when the changes made by this request are visible to search. See refresh.
Noop updates
When updating a document using the index API, a new version of the document is always created even if the document hasn’t changed. If this isn’t acceptable, use the _update API with detect_noop set to true. This option isn’t available on the index API because the index API doesn’t fetch the old source and isn’t able to compare it against the new source.
There isn’t a hard and fast rule about when noop updates aren’t acceptable. It’s a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
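The difference between the two behaviors can be sketched with a toy document object in Python. This is an illustration under simplifying assumptions: ToyDoc, index_doc, and update_doc are hypothetical names, and the update helper replaces the whole source instead of merging a partial document as the real _update API does.

```python
class ToyDoc:
    """Minimal stand-in for a stored document."""

    def __init__(self, source):
        self.source = dict(source)
        self.version = 1


def index_doc(doc: ToyDoc, new_source: dict) -> str:
    # Index API behavior as described above: the old source is never fetched,
    # so a new version is written even when nothing changed.
    doc.source = dict(new_source)
    doc.version += 1
    return "updated"


def update_doc(doc: ToyDoc, new_source: dict, detect_noop: bool = True) -> str:
    # Update-style behavior: compare against the old source first.
    if detect_noop and doc.source == new_source:
        return "noop"
    doc.source = dict(new_source)
    doc.version += 1
    return "updated"


if __name__ == "__main__":
    doc = ToyDoc({"user": "kimchy"})
    print(index_doc(doc, {"user": "kimchy"}), doc.version)   # version goes up anyway
    print(update_doc(doc, {"user": "kimchy"}), doc.version)  # unchanged source is skipped
```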
Timeout
The primary shard assigned to perform the index operation might not be available when the index operation is executed. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the index operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error. The timeout parameter can be used to explicitly specify how long it waits. Here is an example of setting it to 5 minutes:

PUT my-index-000001/_doc/1?timeout=5m
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
Versioning
Each indexed document is given a version number. By default, internal versioning is used that starts at 1 and increments with each update, deletes included. Optionally, the version number can be set to an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.
When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail. For example:
PUT my-index-000001/_doc/1?version=2&version_type=external
{
  "user": {
    "id": "elkbee"
  }
}
Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, then the operation is executed without any version checks.
In the previous example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1. If the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (HTTP status code 409).
A nice side effect is that there is no need to maintain strict ordering of async indexing operations executed as a result of changes to a source database, as long as version numbers from the source database are used. Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order for whatever reason.
Version types
In addition to the external version type, Elasticsearch also supports other types for specific use cases:

external or external_gt
  Only index the document if the given version is strictly higher than the version of the stored document or if there is no existing document. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.
external_gte
  Only index the document if the given version is equal or higher than the version of the stored document. If there is no existing document the operation will succeed as well. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.

The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data. There is another option, force, which is deprecated because it can cause primary and replica shards to diverge.
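The two comparison rules can be written down as a short Python sketch. It only restates the conditions above; external_version_ok is a hypothetical helper, and a stored version of None is used here to mean that no document exists yet.

```python
from typing import Optional


def external_version_ok(version_type: str, given: int, stored: Optional[int]) -> bool:
    """Return True if an index request with this external version would be accepted."""
    if given < 0:
        raise ValueError("the supplied version must be non-negative")
    if stored is None:
        # No existing document: both version types accept the request.
        return True
    if version_type in ("external", "external_gt"):
        return given > stored       # strictly higher
    if version_type == "external_gte":
        return given >= stored      # equal or higher
    raise ValueError(f"unsupported version_type in this sketch: {version_type}")


if __name__ == "__main__":
    print(external_version_ok("external", 2, 1))      # accepted
    print(external_version_ok("external", 2, 2))      # rejected: version conflict
    print(external_version_ok("external_gte", 2, 2))  # accepted
```

When the check returns False, the real request fails with a version conflict and the stored document is left untouched.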
Examples
Insert a JSON document into the my-index-000001 index with an _id of 1:

PUT my-index-000001/_doc/1
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

The API returns the following result:

{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  },
  "_index": "my-index-000001",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "result": "created"
}

Use the _create resource to index a document into the my-index-000001 index if no document with that ID exists:

PUT my-index-000001/_create/1
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

Set the op_type parameter to create to index a document into the my-index-000001 index if no document with that ID exists:

PUT my-index-000001/_doc/1?op_type=create
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
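For readers calling the API from a script, the request formats above could be assembled with the Python standard library as follows. This is a sketch under assumptions: the base URL http://localhost:9200 and the helper name build_index_request are not from this page, no authentication is configured, and the function only builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured test node


def build_index_request(target, source, doc_id=None, create_only=False, base_url=BASE_URL):
    """Build (but do not send) an index API request for one document."""
    if create_only:
        if doc_id is None:
            raise ValueError("the _create resource requires a document ID")
        method, path = "PUT", f"/{target}/_create/{urllib.parse.quote(doc_id, safe='')}"
    elif doc_id is None:
        # No ID given: POST to _doc/ and let Elasticsearch generate one.
        method, path = "POST", f"/{target}/_doc/"
    else:
        method, path = "PUT", f"/{target}/_doc/{urllib.parse.quote(doc_id, safe='')}"
    body = json.dumps(source).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    doc = {
        "@timestamp": "2099-11-15T13:12:00",
        "message": "GET /search HTTP/1.1 200 1070000",
        "user": {"id": "kimchy"},
    }
    req = build_index_request("my-index-000001", doc, doc_id="1")
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; against a data stream, the create_only form is the one to use when a document ID is specified.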