IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Index API

See Removal of mapping types.

The index API adds or updates a typed JSON document in a specific index, making it searchable. The following example inserts the JSON document into the "twitter" index, under a type called _doc with an id of 1:

PUT twitter/_doc/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

The result of the above index operation is:

{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "_doc",
    "_id" : "1",
    "_version" : 1,
    "_seq_no" : 0,
    "_primary_term" : 1,
    "result" : "created"
}

The _shards header provides information about the replication process of the index operation.

total - Indicates the number of shard copies (primary and replica shards) on which the index operation should be executed.
successful - Indicates the number of shard copies on which the index operation succeeded.
failed - Indicates the number of shard copies on which the index operation failed, for example because of a replication-related error on a replica shard.

The index operation is considered successful if successful is at least 1.

Replica shards may not all be started when an indexing operation successfully returns (by default, only the primary is required, but this behavior can be changed). In that case, total will be equal to the total number of shard copies based on the number_of_replicas setting, and successful will be equal to the number of shard copies started (primary plus replicas). If there were no failures, failed will be 0.
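
As a rough illustration of how a client might read the _shards section, the Python sketch below applies the rules above to a hand-written response. The helper names index_succeeded and copies_not_started are hypothetical, and the dictionary only mimics the shape of the example response; it is not output from a live cluster.

```python
def index_succeeded(response: dict) -> bool:
    """Return True when at least one shard copy accepted the write."""
    return response["_shards"]["successful"] >= 1


def copies_not_started(response: dict) -> int:
    """Shard copies that neither succeeded nor failed (for example, unstarted replicas)."""
    shards = response["_shards"]
    return shards["total"] - shards["successful"] - shards["failed"]


# Hand-written response: one replica is configured, but only the primary is started.
response = {"_shards": {"total": 2, "successful": 1, "failed": 0}}

print(index_succeeded(response))     # True
print(copies_not_started(response))  # 1
```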

Automatic Index Creation

The index operation automatically creates an index if it does not already exist, and applies any index templates that are configured. The index operation also creates a dynamic type mapping for the specified type if one does not already exist. By default, new fields and objects will automatically be added to the mapping definition for the specified type if needed. Check out the mapping section for more information on mapping definitions, and the put mapping API for information about updating type mappings manually.

Automatic index creation is controlled by the action.auto_create_index setting. This setting defaults to true, meaning that indices are always automatically created. Automatic index creation can be permitted only for indices matching certain patterns by changing the value of this setting to a comma-separated list of these patterns. Patterns in the list can also be explicitly permitted or forbidden by prefixing them with a + or -. Finally, automatic index creation can be completely disabled by changing this setting to false.

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "twitter,index10,-index1*,+ind*" 
    }
}

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "false" 
    }
}

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "true" 
    }
}

The first request permits only the auto-creation of indices called `twitter` and `index10`, of no other index matching `index1*`, and of any other index matching `ind*`. The patterns are matched in the order in which they are given.
The second request completely disables the auto-creation of indices.
The third request permits the auto-creation of indices with any name. This is the default.
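
The ordered matching described above can be modeled in a few lines of Python. This is a sketch of the documented behavior only, not Elasticsearch's implementation: the function name may_auto_create is hypothetical, only the * wildcard is handled, and treating a name that matches no pattern as forbidden is an assumption drawn from the phrase "permit only".

```python
import re


def may_auto_create(setting: str, index_name: str) -> bool:
    """Model action.auto_create_index: the first matching pattern decides."""
    if setting == "true":
        return True
    if setting == "false":
        return False
    for entry in setting.split(","):
        entry = entry.strip()
        if not entry:
            continue
        permitted = not entry.startswith("-")
        pattern = entry[1:] if entry[0] in "+-" else entry
        # Translate the simple * wildcard into a regular expression.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, index_name):
            return permitted
    return False  # assumption: names matching no pattern are not auto-created


setting = "twitter,index10,-index1*,+ind*"
for name in ("twitter", "index10", "index11", "india", "logs"):
    print(name, may_auto_create(setting, name))
```

Because index10 is listed before -index1*, it is permitted even though it also matches the forbidden pattern; index11 only matches -index1* and is rejected.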

Versioning

Each indexed document is given a version number. The associated version number is returned as part of the response to the index API request. The index API optionally allows for optimistic concurrency control when the version parameter is specified. This will control the version of the document the operation is intended to be executed against. A good example of a use case for versioning is performing a transactional read-then-update. Specifying a version from the document initially read ensures no changes have happened in the meantime (when reading in order to update, it is recommended to set preference to _primary). For example:

PUT twitter/_doc/1?version=2
{
    "message" : "elasticsearch now has versioning support, double cool!"
}

NOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, then the operation is executed without any version checks.

By default, internal versioning is used that starts at 1 and increments with each update, deletes included. Optionally, the version number can be supplemented with an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18. When using the external version type, instead of checking for a matching version number, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail.

External versioning supports the value 0 as a valid version number. This allows the version to be in sync with an external versioning system where version numbers start from zero instead of one. It has the side effect that documents with a version number equal to zero can neither be updated using the Update By Query API nor deleted using the Delete By Query API as long as their version number is equal to zero.

A nice side effect is that there is no need to maintain strict ordering of async indexing operations executed as a result of changes to a source database, as long as version numbers from the source database are used. Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations are out of order for whatever reason.
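
To see why ordering stops mattering, the sketch below replays out-of-order changes against an in-memory dictionary that applies the external versioning rule described above. The names index_external and VersionConflict are hypothetical, and the dictionary stands in for an index; no Elasticsearch client is involved.

```python
class VersionConflict(Exception):
    """Raised when the supplied version is not newer than the stored one."""


def index_external(store: dict, doc_id: str, version: int, source: dict) -> None:
    """Apply one write using the version_type=external rule."""
    current = store.get(doc_id)
    if current is not None and version <= current["version"]:
        raise VersionConflict(f"{version} <= {current['version']}")
    store[doc_id] = {"version": version, "source": source}


# Changes from a source database, arriving out of order.
events = [
    (2, {"message": "second edit"}),
    (3, {"message": "third edit"}),
    (1, {"message": "first edit"}),
]

store = {}
for version, source in events:
    try:
        index_external(store, "1", version, source)
    except VersionConflict:
        pass  # stale change: a newer version is already stored

print(store["1"])  # {'version': 3, 'source': {'message': 'third edit'}}
```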

Version types

In addition to the internal and external version types explained above, Elasticsearch also supports other types for specific use cases. Here is an overview of the different version types and their semantics.

internal: only index the document if the given version is identical to the version of the stored document.
external or external_gt: only index the document if the given version is strictly higher than the version of the stored document or if there is no existing document. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.
external_gte: only index the document if the given version is equal to or higher than the version of the stored document. If there is no existing document the operation will succeed as well. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.

NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data. There is another option, force, which is deprecated because it can cause primary and replica shards to diverge.
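
The comparison each version type performs can be summarized as a small Python function. This is a sketch of the rules listed above, not Elasticsearch code: write_allowed is a hypothetical name, stored is None when no document exists, and given is None when the request carries no version parameter. The deprecated force type is deliberately left out.

```python
from typing import Optional


def write_allowed(version_type: str, given: Optional[int], stored: Optional[int]) -> bool:
    """Return True when the version check for this version type passes."""
    if version_type == "internal":
        # No version supplied means no check; otherwise versions must match.
        return given is None or given == stored
    if given is None or given < 0:
        raise ValueError("external version types need a non-negative version")
    if version_type in ("external", "external_gt"):
        return stored is None or given > stored
    if version_type == "external_gte":
        return stored is None or given >= stored
    raise ValueError(f"unsupported version type: {version_type}")


print(write_allowed("internal", 2, 2))      # True
print(write_allowed("external", 5, 5))      # False
print(write_allowed("external_gte", 5, 5))  # True
```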

Operation Type

The index operation also accepts an op_type that can be used to force a create operation, allowing for "put-if-absent" behavior. When create is used, the index operation will fail if a document by that id already exists in the index.

Here is an example of using the op_type parameter:

PUT twitter/_doc/1?op_type=create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

Another way to specify create is to use the following URI:

PUT twitter/_doc/1/_create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
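
The "put-if-absent" behavior can be pictured with an in-memory stand-in for an index. The Python sketch below is illustrative only: index_doc and DocumentExists are hypothetical names, and a plain dictionary plays the role of the index.

```python
class DocumentExists(Exception):
    """Raised when op_type=create targets an id that is already present."""


def index_doc(index: dict, doc_id: str, source: dict, op_type: str = "index") -> str:
    """Store a document; with op_type='create', refuse to overwrite."""
    if op_type not in ("index", "create"):
        raise ValueError(f"unsupported op_type: {op_type}")
    existed = doc_id in index
    if op_type == "create" and existed:
        raise DocumentExists(doc_id)
    index[doc_id] = source
    return "updated" if existed else "created"


twitter = {}
print(index_doc(twitter, "1", {"user": "kimchy"}, op_type="create"))  # created
try:
    index_doc(twitter, "1", {"user": "someone else"}, op_type="create")
except DocumentExists:
    print("create rejected: document 1 already exists")
print(twitter["1"])  # {'user': 'kimchy'}
```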

Automatic ID Generation

The index operation can be executed without specifying the id. In such a case, an id will be generated automatically. In addition, the op_type will automatically be set to create. Here is an example (note the POST used instead of PUT):

POST twitter/_doc/
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

The result of the above index operation is:

{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "_doc",
    "_id" : "W0tpsmIBdwcYyG50zbta",
    "_version" : 1,
    "_seq_no" : 0,
    "_primary_term" : 1,
    "result": "created"
}

Routing

By default, shard placement — or routing — is controlled by using a hash of the document’s id value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter. For example:

POST twitter/_doc?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

In the example above, the "_doc" document is routed to a shard based on the routing parameter provided: "kimchy".

When setting up explicit mapping, the _routing field can be optionally used to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
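
The idea of hashing a routing value to pick a shard can be sketched in Python as follows. Several things here are assumptions rather than statements from this page: CRC32 is only a stand-in for whatever hash function Elasticsearch uses, reducing the hash modulo the number of primary shards is the conventional scheme and is assumed here, and the names pick_shard and effective_routing are hypothetical.

```python
import zlib
from typing import Optional


def effective_routing(doc_id: str, routing: Optional[str] = None) -> str:
    """The routing parameter, when given, replaces the document id as hash input."""
    return routing if routing is not None else doc_id


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a shard number (CRC32 is a stand-in hash)."""
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# Two documents with different ids but the same routing value land together.
shard_a = pick_shard(effective_routing("1", routing="kimchy"), 5)
shard_b = pick_shard(effective_routing("2", routing="kimchy"), 5)
print(shard_a == shard_b)  # True
```

The point of the sketch is that the result depends only on the value fed into the hash, so every document indexed with routing=kimchy goes to the same shard regardless of its id.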

Distributed

The index operation is directed to the primary shard based on its route (see the Routing section above) and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.

Wait For Active Shards

To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies is not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (i.e. wait_for_active_shards=1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, the wait_for_active_shards request parameter can be used.

Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1). Specifying a negative value or a number greater than the number of shard copies will throw an error.

For example, suppose we have a cluster of three nodes, A, B, and C, and we create an index named index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If we attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down, and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all 3 nodes are up), then the indexing operation will require 3 active shard copies before proceeding, a requirement which should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if we set wait_for_active_shards to all (or to 4, which is the same), the indexing operation will not proceed as we do not have all 4 copies of each shard active in the index. The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
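
The three-node example can be checked with a short Python sketch. It models only the pre-write check described above; the function names required_copies and can_proceed are hypothetical, and the number of active copies is supplied by hand rather than read from a cluster.

```python
def required_copies(wait_for_active_shards, number_of_replicas: int) -> int:
    """Resolve wait_for_active_shards to a concrete number of shard copies."""
    total_copies = number_of_replicas + 1
    if wait_for_active_shards == "all":
        return total_copies
    if isinstance(wait_for_active_shards, bool) or not isinstance(wait_for_active_shards, int):
        raise ValueError("expected 'all' or a positive integer")
    if not 1 <= wait_for_active_shards <= total_copies:
        raise ValueError(f"value must be between 1 and {total_copies}")
    return wait_for_active_shards


def can_proceed(active_copies: int, wait_for_active_shards, number_of_replicas: int) -> bool:
    """True when enough shard copies are active for the write to start."""
    return active_copies >= required_copies(wait_for_active_shards, number_of_replicas)


# Three nodes, number_of_replicas=3: at most 3 of the 4 copies can be active.
print(can_proceed(3, 1, 3))      # True  (default: primary only)
print(can_proceed(3, 3, 3))      # True
print(can_proceed(3, "all", 3))  # False (would wait, then time out)
print(can_proceed(1, 1, 3))      # True  (B and C down, A holds the primary)
```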

It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation commences. Once the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the write operation’s response reveals the number of shard copies on which replication succeeded/failed.

{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    }
}

Refresh

Control when the changes made by this request are visible to search. See refresh.

Noop Updates

When updating a document using the index API, a new version of the document is always created even if the document hasn’t changed. If this isn’t acceptable, use the _update API with detect_noop set to true. This option isn’t available on the index API because the index API doesn’t fetch the old source and isn’t able to compare it against the new source.

There isn’t a hard and fast rule about when noop updates aren’t acceptable. It’s a combination of lots of factors, like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
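
The difference between the two paths can be illustrated with an in-memory model. The Python sketch below is a simplification under stated assumptions: index_doc and update_detect_noop are hypothetical names, a dictionary stands in for the index, and the update merges fields shallowly, which is only an approximation of how a partial update is applied.

```python
def index_doc(store: dict, doc_id: str, source: dict) -> int:
    """Index path: never reads the old source, so the version always increases."""
    version = store[doc_id]["version"] + 1 if doc_id in store else 1
    store[doc_id] = {"version": version, "source": source}
    return version


def update_detect_noop(store: dict, doc_id: str, partial: dict) -> str:
    """Update path with detect_noop: fetch the old source and compare first."""
    current = store[doc_id]
    merged = {**current["source"], **partial}  # shallow merge (simplification)
    if merged == current["source"]:
        return "noop"
    store[doc_id] = {"version": current["version"] + 1, "source": merged}
    return "updated"


store = {}
doc = {"user": "kimchy", "message": "trying out Elasticsearch"}
index_doc(store, "1", doc)
print(index_doc(store, "1", doc))                            # 2: same source, new version
print(update_detect_noop(store, "1", {"user": "kimchy"}))    # noop
print(store["1"]["version"])                                 # 2: unchanged by the noop
```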

Timeout

The primary shard assigned to perform the index operation might not be available when the index operation is executed. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the index operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error. The timeout parameter can be used to explicitly specify how long it waits. Here is an example of setting it to 5 minutes:

PUT twitter/_doc/1?timeout=5m
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
