IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Index API

See Removal of mapping types.

Adds a JSON document to the specified data stream or index and makes it searchable. If the target is an index and the document already exists, the request updates the document and increments its version.

You cannot use the index API to send update requests for existing documents to a data stream. See Update documents in a data stream by query and Update or delete documents in a backing index.

Request

PUT /<target>/_doc/<_id>

POST /<target>/_doc/

PUT /<target>/_create/<_id>

POST /<target>/_create/<_id>

You cannot add new documents to a data stream using the PUT /<target>/_doc/<_id> request format. To specify a document ID, use the PUT /<target>/_create/<_id> format instead. See Add documents to a data stream.

Path parameters

<target>

(Required, string) Name of the data stream or index to target.

If the target doesn’t exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. See Set up a data stream.

If the target doesn’t exist and doesn’t match a data stream template, this request creates the index.

You can check for existing targets using the resolve index API.

<_id>

(Optional, string) Unique identifier for the document.

This parameter is required for the following request formats:

PUT /<target>/_doc/<_id>
PUT /<target>/_create/<_id>
POST /<target>/_create/<_id>

To automatically generate a document ID, use the POST /<target>/_doc/ request format and omit this parameter.

Query parameters

if_seq_no: (Optional, integer) Only perform the operation if the document has this sequence number. See Optimistic concurrency control.
if_primary_term: (Optional, integer) Only perform the operation if the document has this primary term. See Optimistic concurrency control.

op_type

(Optional, enum) Set to create to index the document only if it does not already exist (put if absent). If a document with the specified _id already exists, the indexing operation fails. This is the same as using the <target>/_create endpoint. Valid values: index, create. If you specify a document ID, the default is index. Otherwise, the default is create.

If the request targets a data stream, an op_type of create is required. See Add documents to a data stream.
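The following is a minimal Python sketch that makes the op_type defaults concrete. It picks a request format for a given target and optional document ID. The function plan_index_request and its parameters are hypothetical and are not part of any Elasticsearch client. The sketch only encodes the rules stated above, and it assumes the ID is already URL-safe.

```python
from typing import Optional, Tuple


def plan_index_request(target: str,
                       doc_id: Optional[str] = None,
                       create: bool = False,
                       is_data_stream: bool = False) -> Tuple[str, str, str]:
    """Return (HTTP method, path, effective op_type) for an index request.

    Hypothetical helper: it models the documented request formats and
    op_type defaults, and assumes doc_id needs no URL encoding.
    """
    if doc_id is None:
        # POST /<target>/_doc/ lets Elasticsearch generate the ID,
        # and op_type is always create.
        return "POST", f"/{target}/_doc/", "create"
    if create or is_data_stream:
        # Data streams require op_type=create, so an explicit ID
        # must go through the _create resource.
        return "PUT", f"/{target}/_create/{doc_id}", "create"
    # An explicit ID with the _doc resource defaults to op_type=index.
    return "PUT", f"/{target}/_doc/{doc_id}", "index"
```

For example, plan_index_request("my-index-000001", "1") yields a PUT to /my-index-000001/_doc/1 with op_type index. Passing is_data_stream=True switches to the _create format.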

pipeline

(Optional, string) ID of the pipeline to use to preprocess incoming documents.

refresh

(Optional, enum) If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, the request waits for a refresh to make this operation visible to search. If false, the request does nothing with refreshes. Valid values: true, false, wait_for. Default: false.

routing

(Optional, string) Custom value used to route the operation to a specific primary shard.

master_timeout

(Optional, time units) Specifies the period of time to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30s.

timeout

(Optional, time units) Specifies the period of time to wait for a response. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 1m (one minute).

version

(Optional, integer) Explicit version number for concurrency control. How the specified version is compared with the current version of the document depends on version_type. See Version types.

version_type

(Optional, enum) Specific version type: internal, external, external_gte.

wait_for_active_shards

(Optional, string) The number of copies of each shard that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of copies of each shard in the index (number_of_replicas+1). Default: 1, the primary shard.

See Active shards.

Request body

<field>: (Required, string) Request body contains the JSON source for the document data.

Response body

_shards: Provides information about the replication process of the index operation.
_shards.total: Indicates how many shard copies (primary and replica shards) the index operation should be executed on.
_shards.successful: Indicates the number of shard copies the index operation succeeded on. When the index operation is successful, successful is at least 1.

Replica shards might not all be started when an indexing operation returns successfully. By default, only the primary is required. Set wait_for_active_shards to change this default behavior. See Active shards.
_shards.failed: The number of shard copies on which the index operation failed. 0 indicates there were no failures. If replication fails on a replica shard, the response also includes a _shards.failures array with the replication-related errors.
_index: The name of the index the document was added to.
_type: The document type. Elasticsearch indices now support a single document type, _doc.
_id: The unique identifier for the added document.
_version: The document version. Incremented each time the document is updated.
_seq_no: The sequence number assigned to the document for the indexing operation. Sequence numbers are used to ensure an older version of a document doesn’t overwrite a newer version. See Optimistic concurrency control.
_primary_term: The primary term assigned to the document for the indexing operation. See Optimistic concurrency control.
result: The result of the indexing operation, created or updated.

Description

You can index a new JSON document with the _doc or _create resource. Using _create guarantees that the document is only indexed if it does not already exist. To update an existing document, you must use the _doc resource.

Automatically create data streams and indices

If the request’s target doesn’t exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream. See Set up a data stream.

If the target doesn’t exist and doesn’t match a data stream template, the operation automatically creates the index and applies any matching index templates.

Elasticsearch has built-in index templates for the metrics-*-* and logs-*-* index patterns, each with a priority of 100. Elastic Agent uses these templates to create data streams. If you use Elastic Agent, assign your index templates a priority lower than 100 to avoid overriding the built-in templates.

Otherwise, to avoid accidentally applying the built-in templates, use a non-overlapping index pattern or assign templates with an overlapping pattern a priority higher than 100.

For example, if you don’t use Elastic Agent and want to create a template for the logs-* index pattern, assign your template a priority of 200. This ensures your template is applied instead of the built-in template for logs-*-*.
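The priority rule in this example can be modeled in Python. This is a simplified sketch that assumes two things: the template with the highest priority among those whose patterns match is the one applied, and `*` is the only wildcard. It ignores every other template setting and does not handle ties. The template names and the helpers simple_match and pick_template are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# (template name, index patterns, priority); names here are illustrative.
Template = Tuple[str, List[str], int]


def simple_match(pattern: str, name: str) -> bool:
    """Match name against a pattern in which only '*' is a wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def pick_template(name: str, templates: List[Template]) -> Optional[str]:
    """Return the matching template with the highest priority, if any."""
    matches = [(priority, tname)
               for tname, patterns, priority in templates
               if any(simple_match(p, name) for p in patterns)]
    return max(matches)[1] if matches else None


templates = [
    ("builtin-logs", ["logs-*-*"], 100),  # stands in for the built-in template
    ("my-logs", ["logs-*"], 200),         # user template from the example above
]
print(pick_template("logs-my_app-default", templates))  # my-logs
```

Both templates match logs-my_app-default, so the user template with priority 200 wins. With a priority below 100 it would lose to the built-in template instead.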

If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed. For more information about field mapping, see mapping and the put mapping API.

Automatic index creation is controlled by the action.auto_create_index setting. This setting defaults to true, which allows any index to be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to false to disable automatic index creation entirely. Specify a comma-separated list of patterns you want to allow, or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, indices that match none of its patterns are not created automatically.

The action.auto_create_index setting only affects the automatic creation of indices. It does not affect the creation of data streams.

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "my-index-000001,index10,-index1*,+ind*" 
  }
}

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false" 
  }
}

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "true" 
  }
}

The first request allows auto-creation of indices called `my-index-000001` or `index10`, blocks the creation of indices that match the pattern `index1*`, and allows creation of any other indices that match the `ind*` pattern. Patterns are matched in the order specified.
The second request disables automatic index creation entirely.
The third request allows automatic creation of any index. This is the default.
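The evaluation order described above can be modeled in a few lines of Python. This is a sketch that assumes first-match-wins ordering, as stated above, and treats `*` as the only wildcard. The helper may_auto_create is hypothetical and is not an Elasticsearch API.

```python
import re


def simple_match(pattern: str, name: str) -> bool:
    """Match name against a pattern in which only '*' is a wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def may_auto_create(setting: str, index: str) -> bool:
    """Decide whether an action.auto_create_index value permits creating index."""
    value = setting.strip()
    if value in ("true", "false"):
        return value == "true"
    for raw in value.split(","):
        expr = raw.strip()
        allow = not expr.startswith("-")
        pattern = expr[1:] if expr[:1] in ("+", "-") else expr
        if simple_match(pattern, index):
            return allow  # the first matching pattern decides
    return False  # a list was given and nothing matched: disallow


setting = "my-index-000001,index10,-index1*,+ind*"
for name in ["index10", "index11", "ind-logs", "other"]:
    print(name, may_auto_create(setting, name))
```

With the first setting shown earlier, the results are as follows:
- index10 is allowed because its explicit entry comes before -index1*.
- index11 is blocked by -index1*.
- ind-logs is allowed by +ind*.
- other is rejected because no pattern matches.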

Put if absent

You can force a create operation by using the _create resource or setting the op_type parameter to create. In this case, the index operation fails if a document with the specified ID already exists in the index.
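The difference between the _doc and _create behaviors can be shown with a toy in-memory model. The sketch below assumes a single index holding (version, source) pairs. The classes ToyIndex and VersionConflict are hypothetical stand-ins, not Elasticsearch code.

```python
from typing import Any, Dict, Tuple


class VersionConflict(Exception):
    """Stands in for the 409 conflict Elasticsearch returns."""


class ToyIndex:
    """In-memory model of op_type index versus create (not real ES code)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def index(self, doc_id: str, source: Dict[str, Any],
              op_type: str = "index") -> Tuple[str, int]:
        """Store source under doc_id and return (result, _version)."""
        if doc_id in self.docs:
            if op_type == "create":
                # Put if absent: refuse to overwrite an existing document.
                raise VersionConflict(f"[{doc_id}]: document already exists")
            version, result = self.docs[doc_id][0] + 1, "updated"
        else:
            version, result = 1, "created"
        self.docs[doc_id] = (version, source)
        return result, version


idx = ToyIndex()
print(idx.index("1", {"user": {"id": "kimchy"}}, op_type="create"))  # ('created', 1)
print(idx.index("1", {"user": {"id": "elkbee"}}))  # ('updated', 2)
```

A second call with op_type="create" for ID 1 would now raise VersionConflict.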

Create document IDs automatically

When using the POST /<target>/_doc/ request format, the op_type is automatically set to create and the index operation generates a unique ID for the document.

POST my-index-000001/_doc/
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

The API returns the following result:

{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  },
  "_index": "my-index-000001",
  "_type": "_doc",
  "_id": "W0tpsmIBdwcYyG50zbta",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "result": "created"
}

Optimistic concurrency control

Index operations can be made conditional, so that they are only performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation results in a VersionConflictEngineException and a status code of 409. See Optimistic concurrency control for more details.
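The conditional check can be sketched as a toy shard in Python. The sketch makes three assumptions: one shard with a fixed primary term, a sequence number that grows by one per successful write, and if_seq_no and if_primary_term always supplied together. ToyShard, StoredDoc, and VersionConflict are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


class VersionConflict(Exception):
    """Stands in for Elasticsearch's 409 version conflict."""
    status = 409


@dataclass
class StoredDoc:
    source: Dict[str, Any]
    seq_no: int
    primary_term: int


class ToyShard:
    """Models the if_seq_no / if_primary_term check on a single shard."""

    def __init__(self, primary_term: int = 1) -> None:
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs: Dict[str, StoredDoc] = {}

    def index(self, doc_id: str, source: Dict[str, Any],
              if_seq_no: Optional[int] = None,
              if_primary_term: Optional[int] = None) -> StoredDoc:
        if (if_seq_no is None) != (if_primary_term is None):
            raise ValueError("set if_seq_no and if_primary_term together")
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            if (current is None or current.seq_no != if_seq_no
                    or current.primary_term != if_primary_term):
                raise VersionConflict(f"[{doc_id}]: version conflict")
        # Every successful write gets the next sequence number.
        doc = StoredDoc(source, self.next_seq_no, self.primary_term)
        self.next_seq_no += 1
        self.docs[doc_id] = doc
        return doc


shard = ToyShard()
first = shard.index("1", {"user": {"id": "kimchy"}})
# A writer that read seq_no=0, primary_term=1 updates successfully.
second = shard.index("1", {"user": {"id": "elkbee"}},
                     if_seq_no=first.seq_no, if_primary_term=first.primary_term)
print(second.seq_no)  # 1
# A writer still holding the stale seq_no=0 is rejected.
try:
    shard.index("1", {"user": {"id": "other"}}, if_seq_no=0, if_primary_term=1)
except VersionConflict as err:
    print(err.status, err)  # 409 [1]: version conflict
```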

Routing

By default, shard placement, or routing, is controlled by a hash of the document’s _id value. For more explicit control, you can specify the value fed into the router's hash function on a per-operation basis with the routing parameter. For example:

POST my-index-000001/_doc?routing=kimchy
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

In this example, the document is routed to a shard based on the routing parameter provided: "kimchy".
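The idea behind routing is that the same routing value always maps to the same primary shard. The Python sketch below shows this. It substitutes zlib.crc32 and a plain modulo for Elasticsearch's own Murmur3-based formula, so its shard numbers will not match a real cluster. The function pick_shard is hypothetical.

```python
import zlib
from typing import Optional


def pick_shard(doc_id: str, number_of_primary_shards: int,
               routing: Optional[str] = None) -> int:
    """Map a document to a primary shard number (illustrative only)."""
    # By default the document _id is the routing value.
    key = routing if routing is not None else doc_id
    # Stand-in hash: Elasticsearch itself uses Murmur3, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % number_of_primary_shards
```

Two documents indexed with routing=kimchy land on the same shard regardless of their IDs. This property is what lets a search with the same routing value query a single shard.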

When setting up explicit mapping, you can also configure the _routing field as required. In that case, the index operation fails if no routing value is provided.

Data streams do not support custom routing. Instead, target the appropriate backing index for the stream.

Distributed

The index operation is directed to the primary shard based on its route (see the Routing section above) and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.

Active shards

To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies is not available, the write operation must wait and retry until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (i.e. wait_for_active_shards=1). You can override this default dynamically with the index.write.wait_for_active_shards index setting. To alter this behavior per operation, use the wait_for_active_shards request parameter.

Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1). Specifying a negative value or a number greater than the number of shard copies will throw an error.

For example, suppose we have a cluster of three nodes, A, B, and C, and we create an index named index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If we attempt an indexing operation, by default the operation only ensures the primary copy of each shard is available before proceeding. This means that even if B and C went down, and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all 3 nodes are up), the indexing operation requires 3 active shard copies before proceeding. This requirement should be met because there are 3 active nodes in the cluster, each holding a copy of the shard. However, if we set wait_for_active_shards to all (or to 4, which is the same), the indexing operation will not proceed, because not all 4 copies of each shard are active in the index. The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
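The three-node scenario can be checked with a small Python model. The sketch assumes that "not proceeding" means the write keeps waiting until the timeout, which it does not model. It also assumes that values outside the documented range raise an error. The helpers required_copies and can_proceed are hypothetical.

```python
from typing import Union


def required_copies(wait_for_active_shards: Union[int, str],
                    number_of_replicas: int) -> int:
    """Translate wait_for_active_shards into a number of shard copies."""
    total_copies = number_of_replicas + 1
    if wait_for_active_shards == "all":
        return total_copies
    count = int(wait_for_active_shards)
    if count < 0 or count > total_copies:
        raise ValueError(f"wait_for_active_shards must be between 0 and {total_copies}")
    return count


def can_proceed(active_copies: int, wait_for_active_shards: Union[int, str],
                number_of_replicas: int) -> bool:
    """True if enough copies are active; otherwise the write would wait."""
    return active_copies >= required_copies(wait_for_active_shards, number_of_replicas)


# Three nodes, number_of_replicas=3: at most three of the four copies can be
# assigned, because a node never holds two copies of the same shard.
for setting in (1, 3, "all"):
    print(setting, can_proceed(3, setting, number_of_replicas=3))
```

With three active copies, the settings 1 and 3 proceed and all does not. With only node A up (one active copy), the default of 1 still proceeds.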

It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation commences. Once the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the write operation’s response reveals the number of shard copies on which replication succeeded/failed.

{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  }
}
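A client can inspect this _shards section to tell whether every copy received the write. The following Python sketch assumes the response has already been parsed into a dictionary. The helper replication_summary and its messages are hypothetical.

```python
import json
from typing import Any, Dict


def replication_summary(response: Dict[str, Any]) -> str:
    """Describe the _shards section of an index API response."""
    shards = response["_shards"]
    total, successful, failed = shards["total"], shards["successful"], shards["failed"]
    if failed:
        return f"replication failed on {failed} of {total} copies"
    if successful < total:
        return f"written to {successful} of {total} copies; the rest were not started"
    return f"written to all {total} copies"


sample = json.loads('{"_shards": {"total": 2, "failed": 0, "successful": 2}}')
print(replication_summary(sample))  # written to all 2 copies
```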

Refresh

Control when the changes made by this request are visible to search. See refresh.

Noop updates

When you update a document using the index API, a new version of the document is always created even if the document hasn’t changed. If this isn’t acceptable, use the _update API with detect_noop set to true. This option isn’t available on the index API because the index API doesn’t fetch the old source and can't compare it against the new source.

There isn’t a hard and fast rule about when noop updates aren’t acceptable. It’s a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
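Conceptually, noop detection needs the stored source. The incoming partial document is merged into that source, and the result is compared with it. The Python sketch below models this idea for a partial-document update in the spirit of _update with detect_noop. It is a simplified illustration, not Elasticsearch's implementation. The helpers merge_partial and partial_update are hypothetical.

```python
import copy
from typing import Any, Dict, Tuple


def merge_partial(source: Dict[str, Any], partial: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge a partial document into a copy of source."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def partial_update(source: Dict[str, Any],
                   partial: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    """Return (new source, result); result is "noop" if nothing changed."""
    merged = merge_partial(source, partial)
    return (source, "noop") if merged == source else (merged, "updated")


stored = {"user": {"id": "kimchy"}, "message": "GET /search HTTP/1.1 200 1070000"}
print(partial_update(stored, {"user": {"id": "kimchy"}})[1])  # noop
print(partial_update(stored, {"user": {"id": "elkbee"}})[1])  # updated
```

The comparison needs the stored source as input. This is why the check belongs in an API that reads the existing document.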

Timeout

The primary shard assigned to perform the index operation might not be available when the index operation is executed. For example, the primary shard might be recovering from a gateway or undergoing relocation. By default, the index operation waits up to 1 minute for the primary shard to become available before failing and responding with an error. Use the timeout parameter to specify explicitly how long it waits. Here is an example of setting it to 5 minutes:

PUT my-index-000001/_doc/1?timeout=5m
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

Versioning

Each indexed document is given a version number. By default, internal versioning is used that starts at 1 and increments with each update, deletes included. Optionally, the version number can be set to an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.

When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail. For example:

PUT my-index-000001/_doc/1?version=2&version_type=external
{
  "user": {
    "id": "elkbee"
  }
}

Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, then the operation is executed without any version checks.

In the previous example, the operation succeeds because the supplied version of 2 is higher than the current document version of 1. If the document had already been updated and its version set to 2 or higher, the indexing command would fail with a conflict (HTTP status code 409).
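The external version check can be sketched as a small in-memory store. The names `ExternalVersionStore` and `VersionConflictError` are hypothetical, and the upper bound is an assumption: this sketch reads "less than around 9.2e+18" as the Java long maximum, 2^63 - 1.

```python
from typing import Dict, Tuple

# Assumption: the documented upper bound is the Java long maximum (~9.2e18).
JAVA_LONG_MAX = 2**63 - 1

class VersionConflictError(Exception):
    """Stands in for the 409 version conflict response."""
    status = 409

class ExternalVersionStore:
    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}  # _id -> (version, source)

    def index(self, doc_id: str, source: dict, version: int) -> int:
        # The supplied version must be a non-negative long.
        if (isinstance(version, bool) or not isinstance(version, int)
                or not 0 <= version <= JAVA_LONG_MAX):
            raise ValueError(f"illegal external version: {version!r}")
        current = self._docs.get(doc_id)
        # Reject anything that is not strictly higher than the stored version.
        if current is not None and version <= current[0]:
            raise VersionConflictError(
                f"current version [{current[0]}] >= provided version [{version}]")
        self._docs[doc_id] = (version, source)
        return version

store = ExternalVersionStore()
store.index("1", {"user": {"id": "kimchy"}}, 1)
print(store.index("1", {"user": {"id": "elkbee"}}, 2))  # 2
try:
    store.index("1", {"user": {"id": "elkbee"}}, 2)
except VersionConflictError as err:
    print(err.status)  # 409
```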

A nice side effect is that there is no need to maintain strict ordering of async indexing operations executed as a result of changes to a source database, as long as version numbers from the source database are used. Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order for whatever reason.
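A short simulation makes the out-of-order argument concrete. It is a sketch, not client code: `apply_external`, `replay`, and the `status` field are hypothetical. Every permutation of three updates carrying source-database versions 1, 2, and 3 ends in the same final state, because stale updates are rejected instead of overwriting newer data.

```python
import itertools
from typing import Dict, List, Tuple

Update = Tuple[int, dict]  # (version taken from the source database, document)

def apply_external(store: Dict[str, Update], doc_id: str, version: int, source: dict) -> bool:
    """Apply an external write; return False where Elasticsearch would answer 409."""
    current = store.get(doc_id)
    if current is not None and version <= current[0]:
        return False  # stale update: safe to drop
    store[doc_id] = (version, source)
    return True

def replay(updates: List[Update]) -> Update:
    """Deliver updates for document "1" in the given order; return its final state."""
    store: Dict[str, Update] = {}
    for version, source in updates:
        apply_external(store, "1", version, source)
    return store["1"]

updates = [(1, {"status": "new"}), (2, {"status": "paid"}), (3, {"status": "shipped"})]
# Every delivery order converges on the highest version.
for order in itertools.permutations(updates):
    assert replay(list(order)) == (3, {"status": "shipped"})
```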

Version types

In addition to the external version type, Elasticsearch also supports other types for specific use cases:

internal: Only index the document if the given version is identical to the version of the stored document.
external or external_gt: Only index the document if the given version is strictly higher than the version of the stored document or if there is no existing document. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.
external_gte: Only index the document if the given version is equal to or higher than the version of the stored document. If there is no existing document, the operation will succeed as well. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.
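The acceptance rules in this list can be written as a single predicate. This is a hypothetical helper that reads the list literally, not Elasticsearch code; `stored` is `None` when no document exists.

```python
from typing import Optional

def should_index(version_type: str, given: int, stored: Optional[int]) -> bool:
    """Return True if a write with `given` version is accepted."""
    if version_type == "internal":
        # Exact match against the stored version; no match without a document.
        return stored is not None and given == stored
    if version_type not in ("external", "external_gt", "external_gte"):
        raise ValueError(f"unsupported version_type: {version_type!r}")
    if given < 0:
        raise ValueError("external versions must be non-negative")
    if version_type == "external_gte":
        return stored is None or given >= stored
    # external and external_gt are the same rule: strictly higher.
    return stored is None or given > stored
```

The equality case is what sets external_gte apart: `should_index("external_gte", 5, 5)` is true, so two different documents carrying version 5 are both accepted and the later one silently replaces the earlier.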

The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data. There is another option, force, which is deprecated because it can cause primary and replica shards to diverge.

Examples

Insert a JSON document into the my-index-000001 index with an _id of 1:

PUT my-index-000001/_doc/1
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

The API returns the following result:

{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  },
  "_index": "my-index-000001",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "result": "created"
}
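A client usually inspects only a few fields of this response. The sketch below uses the standard json module; `summarize_index_response` is a hypothetical helper, and treating `successful == total` with no failures as "all copies written" is this sketch's own interpretation. The `_seq_no` and `_primary_term` values are the ones a later request would pass as `if_seq_no` and `if_primary_term` for optimistic concurrency control.

```python
import json

def summarize_index_response(body: str) -> dict:
    """Pull commonly used fields out of an index API response body."""
    resp = json.loads(body)
    shards = resp["_shards"]
    return {
        "id": resp["_id"],
        "version": resp["_version"],
        # "created" for a new document, "updated" when an existing one was replaced.
        "created": resp["result"] == "created",
        # True when every targeted shard copy acknowledged the write.
        "all_copies_written": shards["failed"] == 0 and shards["successful"] == shards["total"],
        # Values to send back as if_seq_no / if_primary_term on a later write.
        "if_seq_no": resp["_seq_no"],
        "if_primary_term": resp["_primary_term"],
    }
```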

Use the _create resource to index a document into the my-index-000001 index if no document with that ID exists:

PUT my-index-000001/_create/1
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

Set the op_type parameter to create to index a document into the my-index-000001 index if no document with that ID exists:

PUT my-index-000001/_doc/1?op_type=create
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
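Both requests above have put-if-absent semantics: if a document with the given _id already exists, the request fails with a version conflict (409) instead of overwriting it. A toy model of that rule follows; `create` and `DocumentExistsError` are hypothetical names, and the dict stands in for the index.

```python
from typing import Dict

class DocumentExistsError(Exception):
    """Stands in for the 409 version conflict returned when the _id is taken."""
    status = 409

def create(docs: Dict[str, dict], doc_id: str, source: dict) -> str:
    """Put-if-absent: store `source` only if no document with `doc_id` exists."""
    if doc_id in docs:
        raise DocumentExistsError(f"document [{doc_id}] already exists")
    docs[doc_id] = source
    return "created"

docs: Dict[str, dict] = {}
print(create(docs, "1", {"user": {"id": "kimchy"}}))  # created
try:
    create(docs, "1", {"user": {"id": "elkbee"}})
except DocumentExistsError as err:
    print(err.status)  # 409
```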
