
IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

S3 repository

You can use AWS S3 as a repository for Snapshot/Restore.

If you are looking for a hosted solution of Elasticsearch on AWS, please visit https://www.elastic.co/cloud/.

Getting started

To register an S3 repository, specify the type as s3 when creating the repository. The repository defaults to using ECS IAM Role credentials for authentication. You can also use IAM roles for Kubernetes service accounts for authentication.

The only mandatory setting is the bucket name:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket"
  }
}
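
As a rough illustration, the same registration request could be assembled from Python using only the standard library. This is a sketch, not an official client: the helper name build_register_request and the http://localhost:9200 address are assumptions made for the example, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_register_request(base_url, repo_name, bucket):
    """Build (but do not send) a PUT _snapshot/<repo_name> request.

    base_url is an assumed cluster address; adjust it, and add any
    authentication headers your deployment requires.
    """
    # The bucket name is the only mandatory repository setting.
    body = {"type": "s3", "settings": {"bucket": bucket}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_register_request(
    "http://localhost:9200", "my_s3_repository", "my-bucket"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would send it to the cluster.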

Client settings

The client that you use to connect to S3 has a number of settings available. The settings have the form s3.client.CLIENT_NAME.SETTING_NAME. By default, s3 repositories use a client named default, but this can be modified using the repository setting client. For example:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "client": "my-alternate-client"
  }
}

Most client settings can be added to the elasticsearch.yml configuration file with the exception of the secure settings, which you add to the Elasticsearch keystore. For more information about creating and updating the Elasticsearch keystore, see Secure settings.

For example, if you want to use specific credentials to access S3 then run the following commands to add these credentials to the keystore:

bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
# a session token is optional so the following command may not be needed
bin/elasticsearch-keystore add s3.client.default.session_token

If instead you want to use the instance role or container role to access S3 then you should leave these settings unset. You can switch from using specific credentials back to the default of using the instance role or container role by removing these settings from the keystore as follows:

bin/elasticsearch-keystore remove s3.client.default.access_key
bin/elasticsearch-keystore remove s3.client.default.secret_key
# a session token is optional so the following command may not be needed
bin/elasticsearch-keystore remove s3.client.default.session_token

All client secure settings of this repository type are reloadable. After you reload the settings, the internal s3 clients, used to transfer the snapshot contents, will utilize the latest settings from the keystore. Any existing s3 repositories, as well as any newly created ones, will pick up the new values stored in the keystore.

In-progress snapshot/restore tasks will not be preempted by a reload of the client’s secure settings. The task will complete using the client as it was built when the operation started.

The following list contains the available client settings. Those that must be stored in the keystore are marked as "secure" and are reloadable; the other settings belong in the elasticsearch.yml file.

access_key (Secure, reloadable): An S3 access key. If set, the secret_key setting must also be specified. If unset, the client will use the instance or container role instead.
secret_key (Secure, reloadable): An S3 secret key. If set, the access_key setting must also be specified.
session_token (Secure, reloadable): An S3 session token. If set, the access_key and secret_key settings must also be specified.
endpoint: The S3 service endpoint to connect to. This defaults to s3.amazonaws.com but the AWS documentation lists alternative S3 endpoints. If you are using an S3-compatible service then you should set this to the service’s endpoint.
protocol: The protocol to use to connect to S3. Valid values are either http or https. Defaults to https. When using HTTPS, this repository type validates the repository’s certificate chain using the JVM-wide truststore. Ensure that the root certificate authority is in this truststore using the JVM’s keytool tool.
proxy.host: The host name of a proxy to connect to S3 through.
proxy.port: The port of a proxy to connect to S3 through.
proxy.username (Secure, reloadable): The username to connect to the proxy.host with.
proxy.password (Secure, reloadable): The password to connect to the proxy.host with.
read_timeout: The socket timeout for connecting to S3. The value should specify the unit. For example, a value of 5s specifies a 5 second timeout. The default value is 50 seconds.
max_retries: The number of retries to use when an S3 request fails. The default value is 3.
use_throttle_retries: Whether retries should be throttled (i.e. should back off). Must be true or false. Defaults to true.
path_style_access: Whether to force the use of the path style access pattern. If true, the path style access pattern will be used. If false, the access pattern will be automatically determined by the AWS Java SDK (See AWS documentation for details). Defaults to false.

In versions 7.0, 7.1, 7.2 and 7.3 all bucket operations used the now-deprecated path style access pattern. If your deployment requires the path style access pattern then you should set this setting to true when upgrading.

disable_chunked_encoding: Whether chunked encoding should be disabled or not. If false, chunked encoding is enabled and will be used where appropriate. If true, chunked encoding is disabled and will not be used, which may mean that snapshot operations consume more resources and take longer to complete. It should only be set to true if you are using a storage service that does not support chunked encoding. See the AWS Java SDK documentation for details. Defaults to false.
region: Allows specifying the signing region to use. Specifying this setting manually should not be necessary for most use cases. Generally, the SDK will correctly guess the signing region to use. It should be considered an expert level setting to support S3-compatible APIs that require v4 signatures and use a region other than the default us-east-1. Defaults to an empty string, which means that the SDK will try to automatically determine the correct signing region.
signer_override: Allows specifying the name of the signature algorithm to use for signing requests by the S3 client. Specifying this setting should not be necessary for most use cases. It should be considered an expert level setting to support S3-compatible APIs that do not support the signing algorithm that the SDK automatically determines for them. See the AWS Java SDK documentation for details. Defaults to empty string which means that no signing algorithm override will be used.
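
The split between keystore and elasticsearch.yml settings, and the dependencies between the credential settings, can be sketched in a few lines of Python. The function name split_client_settings and the example values are hypothetical; the sketch only encodes the rules stated in the list above and does not talk to Elasticsearch or the keystore.

```python
# Settings marked "Secure" in the list above; these belong in the keystore.
SECURE_SETTINGS = {
    "access_key",
    "secret_key",
    "session_token",
    "proxy.username",
    "proxy.password",
}


def split_client_settings(client_name, settings):
    """Return (keystore_entries, yml_entries), keyed by full setting name."""
    has_access = "access_key" in settings
    has_secret = "secret_key" in settings
    # access_key and secret_key must be specified together.
    if has_access != has_secret:
        raise ValueError("access_key and secret_key must be set together")
    # session_token requires both access_key and secret_key.
    if "session_token" in settings and not has_access:
        raise ValueError("session_token requires access_key and secret_key")

    keystore, yml = {}, {}
    for name, value in settings.items():
        # Settings have the form s3.client.CLIENT_NAME.SETTING_NAME.
        full_name = f"s3.client.{client_name}.{name}"
        target = keystore if name in SECURE_SETTINGS else yml
        target[full_name] = value
    return keystore, yml


keystore, yml = split_client_settings(
    "default",
    {
        "access_key": "EXAMPLE-ACCESS-KEY",  # placeholder value
        "secret_key": "EXAMPLE-SECRET-KEY",  # placeholder value
        "protocol": "https",
        "max_retries": 3,
    },
)
print(sorted(keystore))
print(sorted(yml))
```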

S3-compatible services

There are a number of storage systems that provide an S3-compatible API, and the repository-s3 type allows you to use these systems in place of AWS S3. To do so, you should set the s3.client.CLIENT_NAME.endpoint setting to the system’s endpoint. This setting accepts IP addresses and hostnames and may include a port. For example, the endpoint may be 172.17.0.2 or 172.17.0.2:9000.

By default Elasticsearch communicates with your storage system using HTTPS, and validates the repository’s certificate chain using the JVM-wide truststore. Ensure that the JVM-wide truststore includes an entry for your repository. If you wish to use unsecured HTTP communication instead of HTTPS, set s3.client.CLIENT_NAME.protocol to http.
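
For illustration, the two client settings mentioned above could be rendered as elasticsearch.yml lines with a small helper. The function name s3_compatible_client_yml is hypothetical, and quoting the endpoint value is a choice made in this sketch to keep the host:port form unambiguous in YAML.

```python
def s3_compatible_client_yml(client_name, endpoint, protocol="https"):
    """Render endpoint and protocol settings for an S3-compatible service."""
    # The protocol setting accepts only http or https.
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be http or https")
    prefix = f"s3.client.{client_name}"
    return "\n".join(
        [
            f'{prefix}.endpoint: "{endpoint}"',
            f"{prefix}.protocol: {protocol}",
        ]
    )


# Endpoint taken from the example above: an IP address with a port.
print(s3_compatible_client_yml("default", "172.17.0.2:9000", protocol="http"))
```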

MinIO is an example of a storage system that provides an S3-compatible API. The repository-s3 type allows Elasticsearch to work with MinIO-backed repositories as well as repositories stored on AWS S3. Other S3-compatible storage systems may also work with Elasticsearch, but these are not covered by the Elasticsearch test suite.

Note that some storage systems claim to be S3-compatible but do not faithfully emulate S3’s behaviour in full. The repository-s3 type requires full compatibility with S3. In particular it must support the same set of API endpoints, return the same errors in case of failures, and offer consistency and performance at least as good as S3 even when accessed concurrently by multiple nodes. Incompatible error codes, consistency or performance may be particularly hard to track down since errors, consistency failures, and performance issues are usually rare and hard to reproduce.

You can perform some basic checks of the suitability of your storage system using the repository analysis API. If this API does not complete successfully, or indicates poor performance, then your storage system is not fully compatible with AWS S3 and therefore unsuitable for use as a snapshot repository. You will need to work with the supplier of your storage system to address any incompatibilities you encounter.

Repository settings

The s3 repository type supports a number of settings to customize how data is stored in S3. These can be specified when creating the repository. For example:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "another_setting": "setting-value"
  }
}

The following settings are supported:

bucket

(Required) Name of the S3 bucket to use for snapshots.

The bucket name must adhere to Amazon’s S3 bucket naming rules.

client

The name of the S3 client to use to connect to S3. Defaults to default.

base_path

Specifies the path to the repository data within its bucket. Defaults to an empty string, meaning that the repository is at the root of the bucket. The value of this setting should not start or end with a /.

Don’t set base_path when configuring a snapshot repository for Elastic Cloud Enterprise. Elastic Cloud Enterprise automatically generates the base_path for each deployment so that multiple deployments may share the same bucket.
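
A minimal sketch of the base_path rule, assuming you build repository settings programmatically: the hypothetical helper normalize_base_path strips any leading or trailing slashes before the value is placed in the settings.

```python
def normalize_base_path(base_path):
    """Remove leading and trailing "/" so the value suits base_path."""
    return base_path.strip("/")


settings = {
    "bucket": "my-bucket",
    # "snapshots/prod" is an illustrative path, not a required layout.
    "base_path": normalize_base_path("/snapshots/prod/"),
}
print(settings["base_path"])
```

An empty string is left unchanged, which matches the default of storing the repository at the root of the bucket.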

chunk_size

Large files can be broken down into chunks during snapshotting if needed. Specify the chunk size as a value and unit, for example: 1TB, 1GB, 10MB. Defaults to the maximum size of a blob in S3, which is 5TB.

compress

When set to true, metadata files are stored in compressed format. This setting doesn’t affect index files, which are already compressed by default. Defaults to true.

max_restore_bytes_per_sec

(Optional, byte value) Maximum snapshot restore rate per node. Defaults to unlimited. Note that restores are also throttled through recovery settings.

max_snapshot_bytes_per_sec

(Optional, byte value) Maximum snapshot creation rate per node. Defaults to 40mb per second.

readonly

(Optional, Boolean) If true, the repository is read-only. The cluster can retrieve and restore snapshots from the repository but not write to the repository or create snapshots in it.

Only a cluster with write access can create snapshots in the repository. All other clusters connected to the repository should have the readonly parameter set to true.

If false, the cluster can write to the repository and create snapshots in it. Defaults to false.

If you register the same snapshot repository with multiple clusters, only one cluster should have write access to the repository. Having multiple clusters write to the repository at the same time risks corrupting the contents of the repository.

server_side_encryption

When set to true, files are encrypted on the server side using the AES256 algorithm. Defaults to false.

buffer_size

Minimum threshold below which the chunk is uploaded using a single request. Beyond this threshold, the S3 repository will use the AWS Multipart Upload API to split the chunk into several parts, each of buffer_size length, and to upload each part in its own request. Note that setting a buffer size lower than 5mb is not allowed since it will prevent the use of the Multipart API and may result in upload errors. It is also not possible to set a buffer size greater than 5gb as it is the maximum upload size allowed by S3. Defaults to 100mb or 5% of JVM heap, whichever is smaller.
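
The buffer_size arithmetic can be worked through in a short sketch. It makes three assumptions: mb and gb are binary units (1mb = 1024 * 1024 bytes), a chunk exactly equal to buffer_size is uploaded in a single request, and the default follows the wording above literally. The function names are hypothetical.

```python
MB = 1024 ** 2  # assumed binary units
GB = 1024 ** 3

MIN_BUFFER = 5 * MB  # smaller values are not allowed
MAX_BUFFER = 5 * GB  # larger values are not allowed


def default_buffer_size(heap_bytes):
    """100mb or 5% of the JVM heap, whichever is smaller."""
    return min(100 * MB, heap_bytes * 5 // 100)


def upload_plan(chunk_bytes, buffer_bytes):
    """Return (mode, request_count) for uploading one chunk."""
    if not MIN_BUFFER <= buffer_bytes <= MAX_BUFFER:
        raise ValueError("buffer_size must be between 5mb and 5gb")
    if chunk_bytes <= buffer_bytes:
        return ("single", 1)
    # Multipart upload: parts of buffer_size length, rounded up.
    parts = (chunk_bytes + buffer_bytes - 1) // buffer_bytes
    return ("multipart", parts)


buffer_bytes = default_buffer_size(4 * GB)  # 5% of 4gb exceeds 100mb
print(buffer_bytes // MB)
print(upload_plan(50 * MB, buffer_bytes))
print(upload_plan(1 * GB, buffer_bytes))
```

With a 4gb heap the default is 100mb, so a 1gb chunk is split into 11 parts: ten full parts of 100mb and one smaller final part.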

canned_acl

The S3 repository supports all S3 canned ACLs: private, public-read, public-read-write, authenticated-read, log-delivery-write, bucket-owner-read, and bucket-owner-full-control. Defaults to private. You can specify a canned ACL using the canned_acl setting. When the S3 repository creates buckets and objects, it applies the canned ACL to them.

storage_class

Sets the S3 storage class for objects stored in the snapshot repository. Values may be standard, reduced_redundancy, standard_ia, onezone_ia and intelligent_tiering. Defaults to standard. Changing this setting on an existing repository only affects the storage class for newly created objects, resulting in a mixed usage of storage classes. You may use an S3 Lifecycle Policy to adjust the storage class of existing objects in your repository, but you must not transition objects to Glacier classes and you must not expire objects. If you use Glacier storage classes or object expiry then you may permanently lose access to your repository contents. For more information about S3 storage classes, see the AWS Storage Classes Guide.

Defining client settings in the repository settings, as documented below, is deprecated and will be removed in a future version.

In addition to the above settings, you may also specify all non-secure client settings in the repository settings. In this case, the client settings found in the repository settings will be merged with those of the named client used by the repository. Conflicts between client and repository settings are resolved by the repository settings taking precedence over client settings.

For example:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "client": "my-client",
    "bucket": "my-bucket",
    "endpoint": "my.s3.endpoint"
  }
}

This sets up a repository that uses all client settings from the client my-client, except for the endpoint, which the repository settings override to my.s3.endpoint.
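The precedence rule can be pictured as a simple dictionary merge. The following Python sketch is only a model of the documented behavior, not Elasticsearch code; the function name and the values assigned to the named client are hypothetical.

```python
def effective_client_settings(client: dict, repository: dict) -> dict:
    """Merge non-secure settings; repository values win on conflict."""
    return {**client, **repository}


# Hypothetical non-secure settings of the named client "my-client".
named_client = {"endpoint": "s3.amazonaws.com", "protocol": "https"}
# Client settings given in the repository settings of the example request.
repository_overrides = {"endpoint": "my.s3.endpoint"}

merged = effective_client_settings(named_client, repository_overrides)
```

Here merged keeps the protocol from the named client and takes the endpoint from the repository settings.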

Recommended S3 permissions

To restrict the Elasticsearch snapshot process to the minimum required resources, we recommend using Amazon IAM in conjunction with pre-existing S3 buckets. The following example policy allows snapshot access to an S3 bucket named "snaps.example.com". You can configure it through the AWS IAM console by creating a Custom Policy and using a Policy Document similar to this one (changing snaps.example.com to your bucket name).

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

You may further restrict the permissions by specifying a prefix within the bucket. In this example, the prefix is named "foo".

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "foo/*"
          ]
        }
      },
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/foo/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
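If you maintain policies for several buckets or prefixes, the two documents above can be generated from a template. The Python sketch below is one possible way to do that; the function name snapshot_policy is hypothetical, and the output simply mirrors the example policies shown above.

```python
import json
from typing import Optional

BUCKET_ACTIONS = [
    "s3:ListBucket",
    "s3:GetBucketLocation",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucketVersions",
]
OBJECT_ACTIONS = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
]


def snapshot_policy(bucket: str, prefix: Optional[str] = None) -> dict:
    """Build the example policy for a bucket, optionally limited to a prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    bucket_statement = {
        "Action": list(BUCKET_ACTIONS),
        "Effect": "Allow",
        "Resource": [bucket_arn],
    }
    object_path = "*"
    if prefix:
        # Restrict bucket-level actions and object access to the prefix.
        bucket_statement["Condition"] = {
            "StringLike": {"s3:prefix": [f"{prefix}/*"]}
        }
        object_path = f"{prefix}/*"
    object_statement = {
        "Action": list(OBJECT_ACTIONS),
        "Effect": "Allow",
        "Resource": [f"{bucket_arn}/{object_path}"],
    }
    return {
        "Statement": [bucket_statement, object_statement],
        "Version": "2012-10-17",
    }


policy_json = json.dumps(snapshot_policy("snaps.example.com", "foo"), indent=2)
```

The resulting policy_json string can be pasted into the Policy Document field; omit the prefix argument to produce the bucket-wide variant.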

The bucket must exist before you register a repository for snapshots. If the bucket does not exist, the repository registration fails.

Cleaning up multi-part uploads

Elasticsearch uses S3’s multi-part upload process to upload larger blobs to the repository. The multi-part upload process works by dividing each blob into smaller parts, uploading each part independently, and then completing the upload in a separate step. This reduces the amount of data that Elasticsearch must re-send if an upload fails: Elasticsearch only needs to re-send the part that failed rather than starting from the beginning of the whole blob. The storage for each part is charged independently starting from the time at which the part was uploaded.

If a multi-part upload cannot be completed then it must be aborted in order to delete any parts that were successfully uploaded, preventing further storage charges from accumulating. Elasticsearch will automatically abort a multi-part upload on failure, but sometimes the abort request itself fails. For example, if the repository becomes inaccessible or the instance on which Elasticsearch is running is terminated abruptly then Elasticsearch cannot complete or abort any ongoing uploads.

You must make sure that failed uploads are eventually aborted to avoid unnecessary storage costs. You can use the List multipart uploads API to list the ongoing uploads and look for any which are unusually long-running, or you can configure a bucket lifecycle policy to automatically abort incomplete uploads once they reach a certain age.
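Both approaches can be sketched in Python. The first function builds a lifecycle configuration using S3's AbortIncompleteMultipartUpload rule; the second filters the entries returned by the List multipart uploads API by age. The function names, the rule ID, and the 7-day threshold are illustrative assumptions, and the second function assumes each entry carries a timezone-aware Initiated timestamp (as an AWS SDK such as boto3 would typically return it). No AWS calls are made here.

```python
from datetime import datetime, timedelta, timezone


def abort_incomplete_uploads_rule(days: int, prefix: str = "") -> dict:
    """Lifecycle configuration that aborts incomplete uploads after `days` days."""
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",  # hypothetical name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix: whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


def stale_uploads(uploads, max_age: timedelta, now=None):
    """Return the uploads whose "Initiated" time is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return [u for u in uploads if now - u["Initiated"] > max_age]


# Illustrative threshold only; choose an age that suits your snapshots.
lifecycle = abort_incomplete_uploads_rule(days=7)
```

The lifecycle dictionary has the shape expected by S3's bucket lifecycle configuration API, and the entries returned by stale_uploads are candidates for an explicit abort request.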

AWS VPC bandwidth settings

AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch instances reside in a private subnet in an AWS VPC, then all traffic to S3 goes through the VPC’s NAT instance. If your VPC’s NAT instance is a smaller instance size (e.g. a t2.micro) or is handling a high volume of network traffic, your bandwidth to S3 may be limited by that NAT instance’s networking bandwidth. Instead, we recommend creating a VPC endpoint that lets instances in a private subnet connect to S3. This eliminates any limitations imposed by the network bandwidth of your VPC’s NAT instance.

Instances residing in a public subnet in an AWS VPC will connect to S3 via the VPC’s internet gateway and not be bandwidth limited by the VPC’s NAT instance.

Using IAM roles for Kubernetes service accounts for authentication

To use Kubernetes service accounts for authentication, add a symlink in the S3 repository config directory that points to the file named by the $AWS_WEB_IDENTITY_TOKEN_FILE environment variable (which a Kubernetes pod should set automatically). This gives the repository read access to the service account token, because a repository can’t read any files outside its config directory. For example:

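# Create the S3 repository config directory if it does not exist yet
mkdir -p "${ES_PATH_CONF}/repository-s3"
# Link the service account token file into that directory
ln -s "$AWS_WEB_IDENTITY_TOKEN_FILE" "${ES_PATH_CONF}/repository-s3/aws-web-identity-token-file"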

The symlink must be created on all data and master-eligible nodes and be readable by the elasticsearch user. By default, Elasticsearch runs as user elasticsearch using uid:gid 1000:0.

If the symlink exists, it will be used by default by all S3 repositories that don’t have explicit client credentials.
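As a quick sanity check on each node, a short Python sketch like the following can confirm that the symlink exists and that its target is readable. The function name is hypothetical, and the check reflects the permissions of the user who runs it, so it is only meaningful when run as the elasticsearch user.

```python
import os


def token_symlink_ok(es_path_conf: str) -> bool:
    """True if the token symlink exists and its target is readable."""
    link = os.path.join(
        es_path_conf, "repository-s3", "aws-web-identity-token-file"
    )
    # islink() checks the link itself; access() follows it to the target.
    return os.path.islink(link) and os.access(link, os.R_OK)
```

For example, token_symlink_ok(os.environ["ES_PATH_CONF"]) should return True on every data and master-eligible node once the symlink is in place.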
