Searchable snapshots
Searchable snapshots let you use snapshots to search infrequently accessed and read-only data in a very cost-effective fashion. The cold and frozen data tiers use searchable snapshots to reduce your storage and operating costs.
Searchable snapshots eliminate the need for replica shards, potentially halving the local storage needed to search your data. Searchable snapshots rely on the same snapshot mechanism you already use for backups and have minimal impact on your snapshot repository storage costs.
Using searchable snapshots
Searching a searchable snapshot index is the same as searching any other index.
By default, searchable snapshot indices have no replicas. The underlying snapshot
provides resilience and the query volume is expected to be low enough that a
single shard copy will be sufficient. However, if you need to support a higher
query volume, you can add replicas by adjusting the index.number_of_replicas
index setting.
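As a minimal sketch of the replica adjustment described above, the following Python builds (but does not send) an update index settings request. The index name my-mounted-index and the helper name are hypothetical, and sending the request to a cluster is left to whichever HTTP client you already use.

```python
import json


def replica_settings_request(index_name: str, replicas: int):
    """Build an update-index-settings request that changes the replica count.

    Returns a (method, path, body) tuple; nothing is sent over the network.
    """
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    path = f"/{index_name}/_settings"
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", path, json.dumps(body)


# Hypothetical index name, used only for illustration.
method, path, body = replica_settings_request("my-mounted-index", 1)
print(method, path, body)
```

Keeping the request construction separate from the transport makes it easy to review the exact setting being changed before applying it.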
If a node fails and searchable snapshot shards need to be recovered elsewhere, there
is a brief window of time while Elasticsearch allocates the shards to other nodes where
the cluster health will not be green. Searches that hit these shards may fail
or return partial results until the shards are reallocated to healthy nodes.
You typically manage searchable snapshots through ILM. The
searchable snapshots action automatically converts
a regular index into a searchable snapshot index when it reaches the cold
or frozen phase. You can also make indices in existing snapshots searchable by
manually mounting them using the mount snapshot API.
To mount an index from a snapshot that contains multiple indices, we recommend creating a clone of the snapshot that contains only the index you want to search, and mounting the clone. You should not delete a snapshot if it has any mounted indices, so creating a clone enables you to manage the lifecycle of the backup snapshot independently of any searchable snapshots. If you use ILM to manage your searchable snapshots then it will automatically look after cloning the snapshot as needed.
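The clone-then-mount recommendation can be sketched as two requests: one to the clone snapshot API and one to the mount snapshot API. This is an illustrative sketch only. The repository, snapshot, and index names are hypothetical, and the helpers only build the requests rather than send them.

```python
import json
from urllib.parse import quote, urlencode


def clone_request(repository, source_snapshot, target_snapshot, index_name):
    """Build a clone-snapshot request that copies a single index into a new snapshot."""
    path = (
        f"/_snapshot/{quote(repository, safe='')}/{quote(source_snapshot, safe='')}"
        f"/_clone/{quote(target_snapshot, safe='')}"
    )
    return "PUT", path, json.dumps({"indices": index_name})


def mount_request(repository, snapshot, index_name, storage="full_copy"):
    """Build a mount-snapshot request for one index of a snapshot."""
    if storage not in ("full_copy", "shared_cache"):
        raise ValueError("storage must be 'full_copy' or 'shared_cache'")
    query = urlencode({"wait_for_completion": "true", "storage": storage})
    path = f"/_snapshot/{quote(repository, safe='')}/{quote(snapshot, safe='')}/_mount?{query}"
    return "POST", path, json.dumps({"index": index_name})


# Hypothetical names: clone only "logs-2021" out of a multi-index backup
# snapshot, then mount the clone so the backup can be managed independently.
print(clone_request("my_repository", "nightly-backup", "logs-2021-only", "logs-2021"))
print(mount_request("my_repository", "logs-2021-only", "logs-2021"))
```

Mounting the clone rather than the original snapshot means the backup snapshot can later be deleted without affecting the mounted index.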
You can control the allocation of the shards of searchable snapshot indices using the same mechanisms as for regular indices. For example, you could use Index-level shard allocation filtering to restrict searchable snapshot shards to a subset of your nodes.
The speed of recovery of a searchable snapshot index is limited by the repository
setting max_restore_bytes_per_sec and the node setting
indices.recovery.max_bytes_per_sec, just like a normal restore operation. By
default max_restore_bytes_per_sec is unlimited, but the default for
indices.recovery.max_bytes_per_sec depends on the configuration of the node.
See Recovery settings.
We recommend that you force-merge indices to a single segment per shard before taking a snapshot that will be mounted as a searchable snapshot index. Each read from a snapshot repository takes time and costs money, and the fewer segments there are the fewer reads are needed to restore the snapshot or to respond to a search.
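A minimal sketch of the force-merge step, assuming a hypothetical index name: the helper below builds a force merge request that targets a single segment per shard. It constructs the request only and leaves sending it, and taking the snapshot afterwards, to your own tooling.

```python
from urllib.parse import urlencode


def force_merge_request(index_name: str, max_num_segments: int = 1):
    """Build a force-merge request; one segment per shard is the default target."""
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be at least 1")
    query = urlencode({"max_num_segments": max_num_segments})
    return "POST", f"/{index_name}/_forcemerge?{query}"


# Hypothetical index name, used only for illustration.
print(force_merge_request("logs-2021"))
```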
Searchable snapshots are ideal for managing a large archive of historical data. Historical information is typically searched less frequently than recent data and therefore may not need replicas for their performance benefits.
For more complex or time-consuming searches, you can use Async search with searchable snapshots.
Use any of the following repository types with searchable snapshots:
You can also use alternative implementations of these repository types, for instance MinIO, as long as they are fully compatible. Use the Repository analysis API to analyze your repository’s suitability for use with searchable snapshots.
How searchable snapshots work
When an index is mounted from a snapshot, Elasticsearch allocates its shards to data nodes within the cluster. The data nodes then automatically retrieve the relevant shard data from the repository onto local storage, based on the mount options specified. If possible, searches use data from local storage. If the data is not available locally, Elasticsearch downloads the data that it needs from the snapshot repository.
If a node holding one of these shards fails, Elasticsearch automatically allocates the
affected shards on another node, and that node restores the relevant shard data
from the repository. No replicas are needed, and no complicated monitoring or
orchestration is necessary to restore lost shards. Although searchable snapshot
indices have no replicas by default, you may add replicas to these indices by
adjusting index.number_of_replicas. Replicas of searchable snapshot shards are
recovered by copying data from the snapshot repository, just like primaries of
searchable snapshot shards. In contrast, replicas of regular indices are restored by
copying data from the primary.
Mount options
To search a snapshot, you must first mount it locally as an index. Usually ILM will do this automatically, but you can also call the mount snapshot API yourself. There are two options for mounting an index from a snapshot, each with different performance characteristics and local storage footprints:
- Fully mounted index
  Fully caches the snapshotted index’s shards in the Elasticsearch cluster. ILM uses this option in the hot and cold phases.
  Search performance for a fully mounted index is normally comparable to a regular index, since there is minimal need to access the snapshot repository. While recovery is ongoing, search performance may be slower than with a regular index because a search may need some data that has not yet been retrieved into the local cache. If that happens, Elasticsearch will eagerly retrieve the data needed to complete the search in parallel with the ongoing recovery. On-disk data is preserved across restarts, such that the node does not need to re-download data that is already stored on the node after a restart.
- Partially mounted index
  Uses a local cache containing only recently searched parts of the snapshotted index’s data. This cache has a fixed size and is shared across shards of partially mounted indices allocated on the same data node. ILM uses this option in the frozen phase.
  If a search requires data that is not in the cache, Elasticsearch fetches the missing data from the snapshot repository. Searches that require these fetches are slower, but the fetched data is stored in the cache so that similar searches can be served more quickly in future. Elasticsearch will evict infrequently used data from the cache to free up space. The cache is cleared when a node is restarted.
  Although slower than a fully mounted index or a regular index, a partially mounted index still returns search results quickly, even for large data sets, because the layout of data in the repository is heavily optimized for search. Many searches will need to retrieve only a small subset of the total shard data before returning results.
To partially mount an index, you must have one or more nodes with a shared cache
available. By default, dedicated frozen data tier nodes (nodes with the
data_frozen role and no other data roles) have a shared cache sized at
the greater of 90% of total disk space and total disk space minus a
headroom of 100GB.
Using a dedicated frozen tier is highly recommended for production use. If you
do not have a dedicated frozen tier, you must configure the
xpack.searchable.snapshot.shared_cache.size setting to reserve space for the
cache on one or more nodes. Partially mounted indices are only allocated to
nodes that have a shared cache.
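The two mount options correspond to the storage query parameter of the mount snapshot API: full_copy for a fully mounted index and shared_cache for a partially mounted one. The sketch below picks the option from an ILM phase name using the mapping stated above. The repository and snapshot names, and the helper itself, are hypothetical.

```python
from urllib.parse import urlencode

# Mapping taken from the text: ILM uses fully mounted indices in the hot and
# cold phases and partially mounted indices in the frozen phase.
PHASE_TO_STORAGE = {
    "hot": "full_copy",
    "cold": "full_copy",
    "frozen": "shared_cache",
}


def mount_path_for_phase(repository: str, snapshot: str, phase: str) -> str:
    """Return the mount-snapshot path with the storage option matching the phase."""
    try:
        storage = PHASE_TO_STORAGE[phase]
    except KeyError:
        raise ValueError(f"no mount option is defined for phase {phase!r}") from None
    return f"/_snapshot/{repository}/{snapshot}/_mount?" + urlencode({"storage": storage})


# Hypothetical repository and snapshot names.
print(mount_path_for_phase("my_repository", "my_snapshot", "frozen"))
```

A request built with storage=shared_cache can only be satisfied by nodes that have a shared cache, as described above.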
- xpack.searchable.snapshot.shared_cache.size
  (Static) Disk space reserved for the shared cache of partially mounted indices. Accepts a percentage of total disk space or an absolute byte value. Defaults to 90% of total disk space for dedicated frozen data tier nodes. Otherwise defaults to 0b.
- xpack.searchable.snapshot.shared_cache.size.max_headroom
  (Static, byte value) For dedicated frozen tier nodes, the max headroom to maintain. If xpack.searchable.snapshot.shared_cache.size is not explicitly set, this setting defaults to 100GB. Otherwise it defaults to -1 (not set). You can only configure this setting if xpack.searchable.snapshot.shared_cache.size is set as a percentage.
To illustrate how these settings work in concert, let us look at two examples when using the default values of the settings on a dedicated frozen node:
- A 4000 GB disk will result in a shared cache sized at 3900 GB. 90% of 4000 GB is 3600 GB, leaving 400 GB headroom. The default max_headroom of 100 GB takes effect, and the result is therefore 3900 GB.
- A 400 GB disk will result in a shared cache sized at 360 GB. 90% of 400 GB is 360 GB, leaving 40 GB headroom, which is already below the default max_headroom of 100 GB, so the percentage determines the size.
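The arithmetic behind the two examples can be reproduced with a short calculation. This is a sketch of the rule as described in this section (the greater of the percentage-based size and total disk space minus the max headroom), not Elasticsearch's actual implementation. The function name is hypothetical, and GB is taken as 1024^3 bytes.

```python
GB = 1024 ** 3  # assumption: binary gigabytes


def default_shared_cache_size(total_disk_bytes: int,
                              size_percent: int = 90,
                              max_headroom_bytes: int = 100 * GB) -> int:
    """Return the default shared cache size for a dedicated frozen node."""
    # Size implied by the percentage setting (integer math avoids float rounding).
    by_percentage = total_disk_bytes * size_percent // 100
    # Size implied by leaving at most max_headroom_bytes free.
    by_headroom = total_disk_bytes - max_headroom_bytes
    return max(by_percentage, by_headroom)


for disk_gb in (4000, 400):
    cache_gb = default_shared_cache_size(disk_gb * GB) // GB
    print(f"{disk_gb} GB disk -> {cache_gb} GB shared cache")
```

On large disks the headroom cap wins, so the cache takes everything except 100 GB. On small disks the 90% rule wins.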
You can configure the settings in elasticsearch.yml:
xpack.searchable.snapshot.shared_cache.size: 4TB
Currently, you can configure xpack.searchable.snapshot.shared_cache.size
on any node. However, if the cache size is set on any
node that does not have the data_frozen role, it will be treated as though it
is set to 0b. Additionally, nodes with a shared cache can only have a single
data path.
Elasticsearch also uses a dedicated system index named .snapshot-blob-cache to speed up
the recoveries of searchable snapshot shards. This index is used as an additional
caching layer on top of the partially or fully mounted data and contains the
minimal required data to start the searchable snapshot shards. Elasticsearch automatically
deletes the documents that are no longer used in this index. This periodic cleanup
can be tuned using the following settings:
- searchable_snapshots.blob_cache.periodic_cleanup.interval
  (Dynamic) The interval at which the periodic cleanup of the .snapshot-blob-cache index is scheduled. Defaults to every hour (1h).
- searchable_snapshots.blob_cache.periodic_cleanup.retention_period
  (Dynamic) The retention period to keep obsolete documents in the .snapshot-blob-cache index. Defaults to one hour (1h).
- searchable_snapshots.blob_cache.periodic_cleanup.batch_size
  (Dynamic) The number of documents that are searched for and bulk-deleted at once during the periodic cleanup of the .snapshot-blob-cache index. Defaults to 100.
- searchable_snapshots.blob_cache.periodic_cleanup.pit_keep_alive
  (Dynamic) The value used for the point-in-time keep alive requests executed during the periodic cleanup of the .snapshot-blob-cache index. Defaults to 10m.
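Because the cleanup settings are dynamic, they can be changed on a running cluster through the cluster update settings API. The following sketch builds such a request using the default values listed above. The helper name is hypothetical, and the choice of persistent rather than transient settings is an assumption for illustration.

```python
import json

PREFIX = "searchable_snapshots.blob_cache.periodic_cleanup."


def cleanup_settings_request(interval="1h", retention_period="1h",
                             batch_size=100, pit_keep_alive="10m"):
    """Build a cluster-update-settings request for the blob cache cleanup."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    settings = {
        PREFIX + "interval": interval,
        PREFIX + "retention_period": retention_period,
        PREFIX + "batch_size": batch_size,
        PREFIX + "pit_keep_alive": pit_keep_alive,
    }
    # Assumption: apply as persistent settings so they survive a full restart.
    return "PUT", "/_cluster/settings", json.dumps({"persistent": settings})


# Example: run the cleanup every 30 minutes, keeping the other defaults.
method, path, body = cleanup_settings_request(interval="30m")
print(method, path)
print(body)
```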
Reduce costs with searchable snapshots
In most cases, searchable snapshots reduce the costs of running a cluster by removing the need for replica shards and for shard data to be copied between nodes. However, if it’s particularly expensive to retrieve data from a snapshot repository in your environment, searchable snapshots may be more costly than regular indices. Ensure that the cost structure of your operating environment is compatible with searchable snapshots before using them.
Replica costs
For resiliency, a regular index requires multiple redundant copies of each shard across multiple nodes. If a node fails, Elasticsearch uses the redundancy to rebuild any lost shard copies. A searchable snapshot index doesn’t require replicas. If a node containing a searchable snapshot index fails, Elasticsearch can rebuild the lost shard cache from the snapshot repository.
Without replicas, rarely-accessed searchable snapshot indices require far fewer resources. A cold data tier that contains replica-free fully-mounted searchable snapshot indices requires half the nodes and disk space of a tier containing the same data in regular indices. The frozen tier, which contains only partially-mounted searchable snapshot indices, requires even fewer resources.
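The "half the disk space" figure follows from simple arithmetic if the regular indices carry one replica per primary, which is the assumption made in this sketch. The 10 TB data size and the function name are hypothetical.

```python
TB = 1024 ** 4  # assumption: binary terabytes


def tier_storage_bytes(primary_bytes: int, replicas: int) -> int:
    """Local disk needed to hold every shard copy: primaries plus replicas."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return primary_bytes * (1 + replicas)


# Hypothetical cold tier holding 10 TB of primary data.
regular = tier_storage_bytes(10 * TB, replicas=1)        # regular indices, one replica
fully_mounted = tier_storage_bytes(10 * TB, replicas=0)  # replica-free fully mounted indices
print(regular // TB, "TB vs", fully_mounted // TB, "TB")
```

With more than one replica on the regular indices, the saving is proportionally larger.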
Data transfer costs
When a shard of a regular index is moved between nodes, its contents are copied from another node in your cluster. In many environments, the costs of moving data between nodes are significant, especially if running in a Cloud environment with nodes in different zones. In contrast, when mounting a searchable snapshot index or moving one of its shards, the data is always copied from the snapshot repository. This is typically much cheaper.
Most cloud providers charge significant fees for data transferred between regions and for data transferred out of their platforms. You should only mount snapshots into a cluster that is in the same region as the snapshot repository. If you wish to search data across multiple regions, configure multiple clusters and use cross-cluster search or cross-cluster replication instead of searchable snapshots.
Back up and restore searchable snapshots
You can use regular snapshots to back up a cluster containing searchable snapshot indices. When you restore a snapshot containing searchable snapshot indices, these indices are restored as searchable snapshot indices again.
Before you restore a snapshot containing a searchable snapshot index, you must first register the repository containing the original index snapshot. When restored, the searchable snapshot index mounts the original index snapshot from its original repository. If you prefer, you can use separate repositories for regular snapshots and searchable snapshots.
A snapshot of a searchable snapshot index contains only a small amount of metadata which identifies its original index snapshot. It does not contain any data from the original index. The restore of a backup will fail to restore any searchable snapshot indices whose original index snapshot is unavailable.
Because searchable snapshot indices are not regular indices, it is not possible to use a source-only repository to take snapshots of searchable snapshot indices.
Reliability of searchable snapshots
The sole copy of the data in a searchable snapshot index is the underlying snapshot, stored in the repository. For example:
- You cannot unregister a repository while any of the searchable snapshots it contains are mounted in Elasticsearch. You also cannot delete a snapshot if any of its indices are mounted as a searchable snapshot in the same cluster.
- If you mount indices from snapshots held in a repository to which a different cluster has write access then you must make sure that the other cluster does not delete these snapshots.
- If you delete a snapshot while it is mounted as a searchable snapshot then the data is lost. Similarly, if the repository fails or corrupts the contents of the snapshot then the data is lost.
- Although Elasticsearch may have cached the data onto local storage, these caches may be incomplete and cannot be used to recover any data after a repository failure. You must make sure that your repository is reliable and protects against corruption of your data while it is at rest in the repository.
The blob storage offered by all major public cloud providers typically offers very good protection against data loss or corruption. If you manage your own repository storage then you are responsible for its reliability.