- Elasticsearch Guide: other versions:
- What is Elasticsearch?
- What’s new in 7.11
- Getting started with Elasticsearch
- Set up Elasticsearch
- Installing Elasticsearch
- Configuring Elasticsearch
- Setting JVM options
- Secure settings
- Auditing settings
- Circuit breaker settings
- Cluster-level shard allocation and routing settings
- Cross-cluster replication settings
- Discovery and cluster formation settings
- Field data cache settings
- HTTP
- Index lifecycle management settings
- Index management settings
- Index recovery settings
- Indexing buffer settings
- License settings
- Local gateway settings
- Logging
- Machine learning settings
- Monitoring settings
- Node
- Network settings
- Node query cache settings
- Search settings
- Security settings
- Shard request cache settings
- Snapshot lifecycle management settings
- Transforms settings
- Transport
- Thread pools
- Watcher settings
- Advanced configuration settings
- Important Elasticsearch configuration
- Important System Configuration
- Bootstrap Checks
- Heap size check
- File descriptor check
- Memory lock check
- Maximum number of threads check
- Max file size check
- Maximum size virtual memory check
- Maximum map count check
- Client JVM check
- Use serial collector check
- System call filter check
- OnError and OnOutOfMemoryError checks
- Early-access check
- G1GC check
- All permission check
- Discovery configuration check
- Bootstrap Checks for X-Pack
- Starting Elasticsearch
- Stopping Elasticsearch
- Discovery and cluster formation
- Add and remove nodes in your cluster
- Full-cluster restart and rolling restart
- Remote clusters
- Set up X-Pack
- Configuring X-Pack Java Clients
- Plugins
- Upgrade Elasticsearch
- Index modules
- Mapping
- Text analysis
- Overview
- Concepts
- Configure text analysis
- Built-in analyzer reference
- Tokenizer reference
- Token filter reference
- Apostrophe
- ASCII folding
- CJK bigram
- CJK width
- Classic
- Common grams
- Conditional
- Decimal digit
- Delimited payload
- Dictionary decompounder
- Edge n-gram
- Elision
- Fingerprint
- Flatten graph
- Hunspell
- Hyphenation decompounder
- Keep types
- Keep words
- Keyword marker
- Keyword repeat
- KStem
- Length
- Limit token count
- Lowercase
- MinHash
- Multiplexer
- N-gram
- Normalization
- Pattern capture
- Pattern replace
- Phonetic
- Porter stem
- Predicate script
- Remove duplicates
- Reverse
- Shingle
- Snowball
- Stemmer
- Stemmer override
- Stop
- Synonym
- Synonym graph
- Trim
- Truncate
- Unique
- Uppercase
- Word delimiter
- Word delimiter graph
- Character filters reference
- Normalizers
- Index templates
- Data streams
- Ingest pipelines
- Search your data
- Query DSL
- Aggregations
- Bucket aggregations
- Adjacency matrix
- Auto-interval date histogram
- Children
- Composite
- Date histogram
- Date range
- Diversified sampler
- Filter
- Filters
- Geo-distance
- Geohash grid
- Geotile grid
- Global
- Histogram
- IP range
- Missing
- Nested
- Parent
- Range
- Rare terms
- Reverse nested
- Sampler
- Significant terms
- Significant text
- Terms
- Variable width histogram
- Subtleties of bucketing range fields
- Metrics aggregations
- Pipeline aggregations
- Bucket aggregations
- EQL
- SQL access
- Overview
- Getting Started with SQL
- Conventions and Terminology
- Security
- SQL REST API
- SQL Translate API
- SQL CLI
- SQL JDBC
- SQL ODBC
- SQL Client Applications
- SQL Language
- Functions and Operators
- Comparison Operators
- Logical Operators
- Math Operators
- Cast Operators
- LIKE and RLIKE Operators
- Aggregate Functions
- Grouping Functions
- Date/Time and Interval Functions and Operators
- Full-Text Search Functions
- Mathematical Functions
- String Functions
- Type Conversion Functions
- Geo Functions
- Conditional Functions And Expressions
- System Functions
- Reserved keywords
- SQL Limitations
- Scripting
- Data management
- ILM: Manage the index lifecycle
- Overview
- Concepts
- Automate rollover
- Customize built-in ILM policies
- Index lifecycle actions
- Configure a lifecycle policy
- Migrate index allocation filters to node roles
- Resolve lifecycle policy execution errors
- Start and stop index lifecycle management
- Manage existing indices
- Skip rollover
- Restore a managed data stream or index
- Autoscaling
- Monitor a cluster
- Frozen indices
- Roll up or transform your data
- Set up a cluster for high availability
- Snapshot and restore
- Secure a cluster
- Overview
- Configuring security
- User authentication
- Built-in users
- Internal users
- Token-based authentication services
- Realms
- Realm chains
- Active Directory user authentication
- File-based user authentication
- LDAP user authentication
- Native user authentication
- OpenID Connect authentication
- PKI user authentication
- SAML authentication
- Kerberos authentication
- Integrating with other authentication systems
- Enabling anonymous access
- Controlling the user cache
- Configuring SAML single-sign-on on the Elastic Stack
- Configuring single sign-on to the Elastic Stack using OpenID Connect
- User authorization
- Built-in roles
- Defining roles
- Granting access to Stack Management features
- Security privileges
- Document level security
- Field level security
- Granting privileges for data streams and index aliases
- Mapping users and groups to roles
- Setting up field and document level security
- Submitting requests on behalf of other users
- Configuring authorization delegation
- Customizing roles and authorization
- Enabling audit logging
- Encrypting communications
- Restricting connections with IP filtering
- Cross cluster search, clients, and integrations
- Tutorial: Getting started with security
- Tutorial: Encrypting communications
- Troubleshooting
- Some settings are not returned via the nodes settings API
- Authorization exceptions
- Users command fails due to extra arguments
- Users are frequently locked out of Active Directory
- Certificate verification fails for curl on Mac
- SSLHandshakeException causes connections to fail
- Common SSL/TLS exceptions
- Common Kerberos exceptions
- Common SAML issues
- Internal Server Error in Kibana
- Setup-passwords command fails due to connection failure
- Failures due to relocation of the configuration files
- Limitations
- Watch for cluster and index events
- Command line tools
- How To
- Glossary of terms
- REST APIs
- API conventions
- Autoscaling APIs
- Compact and aligned text (CAT) APIs
- cat aliases
- cat allocation
- cat anomaly detectors
- cat count
- cat data frame analytics
- cat datafeeds
- cat fielddata
- cat health
- cat indices
- cat master
- cat nodeattrs
- cat nodes
- cat pending tasks
- cat plugins
- cat recovery
- cat repositories
- cat segments
- cat shards
- cat snapshots
- cat task management
- cat templates
- cat thread pool
- cat trained model
- cat transforms
- Cluster APIs
- Cluster allocation explain
- Cluster get settings
- Cluster health
- Cluster reroute
- Cluster state
- Cluster stats
- Cluster update settings
- Nodes feature usage
- Nodes hot threads
- Nodes info
- Nodes reload secure settings
- Nodes stats
- Pending cluster tasks
- Remote cluster info
- Task management
- Voting configuration exclusions
- Cross-cluster replication APIs
- Data stream APIs
- Document APIs
- Enrich APIs
- Graph explore API
- Index APIs
- Analyze
- Bulk index alias
- Clear cache
- Clone index
- Close index
- Create index
- Create or update component template
- Create or update index alias
- Create or update index template
- Create or update index template (legacy)
- Delete component template
- Delete dangling index
- Delete index
- Delete index alias
- Delete index template
- Delete index template (legacy)
- Flush
- Force merge
- Freeze index
- Get component template
- Get field mapping
- Get index
- Get index alias
- Get index settings
- Get index template
- Get index template (legacy)
- Get mapping
- Import dangling index
- Index alias exists
- Index exists
- Index recovery
- Index segments
- Index shard stores
- Index stats
- Index template exists (legacy)
- List dangling indices
- Open index
- Refresh
- Resolve index
- Rollover index
- Shrink index
- Simulate index
- Simulate template
- Split index
- Synced flush
- Type exists
- Unfreeze index
- Update index settings
- Update mapping
- Index lifecycle management APIs
- Ingest APIs
- Info API
- Licensing APIs
- Machine learning anomaly detection APIs
- Add events to calendar
- Add jobs to calendar
- Close jobs
- Create jobs
- Create calendars
- Create datafeeds
- Create filters
- Delete calendars
- Delete datafeeds
- Delete events from calendar
- Delete filters
- Delete forecasts
- Delete jobs
- Delete jobs from calendar
- Delete model snapshots
- Delete expired data
- Estimate model memory
- Find file structure
- Flush jobs
- Forecast jobs
- Get buckets
- Get calendars
- Get categories
- Get datafeeds
- Get datafeed statistics
- Get influencers
- Get jobs
- Get job statistics
- Get machine learning info
- Get model snapshots
- Get overall buckets
- Get scheduled events
- Get filters
- Get records
- Open jobs
- Post data to jobs
- Preview datafeeds
- Revert model snapshots
- Set upgrade mode
- Start datafeeds
- Stop datafeeds
- Update datafeeds
- Update filters
- Update jobs
- Update model snapshots
- Upgrade model snapshots
- Machine learning data frame analytics APIs
- Create data frame analytics jobs
- Create trained models
- Update data frame analytics jobs
- Delete data frame analytics jobs
- Delete trained models
- Evaluate data frame analytics
- Explain data frame analytics
- Get data frame analytics jobs
- Get data frame analytics jobs stats
- Get trained models
- Get trained models stats
- Start data frame analytics jobs
- Stop data frame analytics jobs
- Migration APIs
- Reload search analyzers API
- Repositories metering APIs
- Rollup APIs
- Search APIs
- Searchable snapshots APIs
- Security APIs
- Authenticate
- Change passwords
- Clear cache
- Clear roles cache
- Clear privileges cache
- Clear API key cache
- Create API keys
- Create or update application privileges
- Create or update role mappings
- Create or update roles
- Create or update users
- Delegate PKI authentication
- Delete application privileges
- Delete role mappings
- Delete roles
- Delete users
- Disable users
- Enable users
- Get API key information
- Get application privileges
- Get builtin privileges
- Get role mappings
- Get roles
- Get token
- Get users
- Grant API keys
- Has privileges
- Invalidate API key
- Invalidate token
- OpenID Connect prepare authentication
- OpenID Connect authenticate
- OpenID Connect logout
- SAML prepare authentication
- SAML authenticate
- SAML logout
- SAML invalidate
- SAML service provider metadata
- SSL certificate
- Snapshot and restore APIs
- Snapshot lifecycle management APIs
- Transform APIs
- Usage API
- Watcher APIs
- Definitions
- Migration guide
- Release notes
- Elasticsearch version 7.11.2
- Elasticsearch version 7.11.1
- Elasticsearch version 7.11.0
- Elasticsearch version 7.10.2
- Elasticsearch version 7.10.1
- Elasticsearch version 7.10.0
- Elasticsearch version 7.9.3
- Elasticsearch version 7.9.2
- Elasticsearch version 7.9.1
- Elasticsearch version 7.9.0
- Elasticsearch version 7.8.1
- Elasticsearch version 7.8.0
- Elasticsearch version 7.7.1
- Elasticsearch version 7.7.0
- Elasticsearch version 7.6.2
- Elasticsearch version 7.6.1
- Elasticsearch version 7.6.0
- Elasticsearch version 7.5.2
- Elasticsearch version 7.5.1
- Elasticsearch version 7.5.0
- Elasticsearch version 7.4.2
- Elasticsearch version 7.4.1
- Elasticsearch version 7.4.0
- Elasticsearch version 7.3.2
- Elasticsearch version 7.3.1
- Elasticsearch version 7.3.0
- Elasticsearch version 7.2.1
- Elasticsearch version 7.2.0
- Elasticsearch version 7.1.1
- Elasticsearch version 7.1.0
- Elasticsearch version 7.0.0
- Elasticsearch version 7.0.0-rc2
- Elasticsearch version 7.0.0-rc1
- Elasticsearch version 7.0.0-beta1
- Elasticsearch version 7.0.0-alpha2
- Elasticsearch version 7.0.0-alpha1
- Dependencies and versions
Resilience in larger clusters
editResilience in larger clusters
editIt is not unusual for nodes to share some common infrastructure, such as a power supply or network router. If so, you should plan for the failure of this infrastructure and ensure that such a failure would not affect too many of your nodes. It is common practice to group all the nodes sharing some infrastructure into zones and to plan for the failure of any whole zone at once.
Your cluster’s zones should all be contained within a single data centre. Elasticsearch expects its node-to-node connections to be reliable and have low latency and high bandwidth. Connections between data centres typically do not meet these expectations. Although Elasticsearch will behave correctly on an unreliable or slow network, it will not necessarily behave optimally. It may take a considerable length of time for a cluster to fully recover from a network partition since it must resynchronize any missing data and rebalance the cluster once the partition heals. If you want your data to be available in multiple data centres, deploy a separate cluster in each data centre and use cross-cluster search or cross-cluster replication to link the clusters together. These features are designed to perform well even if the cluster-to-cluster connections are less reliable or slower than the network within each cluster.
After losing a whole zone’s worth of nodes, a properly-designed cluster may be functional but running with significantly reduced capacity. You may need to provision extra nodes to restore acceptable performance in your cluster when handling such a failure.
For resilience against whole-zone failures, it is important that there is a copy of each shard in more than one zone, which can be achieved by placing data nodes in multiple zones and configuring shard allocation awareness. You should also ensure that client requests are sent to nodes in more than one zone.
You should consider all node roles and ensure that each role is split redundantly across two or more zones. For instance, if you are using ingest pipelines or machine learning, you should have ingest or machine learning nodes in two or more zones. However, the placement of master-eligible nodes requires a little more care because a resilient cluster needs at least two of the three master-eligible nodes in order to function. The following sections explore the options for placing master-eligible nodes across multiple zones.
Two-zone clusters
editIf you have two zones, you should have a different number of master-eligible nodes in each zone so that the zone with more nodes will contain a majority of them and will be able to survive the loss of the other zone. For instance, if you have three master-eligible nodes then you may put all of them in one zone or you may put two in one zone and the third in the other zone. You should not place an equal number of master-eligible nodes in each zone. If you place the same number of master-eligible nodes in each zone, neither zone has a majority of its own. Therefore, the cluster may not survive the loss of either zone.
Two-zone clusters with a tiebreaker
editThe two-zone deployment described above is tolerant to the loss of one of its zones but not to the loss of the other one because master elections are majority-based. You cannot configure a two-zone cluster so that it can tolerate the loss of either zone because this is theoretically impossible. You might expect that if either zone fails then Elasticsearch can elect a node from the remaining zone as the master but it is impossible to tell the difference between the failure of a remote zone and a mere loss of connectivity between the zones. If both zones were capable of running independent elections then a loss of connectivity would lead to a split-brain problem and therefore data loss. Elasticsearch avoids this and protects your data by not electing a node from either zone as master until that node can be sure that it has the latest cluster state and that there is no other master in the cluster. This may mean there is no master at all until connectivity is restored.
You can solve this by placing one master-eligible node in each of your two zones and adding a single extra master-eligible node in an independent third zone. The extra master-eligible node acts as a tiebreaker in cases where the two original zones are disconnected from each other. The extra tiebreaker node should be a dedicated voting-only master-eligible node, also known as a dedicated tiebreaker. A dedicated tiebreaker need not be as powerful as the other two nodes since it has no other roles and will not perform any searches nor coordinate any client requests nor be elected as the master of the cluster.
You should use shard allocation awareness to ensure that there is a copy of each shard in each zone. This means either zone remains fully available if the other zone fails.
All master-eligible nodes, including voting-only nodes, are on the critical path for publishing cluster state updates. Because of this, these nodes require reasonably fast persistent storage and a reliable, low-latency network connection to the rest of the cluster. If you add a tiebreaker node in a third independent zone then you must make sure it has adequate resources and good connectivity to the rest of the cluster.
Clusters with three or more zones
editIf you have three zones then you should have one master-eligible node in each zone. If you have more than three zones then you should choose three of the zones and put a master-eligible node in each of these three zones. This will mean that the cluster can still elect a master even if one of the zones fails.
As always, your indices should have at least one replica in case a node fails, unless they are searchable snapshot indices. You should also use shard allocation awareness to limit the number of copies of each shard in each zone. For instance, if you have an index with one or two replicas configured then allocation awareness will ensure that the replicas of the shard are in a different zone from the primary. This means that a copy of every shard will still be available if one zone fails. The availability of this shard will not be affected by such a failure.
Summary
editThe cluster will be resilient to the loss of any zone as long as:
-
The cluster health status is
green
. - There are at least two zones containing data nodes.
- Every index that is not a searchable snapshot index has at least one replica of each shard, in addition to the primary.
- Shard allocation awareness is configured to avoid concentrating all copies of a shard within a single zone.
- The cluster has at least three master-eligible nodes. At least two of these nodes are not voting-only master-eligible nodes, and they are spread evenly across at least three zones.
- Clients are configured to send their requests to nodes in more than one zone or are configured to use a load balancer that balances the requests across an appropriate set of nodes. The Elastic Cloud service provides such a load balancer.
On this page