Discovery

Discovery is the process by which the cluster formation module finds other nodes with which to form a cluster. This process runs when you start an Elasticsearch node or when a node believes the master node has failed, and it continues until the master node is found or a new master node is elected.

This process starts with a list of seed addresses from one or more seed hosts providers, together with the addresses of any master-eligible nodes that were in the last-known cluster. The process operates in two phases. First, each node probes the seed addresses by connecting to each address, attempting to identify the node to which it is connected, and verifying that it is master-eligible. Second, if successful, it shares with the remote node a list of all of its known master-eligible peers, and the remote node responds with its peers in turn. The node then probes all the new nodes that it just discovered, requests their peers, and so on.
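
The probing loop described above can be pictured as a breadth-first traversal over peer lists. The following is a minimal sketch, not Elasticsearch's implementation: the function name `discover_peers` and the `network` dictionary are hypothetical stand-ins for real transport connections, and the sketch only models the probing node learning peers from each remote node, not the two-way exchange.

```python
from collections import deque


def discover_peers(seed_addresses, network):
    """Return the set of master-eligible addresses reachable from the seeds.

    `network` is a hypothetical stand-in for real connections. It maps an
    address to a tuple (is_master_eligible, known_peers). Addresses that are
    missing from the map are treated as unreachable.
    """
    discovered = set()
    probed = set()
    queue = deque(seed_addresses)
    while queue:
        address = queue.popleft()
        if address in probed:
            continue
        probed.add(address)
        node = network.get(address)
        if node is None:
            continue  # connection failed, nothing learned from this address
        master_eligible, peers = node
        if not master_eligible:
            continue  # only master-eligible nodes take part in discovery
        discovered.add(address)
        # Probe every peer the remote node reported that is not yet probed.
        queue.extend(peer for peer in peers if peer not in probed)
    return discovered
```

In this sketch a seed that cannot be reached is simply skipped, which mirrors the idea that discovery keeps working as long as at least one seed address leads to a master-eligible node.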

If the node is not master-eligible then it continues this discovery process until it has discovered an elected master node. If no elected master is discovered then the node will retry after discovery.find_peers_interval, which defaults to 1s.

If the node is master-eligible then it continues this discovery process until it has either discovered an elected master node or discovered enough masterless master-eligible nodes to complete an election. If neither of these occurs quickly enough then the node will retry after discovery.find_peers_interval, which defaults to 1s.

Once a master is elected, it will normally remain as the elected master until it is deliberately stopped. It may also stop acting as the master if fault detection determines the cluster to be faulty. When a node stops being the elected master, it begins the discovery process again.

Troubleshooting discovery

In most cases, the discovery and election process completes quickly, and the master node remains elected for a long period of time.

If your cluster doesn’t have a stable master, many of its features won’t work correctly and Elasticsearch will report errors to clients and in its logs. You must fix the master node’s instability before addressing these other issues. It will not be possible to solve any other issues while there is no elected master node or the elected master node is unstable.

If your cluster has a stable master but some nodes can’t discover or join it, these nodes will report errors to clients and in their logs. You must address the obstacles preventing these nodes from joining the cluster before addressing other issues. It will not be possible to solve any other issues reported by these nodes while they are unable to join the cluster.

If the cluster has no elected master node for more than a few seconds, the master is unstable, or some nodes are unable to discover or join a stable master, then Elasticsearch will record information in its logs explaining why. If the problems persist for more than a few minutes, Elasticsearch will record additional information in its logs. To properly troubleshoot discovery and election problems, collect and analyse logs covering at least five minutes from all nodes.

The following sections describe some common discovery and election problems.

No master is elected

When a node wins the master election, it logs a message containing elected-as-master and all nodes log a message containing master node changed identifying the new elected master node.

If there is no elected master node and no node can win an election, all nodes will repeatedly log messages about the problem using a logger called org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper. By default, this happens every 10 seconds.

Master elections only involve master-eligible nodes, so focus on the logs from master-eligible nodes in this situation. These nodes' logs will indicate the requirements for a master election, such as the discovery of a certain set of nodes.

If the logs indicate that Elasticsearch can’t discover enough nodes to form a quorum, you must address the reasons preventing Elasticsearch from discovering the missing nodes. The missing nodes are needed to reconstruct the cluster metadata. Without the cluster metadata, the data in your cluster is meaningless. The cluster metadata is stored on a subset of the master-eligible nodes in the cluster. If a quorum can’t be discovered, the missing nodes were the ones holding the cluster metadata.

Ensure there are enough nodes running to form a quorum and that every node can communicate with every other node over the network. Elasticsearch will report additional details about network connectivity if the election problems persist for more than a few minutes. If you can’t start enough nodes to form a quorum, start a new cluster and restore data from a recent snapshot. Refer to Quorum-based decision making for more information.
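
As a rough illustration of the quorum check, the sketch below assumes that a quorum means more than half of the nodes in the voting configuration, which is how the quorum-based decision making section describes it. The function and parameter names are hypothetical and are not part of any Elasticsearch API.

```python
def has_quorum(voting_configuration, discovered_nodes):
    """Return True if the discovered nodes include a majority of the voters.

    Both arguments are collections of node identifiers. Nodes that are
    discovered but not in the voting configuration do not count as votes.
    """
    voters = set(voting_configuration)
    if not voters:
        return False
    votes = len(voters & set(discovered_nodes))
    # A strict majority is required: 2 of 3, 3 of 4, 3 of 5, and so on.
    return votes * 2 > len(voters)
```

With three voting nodes, for example, losing one node still leaves a quorum, but losing two does not.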

If the logs indicate that Elasticsearch has discovered a possible quorum of nodes, the typical reason that the cluster can’t elect a master is that one of the other nodes can’t discover a quorum. Inspect the logs on the other master-eligible nodes and ensure that they have all discovered enough nodes to form a quorum.

Master is elected but unstable

When a node wins the master election, it logs a message containing elected-as-master. If this happens repeatedly, the elected master node is unstable. In this situation, focus on the logs from the master-eligible nodes to understand why the election winner stops being the master and triggers another election.
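
One way to get a first impression of election activity is to count how often the messages named in this section appear in the collected logs. The sketch below only does substring matching on the three strings mentioned above. It makes no assumption about the rest of the log line format, and the function name is hypothetical.

```python
MARKERS = (
    "elected-as-master",
    "master node changed",
    "ClusterFormationFailureHelper",
)


def summarize_election_logs(lines):
    """Count how many log lines contain each election-related marker."""
    counts = {marker: 0 for marker in MARKERS}
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts
```

A count of elected-as-master greater than one over a short period suggests repeated elections, while a high count for ClusterFormationFailureHelper suggests that nodes are struggling to form or join a cluster.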

Node cannot discover or join stable master

If there is a stable elected master but a node can’t discover or join its cluster, it will repeatedly log messages about the problem using the ClusterFormationFailureHelper logger. Other log messages on the affected node and the elected master may provide additional information about the problem.

Node joins cluster and leaves again

If a node joins the cluster but Elasticsearch determines it to be faulty then it will be removed from the cluster again. See Troubleshooting an unstable cluster for more information.

Seed hosts providers

By default the cluster formation module offers two seed hosts providers to configure the list of seed nodes: a settings-based and a file-based seed hosts provider. It can be extended to support cloud environments and other forms of seed hosts providers via discovery plugins. Seed hosts providers are configured using the discovery.seed_providers setting, which defaults to the settings-based hosts provider. This setting accepts a list of different providers, allowing you to make use of multiple ways to find the seed hosts for your cluster.

Each seed hosts provider yields the IP addresses or hostnames of the seed nodes. If it returns any hostnames then these are resolved to IP addresses using a DNS lookup. If a hostname resolves to multiple IP addresses then Elasticsearch tries to find a seed node at all of these addresses. If the hosts provider does not explicitly give the TCP port of the node, Elasticsearch implicitly uses the first port in the port range given by transport.profiles.default.port, or by transport.port if transport.profiles.default.port is not set. The number of concurrent lookups is controlled by discovery.seed_resolver.max_concurrent_resolvers, which defaults to 10, and the timeout for each lookup is controlled by discovery.seed_resolver.timeout, which defaults to 5s. Note that DNS lookups are subject to JVM DNS caching.
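
The resolution behaviour can be approximated with a small thread pool. This is a sketch under stated assumptions rather than how Elasticsearch implements it: `lookup` and `resolve_seeds` are hypothetical names, the default port of 9300 is assumed to be the first port of the transport range, and the per-lookup timeout is only approximated by waiting on each result in turn.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as LookupTimeout


def lookup(host):
    """Return every IP address that `host` resolves to (system resolver)."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def resolve_seeds(entries, default_port=9300, max_concurrent=10,
                  timeout=5.0, resolver=lookup):
    """Expand (host, port_or_None) entries into a list of (ip, port) pairs.

    `max_concurrent` and `timeout` play the roles of
    discovery.seed_resolver.max_concurrent_resolvers and
    discovery.seed_resolver.timeout. `default_port` is an assumption.
    """
    pool = ThreadPoolExecutor(max_workers=max_concurrent)
    try:
        pending = [(port, pool.submit(resolver, host)) for host, port in entries]
        addresses = []
        for port, future in pending:
            try:
                ips = future.result(timeout=timeout)
            except (LookupTimeout, OSError):
                continue  # skip hosts that are slow or fail to resolve
            for ip in ips:
                # Every resolved address becomes a candidate seed address.
                addresses.append((ip, default_port if port is None else port))
        return addresses
    finally:
        pool.shutdown(wait=False)
```

The `resolver` parameter exists so that the lookup can be replaced in tests or experiments without touching real DNS.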

Settings-based seed hosts provider

The settings-based seed hosts provider uses a node setting to configure a static list of the addresses of the seed nodes. These addresses can be given as hostnames or IP addresses; hosts specified as hostnames are resolved to IP addresses during each round of discovery.

The list of hosts is set using the discovery.seed_hosts static setting. For example:

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com

If an entry does not specify a port, as with 192.168.1.11 above, the port defaults to transport.profiles.default.port and falls back to transport.port if that setting is not set.

If a hostname resolves to multiple IP addresses, Elasticsearch will attempt to connect to every resolved address.

File-based seed hosts provider

The file-based seed hosts provider configures a list of hosts via an external file. Elasticsearch reloads this file when it changes, so that the list of seed nodes can change dynamically without needing to restart each node. For example, this gives a convenient mechanism for an Elasticsearch instance that is run in a Docker container to be dynamically supplied with a list of IP addresses to connect to when those IP addresses may not be known at node startup.

To enable file-based discovery, configure the file hosts provider as follows in the elasticsearch.yml file:

discovery.seed_providers: file

Then create a file at $ES_PATH_CONF/unicast_hosts.txt in the format described below. Whenever the unicast_hosts.txt file changes, Elasticsearch picks up the changes and uses the new list of hosts.

Note that the file-based seed hosts provider augments the list of seed hosts in elasticsearch.yml: if there are valid seed addresses in discovery.seed_hosts then Elasticsearch uses those addresses in addition to those supplied in unicast_hosts.txt.

The unicast_hosts.txt file contains one node entry per line. Each node entry consists of the host (host name or IP address) and an optional transport port number. If the port number is specified, it must come immediately after the host (on the same line), separated by a :. If the port number is not specified, Elasticsearch will implicitly use the first port in the port range given by transport.profiles.default.port, or by transport.port if transport.profiles.default.port is not set.

For example, this is a unicast_hosts.txt file for a cluster with four nodes that participate in discovery, some of which are not running on the default port:

10.10.10.5
10.10.10.6:9305
10.10.10.5:10005
# an IPv6 address
[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:9301

Host names are allowed instead of IP addresses and are resolved by DNS as described above. IPv6 addresses must be given in brackets with the port, if needed, coming after the brackets.

You can also add comments to this file. All comments must appear on their own lines, starting with # (i.e. comments cannot start in the middle of a line).
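
To make the file format concrete, the following sketch parses text in the unicast_hosts.txt format into (host, port) pairs. It is an illustration only and not Elasticsearch's parser: `parse_unicast_hosts` is a hypothetical name, and the default port of 9300 is an assumption standing in for the first port of the configured transport range.

```python
def parse_unicast_hosts(text, default_port=9300):
    """Parse unicast_hosts.txt content into a list of (host, port) tuples."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and whole-line comments carry no entry
        if line.startswith("["):
            # Bracketed IPv6 address, optionally followed by :port.
            host, _, rest = line[1:].partition("]")
            port = int(rest[1:]) if rest.startswith(":") else default_port
        elif line.count(":") == 1:
            # Host name or IPv4 address with an explicit port.
            host, _, port_text = line.partition(":")
            port = int(port_text)
        else:
            host, port = line, default_port
        entries.append((host, port))
    return entries
```

Applied to the example file above, the sketch yields four entries, with the first entry receiving the assumed default port and the IPv6 entry keeping its address without the brackets.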

EC2 hosts provider

The EC2 discovery plugin adds a hosts provider that uses the AWS API to find a list of seed nodes.

Azure Classic hosts provider

The Azure Classic discovery plugin adds a hosts provider that uses the Azure Classic API to find a list of seed nodes.

Google Compute Engine hosts provider

The GCE discovery plugin adds a hosts provider that uses the GCE API to find a list of seed nodes.