Update strategy

The Elasticsearch cluster configuration can be updated at any time to:

  • Add new nodes
  • Remove some nodes
  • Change Elasticsearch configuration
  • Change Pod resources (memory limits, CPU limits, environment variables, and so on)

On any change, ECK reconciles the Kubernetes resources towards the desired cluster definition. Changes are applied in a rolling fashion: the state of the cluster is continuously monitored so that new nodes can be added and old nodes removed without disrupting the cluster.
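For instance, adding or removing nodes is a matter of editing the node count in the Elasticsearch resource and re-applying the manifest. A minimal sketch (the resource name, version, and nodeSet name are illustrative):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.13.1
  nodeSets:
  - name: default
    count: 3  # increase or decrease to add or remove nodes

Applying the updated manifest with kubectl apply is enough to trigger the reconciliation; no manual orchestration is required.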

Change budget

No downtime should be expected when the cluster topology changes. Shards on deprecated nodes are migrated away so the node can be safely removed.

For example, to mutate a 3-node cluster with a 16GB memory limit on each node into a 3-node cluster with a 32GB memory limit on each node, ECK will:

  1. Add a new 32GB node: the cluster temporarily has 4 nodes
  2. Migrate data away from the first 16GB node
  3. Once data is migrated, remove the first 16GB node
  4. Follow the same steps for the 2 other 16GB nodes

The cluster health stays green during the entire process. By default, only one extra node can be added on top of the expected ones. In the example above, the 3-node cluster may temporarily consist of 4 nodes while data migration is in progress.
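The mutation in this example corresponds to changing the memory limit in the Pod template of the node set. A sketch of the relevant part of the specification (the nodeSet name is illustrative; the container name elasticsearch is the one managed by ECK):

spec:
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 32Gi  # previously 16Gi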

This behaviour can be controlled through the changeBudget section of the updateStrategy in the cluster specification. If not specified, it defaults to the following:

spec:
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 0

  • maxSurge specifies the number of Pods that can be added to the cluster, on top of the desired number of nodes in the specification, during cluster updates
  • maxUnavailable specifies the number of Pods that can be made unavailable during cluster updates

The default of maxSurge: 1; maxUnavailable: 0 spins up an additional Elasticsearch node during cluster updates. It is possible to speed up cluster topology changes by increasing maxSurge. For example, setting maxSurge: 3 would allow 3 new nodes to be created while the original 3 migrate data in parallel. The cluster would then temporarily have 6 nodes.
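Such a configuration could look like the following:

spec:
  updateStrategy:
    changeBudget:
      maxSurge: 3
      maxUnavailable: 0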

Setting maxSurge to 0 and maxUnavailable to a positive value prevents any extra Pods from being created: a node is removed before its replacement is added, so the total number of Pods never exceeds the desired number of nodes. For example, maxSurge: 0; maxUnavailable: 1 would perform the 3-node upgrade this way:

  1. Migrate data away from the first 16GB node
  2. Once data is migrated, remove the first 16GB node: the cluster temporarily has 2 nodes
  3. Add a new 32GB node: the cluster grows to 3 nodes
  4. Follow the same steps for the 2 other 16GB nodes
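The corresponding changeBudget for this scenario:

spec:
  updateStrategy:
    changeBudget:
      maxSurge: 0
      maxUnavailable: 1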

Even if a changeBudget is specified, ECK makes sure that some invariants are maintained while a mutation is in progress. In the cluster, there must be at least:

  • One master node alive
  • One data node alive

So under certain circumstances ECK ignores the change budget. For example, a safe migration from a 1-node cluster to another 1-node cluster can only be done by temporarily running a 2-node cluster.

It will be possible to configure the changeBudget to reuse persistent volumes instead of migrating data across nodes. This feature is not supported yet; more details will come in the next release.