Fix master nodes out of disk
Elasticsearch uses master nodes to coordinate the cluster. If the master node or any master-eligible nodes are running out of space, you need to ensure that they have enough disk space to function. If the health API reports that your master node is out of space, you need to increase the disk capacity of your master nodes.
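If you prefer to check this programmatically, the health API exposes a disk indicator that reports which nodes are short on space. The following is a minimal sketch using the Python client shown later in this guide; it assumes a recent elasticsearch-py release that exposes health_report() (equivalent to GET _health_report/disk), and the cluster address and response handling are illustrative only.

    from elasticsearch import Elasticsearch

    # Assumed local cluster address; adjust for your deployment.
    client = Elasticsearch("http://localhost:9200")

    # Ask the health API for the disk indicator only
    # (equivalent to GET _health_report/disk).
    resp = client.health_report(feature="disk")
    disk = resp["indicators"]["disk"]

    print("disk indicator status:", disk["status"])

    # When the status is not green, the indicator carries diagnosis entries
    # describing the affected nodes, including master-eligible ones.
    for diagnosis in disk.get("diagnosis", []):
        print(diagnosis["cause"])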
If your cluster is hosted on Elastic Cloud (Elasticsearch Service):

- Log in to the Elastic Cloud console.
- On the Elasticsearch Service panel, click the gear under the Manage deployment column that corresponds to the name of your deployment.
- Go to Actions > Edit deployment and then go to the Master instances section.
- From the drop-down menu, choose a capacity configuration larger than the pre-selected one and click save. Wait for the plan to be applied; the problem should then be resolved.
On a self-managed cluster, in order to increase the disk capacity of a master node, you will need to replace all of the master nodes with master nodes of higher disk capacity.
- First, retrieve the disk threshold that will indicate how much disk space is needed. The relevant threshold is the high watermark and can be retrieved via the following command:
Python:

    resp = client.cluster.get_settings(
        include_defaults=True,
        filter_path="*.cluster.routing.allocation.disk.watermark.high*",
    )
    print(resp)

Ruby:

    response = client.cluster.get_settings(
      include_defaults: true,
      filter_path: '*.cluster.routing.allocation.disk.watermark.high*'
    )
    puts response

JavaScript:

    const response = await client.cluster.getSettings({
      include_defaults: "true",
      filter_path: "*.cluster.routing.allocation.disk.watermark.high*",
    });
    console.log(response);

Console:

    GET _cluster/settings?include_defaults&filter_path=*.cluster.routing.allocation.disk.watermark.high*
The response will look like this:
{ "defaults": { "cluster": { "routing": { "allocation": { "disk": { "watermark": { "high": "90%", "high.max_headroom": "150GB" } } } } } }
The above means that in order to resolve the disk shortage we need to either drop our disk usage below 90% or have more than 150GB available. You can read more about how this threshold works in the documentation for disk-based shard allocation settings.
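To make the combined meaning of the two settings concrete, here is a small sketch in plain Python (no client calls) of the check described above: a node is over the high watermark only when it is both above 90% usage and has less than 150GB of free space left.

    # Sketch of the high watermark check using the defaults returned above:
    # high = 90%, high.max_headroom = 150GB.
    GB = 1024 ** 3
    HIGH_WATERMARK = 0.90
    MAX_HEADROOM_BYTES = 150 * GB

    def exceeds_high_watermark(used_bytes: float, total_bytes: float) -> bool:
        """True only if usage is above 90% AND less than 150GB remains free."""
        free_bytes = total_bytes - used_bytes
        return used_bytes > HIGH_WATERMARK * total_bytes and free_bytes < MAX_HEADROOM_BYTES

    print(exceeds_high_watermark(3.7 * GB, 4 * GB))   # True: 92.5% used, ~300MB free
    print(exceeds_high_watermark(3.4 * GB, 4 * GB))   # False: 85% used is still below 90%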
- The next step is to find out the current disk usage; this will allow us to calculate how much extra space is needed. In the following example, we show only the master nodes for readability purposes:
Python:

    resp = client.cat.nodes(
        v=True,
        h="name,master,node.role,disk.used_percent,disk.used,disk.avail,disk.total",
    )
    print(resp)

Ruby:

    response = client.cat.nodes(
      v: true,
      h: 'name,master,node.role,disk.used_percent,disk.used,disk.avail,disk.total'
    )
    puts response

JavaScript:

    const response = await client.cat.nodes({
      v: "true",
      h: "name,master,node.role,disk.used_percent,disk.used,disk.avail,disk.total",
    });
    console.log(response);

Console:

    GET /_cat/nodes?v&h=name,master,node.role,disk.used_percent,disk.used,disk.avail,disk.total
The response will look like this:
    name                master node.role disk.used_percent disk.used disk.avail disk.total
    instance-0000000000 *      m         85.31             3.4gb     500mb      4gb
    instance-0000000001 *      m         50.02             2.1gb     1.9gb      4gb
    instance-0000000002 *      m         50.02             1.9gb     2.1gb      4gb
- The goal is to drop disk usage below the relevant threshold, in our example 90%. Consider adding some padding so the nodes will not go over the threshold again soon. If you have multiple master nodes, you need to ensure that all of them will have this capacity. Assuming you have the new nodes ready, follow the next three steps for every master node.
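As an illustration of the sizing arithmetic (the 75% target below is an arbitrary padding choice, not an Elasticsearch setting), the busiest node in the example holds 3.4gb of data, so staying at or below 75% usage requires roughly 4.5gb of disk per master node:

    # Sketch: minimum disk size for a replacement master node so the data it
    # currently holds stays comfortably below the 90% high watermark.
    GB = 1024 ** 3
    current_used_bytes = 3.4 * GB   # disk.used of instance-0000000000 above
    target_usage = 0.75             # arbitrary padding, well under 0.90

    required_total = current_used_bytes / target_usage
    print(f"provision at least {required_total / GB:.1f}GB per master node")  # ~4.5GB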
- Bring down one of the master nodes.
- Start up one of the new master nodes and wait for it to join the cluster. You can check this via:
Python:

    resp = client.cat.nodes(
        v=True,
        h="name,master,node.role,disk.used_percent,disk.used,disk.avail,disk.total",
    )
    print(resp)

Ruby:

    response = client.cat.nodes(
      v: true,
      h: 'name,master,node.role,disk.used_percent,disk.used,disk.avail,disk.total'
    )
    puts response

JavaScript:

    const response = await client.cat.nodes({
      v: "true",
      h: "name,master,node.role,disk.used_percent,disk.used,disk.avail,disk.total",
    });
    console.log(response);

Console:

    GET /_cat/nodes?v&h=name,master,node.role,disk.used_percent,disk.used,disk.avail,disk.total
- Only after you have confirmed that your cluster is back to its initial number of master nodes, move on to the next node, and repeat until all of the original master nodes have been replaced.
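If you want to script that confirmation, below is a minimal sketch using the same Python client as the earlier examples; it requests the cat nodes API as JSON and counts the nodes whose role string marks them as master eligible. The expected count of 3 matches this example and is an assumption about your cluster.

    # Sketch: verify the cluster is back to its expected number of
    # master-eligible nodes before replacing the next one.
    EXPECTED_MASTER_ELIGIBLE = 3  # adjust to your cluster's original master count

    nodes = client.cat.nodes(h="name,node.role", format="json")
    master_eligible = [n["name"] for n in nodes if "m" in n["node.role"]]

    print(master_eligible)
    assert len(master_eligible) == EXPECTED_MASTER_ELIGIBLE, (
        "wait for the new master node to join before continuing"
    )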