Fix other role nodes out of disk
Elasticsearch can use dedicated nodes to execute functions other than storing data or coordinating the cluster, for example machine learning. If one or more of these nodes are running out of space, you need to ensure that they have enough disk space to function. If the health API reports that a node which is neither a master nor a data node is out of space, you need to increase the disk capacity of that node.
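If you want to see which nodes the diagnosis refers to, you can query the disk indicator of the health API directly. This is a minimal sketch, assuming a version where the health report is exposed at `_health_report`:

```
GET _health_report/disk
```

The nodes that are running low on disk, together with their roles, are typically listed under the indicator's diagnosis in the response.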
If you are using Elastic Cloud:

1. Log in to the Elastic Cloud console.
2. On the Elasticsearch Service panel, click the gear under the **Manage deployment** column that corresponds to the name of your deployment.
3. Go to **Actions > Edit deployment** and then go to the **Coordinating instances** or the **Machine Learning instances** section, depending on the roles listed in the diagnosis.
4. Choose a capacity configuration larger than the pre-selected one from the drop-down menu and click **save**. Wait for the plan to be applied; the problem should then be resolved (you can verify this as shown below).
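Once the plan has been applied, you can confirm that the resized instances now have enough disk available, for example with the cat nodes API that is also used in the self-managed steps below (the instance names will differ in your deployment):

```
GET /_cat/nodes?v&h=name,node.role,disk.used_percent,disk.avail,disk.total
```

You can also re-run the health API's disk indicator to check that the problem is resolved.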
On a self-managed cluster, in order to increase the disk capacity of any other node, you will need to replace the instance that has run out of space with one of higher disk capacity:
1. First, retrieve the disk threshold that will indicate how much disk space is needed. The relevant threshold is the high watermark, which can be retrieved via the following command:

   ```
   GET _cluster/settings?include_defaults&filter_path=*.cluster.routing.allocation.disk.watermark.high*
   ```

   The response will look like this:

   ```json
   {
     "defaults": {
       "cluster": {
         "routing": {
           "allocation": {
             "disk": {
               "watermark": {
                 "high": "90%",
                 "high.max_headroom": "150GB"
               }
             }
           }
         }
       }
     }
   }
   ```

   This means that in order to resolve the disk shortage, we need to either drop our disk usage below 90% or have more than 150GB available; read more about how this threshold works here.
2. The next step is to find out the current disk usage; this will allow you to calculate how much extra space is needed. In the following example, we show only a machine learning node for readability purposes:

   ```
   GET /_cat/nodes?v&h=name,node.role,disk.used_percent,disk.used,disk.avail,disk.total
   ```

   The response will look like this:

   ```
   name                node.role disk.used_percent disk.used disk.avail disk.total
   instance-0000000000 l         85.31             3.4gb     500mb      4gb
   ```
3. The desired situation is to drop the disk usage below the relevant threshold, in our example 90%. Consider adding some padding so it will not go over the threshold again soon. In this example the node uses 3.4gb of a 4gb disk (85.31%); to stay below the 90% watermark, the replacement needs more than roughly 3.4gb / 0.9 ≈ 3.8gb of total disk, so an 8gb instance leaves comfortable headroom. Assuming you have the new node ready (see the configuration sketch after these steps), add it to the cluster.
4. Verify that the new node has joined the cluster:

   ```
   GET /_cat/nodes?v&h=name,node.role,disk.used_percent,disk.used,disk.avail,disk.total
   ```

   The response will look like this:

   ```
   name                node.role disk.used_percent disk.used disk.avail disk.total
   instance-0000000000 l         85.31             3.4gb     500mb      4gb
   instance-0000000001 l         41.31             3.4gb     4.5gb      8gb
   ```
5. Now you can remove the instance that has run out of disk space.
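How you bring up the replacement node depends on how you run Elasticsearch. As a minimal sketch, assuming a self-managed machine learning node with a larger disk, its `elasticsearch.yml` could look like the following; the cluster name, node name, paths, and seed hosts are placeholders for illustration:

```yaml
# elasticsearch.yml for the replacement node (values are illustrative placeholders)
cluster.name: my-cluster              # must match the existing cluster
node.name: instance-0000000001        # the new, larger instance
node.roles: [ ml ]                    # give it the role reported in the diagnosis, e.g. ml
path.data: /mnt/large-disk/data       # point the data path at the bigger disk
discovery.seed_hosts: [ "existing-node-1:9300", "existing-node-2:9300" ]
```

Once the node has started and shows up in `GET /_cat/nodes`, as in the verification step above, the old instance can be stopped and removed.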