WARNING: Version 1.6 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Index Shard Allocation
Shard Allocation Filtering
Allows control over the allocation of indices on nodes based on include/exclude filters. The filters can be set both on the index level and on the cluster level. Let's start with an example of setting it on the index level:
Let's say we have 4 nodes, each with a specific attribute called tag associated with it (the name of the attribute can be any name). Each node has a specific value associated with tag. Node 1 has a setting node.tag: value1, Node 2 a setting of node.tag: value2, and so on.
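As a minimal sketch, such an attribute would be set in each node's elasticsearch.yml (the attribute name tag and its values are just the example values used here); for Node 1:
node.tag: value1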
We can create an index that will only deploy on nodes that have tag set to value1 or value2 by setting index.routing.allocation.include.tag to value1,value2. For example:
curl -XPUT localhost:9200/test/_settings -d '{ "index.routing.allocation.include.tag" : "value1,value2" }'
On the other hand, we can create an index that will be deployed on all nodes except for nodes with a tag value of value3 by setting index.routing.allocation.exclude.tag to value3. For example:
curl -XPUT localhost:9200/test/_settings -d '{ "index.routing.allocation.exclude.tag" : "value3" }'
index.routing.allocation.require.* can be used to specify a number of rules, all of which MUST match in order for a shard to be allocated to a node. This is in contrast to include, which will include a node if ANY rule matches.
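As a sketch (the attribute names rack and size are hypothetical, chosen only to illustrate the setting), requiring shards to be placed only on nodes that match both rules could look like:
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.require.rack" : "rack1",
    "index.routing.allocation.require.size" : "big"
}'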
The include, exclude and require values can have generic simple matching wildcards, for example, value1*. Additionally, special attribute names called _ip, _name, _id and _host can be used to match by node IP address, name, id or host name, respectively.
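For example, a sketch combining a wildcard with the special _name attribute (the node name prefix data- is made up for illustration) to restrict an index to nodes whose name starts with data-:
curl -XPUT localhost:9200/test/_settings -d '{ "index.routing.allocation.include._name" : "data-*" }'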
A node can of course have several attributes associated with it, and both the attribute name and value are controlled in the setting. For example, here is a sample of several node configurations:
node.group1: group1_value1
node.group2: group2_value4
In the same manner, include, exclude and require can work against several attributes, for example:
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.group1" : "xxx",
    "index.routing.allocation.include.group2" : "yyy",
    "index.routing.allocation.exclude.group3" : "zzz",
    "index.routing.allocation.require.group4" : "aaa"
}'
The provided settings can also be updated in real time using the update settings API, allowing indices (shards) to be "moved" around in real time.
Cluster-wide filtering can also be defined, and updated in real time using the cluster update settings API. This setting can come in handy for things like decommissioning nodes (even if the replica count is set to 0). Here is a sample of how to decommission a node based on its _ip address:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'
Total Shards Per Node
The index.routing.allocation.total_shards_per_node setting controls how many total shards (replicas and primaries) for an index will be allocated per node. It can be dynamically set on a live index using the update index settings API.
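For example, a sketch of capping the test index at two shards per node (the limit of 2 is just an illustrative value):
curl -XPUT localhost:9200/test/_settings -d '{ "index.routing.allocation.total_shards_per_node" : 2 }'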
Disk-based Shard Allocation
Elasticsearch can be configured to prevent shard allocation on nodes depending on disk usage for the node. This functionality is enabled by default, and can be changed either in the configuration file, or dynamically using:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.threshold_enabled" : false
    }
}'
When enabled, Elasticsearch uses two watermarks to decide whether shards should be allocated to a node or can remain on it.
cluster.routing.allocation.disk.watermark.low controls the low watermark for disk usage. It defaults to 85%, meaning ES will not allocate new shards to nodes once they have more than 85% disk used. It can also be set to an absolute byte value (like 500mb) to prevent ES from allocating shards if less than the configured amount of space is available.
cluster.routing.allocation.disk.watermark.high controls the high watermark. It defaults to 90%, meaning ES will attempt to relocate shards to another node if the node's disk usage rises above 90%. It can also be set to an absolute byte value (similar to the low watermark) to relocate shards from a node once less than the configured amount of space is available.
Percentage values refer to used disk space, while byte values refer to free disk space. This can be confusing, since it flips the meaning of high and low. For example, it makes sense to set the low watermark to 10gb and the high watermark to 5gb, but not the other way around.
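A sketch of that byte-value configuration, using the 10gb/5gb figures from the example above:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.watermark.low" : "10gb",
        "cluster.routing.allocation.disk.watermark.high" : "5gb"
    }
}'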
Both watermark settings can be changed dynamically using the cluster settings API. By default, Elasticsearch retrieves information about the disk usage of the nodes every 30 seconds. This can be changed with the cluster.info.update.interval setting.
An example of updating the low watermark to no more than 80% of the disk used, the high watermark to at least 50 gigabytes free, and the cluster info update interval to one minute:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.watermark.low" : "80%",
        "cluster.routing.allocation.disk.watermark.high" : "50gb",
        "cluster.info.update.interval" : "1m"
    }
}'
By default, Elasticsearch takes into account shards that are currently being relocated to the target node when computing a node's disk usage. This can be changed by setting the cluster.routing.allocation.disk.include_relocations setting to false (it defaults to true). Taking the sizes of relocating shards into account may, however, mean that the disk usage for a node is incorrectly estimated on the high side, since the relocation could be 90% complete and a recently retrieved disk usage figure would include both the total size of the relocating shard and the space already used by the running relocation.
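For example, a sketch of disabling this behaviour dynamically:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.include_relocations" : false
    }
}'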