Important Elasticsearch configuration
Elasticsearch requires very little configuration to get started, but there are a number of items that you must consider before using your cluster in production.
Our Elastic Cloud service configures these items automatically, making your cluster production-ready by default.
Path settings
Elasticsearch writes the data you index to indices and data streams to a data directory. Elasticsearch writes its own application logs, which contain information about cluster health and operations, to a logs directory.
For macOS .tar.gz, Linux .tar.gz, and Windows .zip installations, data and logs are subdirectories of $ES_HOME by default. However, files in $ES_HOME risk deletion during an upgrade.
In production, we strongly recommend you set path.data and path.logs in elasticsearch.yml to locations outside of $ES_HOME. Docker, Debian, and RPM installations write data and logs to locations outside of $ES_HOME by default.
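As a rough pre-flight check for that recommendation, the Ruby sketch below tests whether a configured path falls outside $ES_HOME. The helper name `outside_es_home?` and the sample install location are hypothetical; the comparison is purely lexical on expanded Unix-style paths and does not resolve symlinks.

```ruby
# Hypothetical helper: true when +path+ is neither $ES_HOME itself nor
# anything underneath it. Paths are compared as strings after expansion,
# so symlinks are not resolved. Unix-style paths are assumed.
def outside_es_home?(path, es_home)
  target = File.expand_path(path)
  home = File.expand_path(es_home)
  # File.join(home, '') appends a trailing separator, so a sibling such as
  # /opt/elasticsearch-data is not mistaken for a subdirectory.
  !(target == home || target.start_with?(File.join(home, '')))
end

es_home = '/opt/elasticsearch' # assumed $ES_HOME for this example
puts outside_es_home?('/var/data/elasticsearch', es_home) # prints true
puts outside_es_home?('/opt/elasticsearch/data', es_home) # prints false
```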
Supported path.data and path.logs values vary by platform:
Linux and macOS installations support Unix-style paths:
path:
  data: /var/data/elasticsearch
  logs: /var/log/elasticsearch
Windows installations support DOS paths with escaped backslashes:
path:
  data: "C:\\Elastic\\Elasticsearch\\data"
  logs: "C:\\Elastic\\Elasticsearch\\logs"
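To see why the backslashes are doubled, the Ruby sketch below parses the Windows example with Ruby's bundled YAML library (Psych). It only illustrates standard escaping in double-quoted YAML strings; Elasticsearch uses its own YAML parser, so treat this as an approximation rather than a description of Elasticsearch internals.

```ruby
require 'yaml'

# The Windows example as it would appear in elasticsearch.yml. A single-quoted
# heredoc keeps the doubled backslashes exactly as written in the file.
config_text = <<~'YAML'
  path:
    data: "C:\\Elastic\\Elasticsearch\\data"
    logs: "C:\\Elastic\\Elasticsearch\\logs"
YAML

config = YAML.safe_load(config_text)
# In a double-quoted YAML string, \\ is an escape for one literal backslash.
puts config.dig('path', 'data') # prints C:\Elastic\Elasticsearch\data
```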
Don’t modify anything within the data directory or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the data directory, then Elasticsearch may fail, reporting corruption or other data inconsistencies, or may appear to work correctly while having silently lost some of your data. Don’t attempt to take filesystem backups of the data directory; there is no supported way to restore such a backup. Instead, use Snapshot and restore to take backups safely. Don’t run virus scanners on the data directory. A virus scanner can prevent Elasticsearch from working correctly and may modify the contents of the data directory. The data directory contains no executables, so a virus scan will only find false positives.
Multiple data paths
Deprecated in 7.13.0.
If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.
Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.
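The Ruby sketch below models that behavior: the node counts as over the watermark as soon as any single data path crosses it, regardless of how much space is free on the other paths. The 90% threshold and the per-path usage figures are assumptions for illustration; this is a simplified model, not Elasticsearch's actual disk-threshold logic.

```ruby
# Simplified model of the behavior described above. Each entry is
# [path, used_bytes, total_bytes]; the figures are made up.
HIGH_WATERMARK = 0.90 # assumed threshold for illustration

def node_over_watermark?(paths, threshold = HIGH_WATERMARK)
  # One full path is enough to flag the whole node.
  paths.any? { |_path, used, total| used.fdiv(total) >= threshold }
end

def total_free_bytes(paths)
  paths.sum { |_path, used, total| total - used }
end

gib = 1024**3
paths = [
  ['/mnt/elasticsearch_1', 95 * gib, 100 * gib], # 95% used
  ['/mnt/elasticsearch_2', 10 * gib, 100 * gib], # 10% used
  ['/mnt/elasticsearch_3', 10 * gib, 100 * gib]  # 10% used
]

puts node_over_watermark?(paths)   # prints true
puts total_free_bytes(paths) / gib # prints 185, yet the node is still flagged
```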
Linux and macOS installations support multiple Unix-style paths in path.data:
path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3
Windows installations support multiple DOS paths in path.data:
path:
  data:
    - "C:\\Elastic\\Elasticsearch_1"
    - "E:\\Elastic\\Elasticsearch_1"
    - "F:\\Elastic\\Elasticsearch_3"
Migrate from multiple data paths
Support for multiple data paths was deprecated in 7.13 and will be removed in a future release.
As an alternative to multiple data paths, you can create a filesystem which spans multiple disks with a hardware virtualisation layer such as RAID, or a software virtualisation layer such as Logical Volume Manager (LVM) on Linux or Storage Spaces on Windows. If you wish to use multiple data paths on a single machine then you must run one node for each data path.
If you currently use multiple data paths in a highly available cluster, you can migrate to a setup that uses a single path for each node without downtime, using a process similar to a rolling restart: shut each node down in turn and replace it with one or more nodes, each configured to use a single data path. In more detail, follow this process for each node that currently has multiple data paths. In principle you can perform this migration during a rolling upgrade to 8.0, but we recommend migrating to a single-data-path setup before starting to upgrade.
- Take a snapshot to protect your data in case of disaster.
- Optionally, migrate the data away from the target node by using an allocation filter:
response = client.cluster.put_settings(
  body: {
    persistent: {
      'cluster.routing.allocation.exclude._name' => 'target-node-name'
    }
  }
)
puts response
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "target-node-name"
  }
}
You can use the cat allocation API to track progress of this data migration. If some shards do not migrate then the cluster allocation explain API will help you to determine why.
- Follow the steps in the rolling restart process up to and including shutting the target node down.
- Ensure your cluster health is yellow or green, so that there is a copy of every shard assigned to at least one of the other nodes in your cluster.
- If applicable, remove the allocation filter applied in the earlier step.
response = client.cluster.put_settings(
  body: {
    persistent: {
      'cluster.routing.allocation.exclude._name' => nil
    }
  }
)
puts response
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": null
  }
}
- Discard the data held by the stopped node by deleting the contents of its data paths.
- Reconfigure your storage. For instance, combine your disks into a single filesystem using LVM or Storage Spaces. Ensure that your reconfigured storage has sufficient space for the data that it will hold.
- Reconfigure your node by adjusting the path.data setting in its elasticsearch.yml file. If needed, install more nodes, each with its own path.data setting pointing at a separate data path.
- Start the new nodes and follow the rest of the rolling restart process for them.
- Ensure your cluster health is green, so that every shard has been assigned.
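When one multi-path node is replaced by several single-path nodes, each new node needs its own settings. The Ruby sketch below derives one settings hash per replacement node from the old data path list. The helper name and the "-1", "-2" node-name suffix scheme are hypothetical; adapt them to your own naming before writing the results to each node's elasticsearch.yml.

```ruby
require 'yaml'

# Hypothetical helper: turn one node's list of data paths into one settings
# hash per replacement node, each with exactly one path.data entry.
def single_path_configs(node_name, data_paths, base_settings = {})
  data_paths.each_with_index.map do |path, i|
    base_settings.merge(
      'node.name' => "#{node_name}-#{i + 1}", # hypothetical naming scheme
      'path.data' => path
    )
  end
end

configs = single_path_configs(
  'prod-data-2',
  ['/mnt/elasticsearch_1', '/mnt/elasticsearch_2'],
  { 'cluster.name' => 'logging-prod' }
)

# Print each node's settings as YAML, ready to review before deployment.
configs.each { |settings| puts settings.to_yaml }
```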
Alternatively, you can add some number of single-data-path nodes to your cluster, migrate all your data over to these new nodes using allocation filters, and then remove the old nodes from the cluster. This approach temporarily doubles the size of your cluster, so it only works if you have the capacity to expand your cluster like this.
If you currently use multiple data paths but your cluster is not highly available, then you can migrate to a non-deprecated configuration by taking a snapshot, creating a new cluster with the desired configuration, and restoring the snapshot into it.
Cluster name setting
A node can only join a cluster when it shares its cluster.name with all the other nodes in the cluster. The default name is elasticsearch, but you should change it to an appropriate name that describes the purpose of the cluster.
cluster.name: logging-prod
Do not reuse the same cluster names in different environments. Otherwise, nodes might join the wrong cluster.
Changing the name of a cluster requires a full cluster restart.
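Because a node only joins a cluster whose cluster.name matches its own, a consistency check across node configurations can catch mistakes before startup. The Ruby sketch below assumes each node's settings are already loaded into a hash (for example with YAML.safe_load); the helper and node names are hypothetical.

```ruby
# Hypothetical pre-flight check: group nodes by cluster.name so that any
# node with a mismatched or missing name stands out.
DEFAULT_CLUSTER_NAME = 'elasticsearch' # the default named above

def nodes_by_cluster_name(node_settings)
  grouped = node_settings.group_by do |_node, settings|
    settings.fetch('cluster.name', DEFAULT_CLUSTER_NAME)
  end
  # Keep only the node names in each group.
  grouped.transform_values { |pairs| pairs.map(&:first) }
end

nodes = {
  'prod-data-1' => { 'cluster.name' => 'logging-prod' },
  'prod-data-2' => { 'cluster.name' => 'logging-prod' },
  'prod-data-3' => {} # not set, so it falls back to the default name
}

groups = nodes_by_cluster_name(nodes)
puts groups.inspect
warn 'Nodes disagree on cluster.name' if groups.size > 1
```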
Node name setting
Elasticsearch uses node.name as a human-readable identifier for a particular instance of Elasticsearch. This name is included in the response of many APIs. The node name defaults to the hostname of the machine when Elasticsearch starts, but can be configured explicitly in elasticsearch.yml:
node.name: prod-data-2
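The documented default can be mirrored in a few lines of Ruby: use node.name if it is set, otherwise fall back to the hostname. This is only a sketch of the rule as stated; the helper name is hypothetical, and Elasticsearch's own hostname lookup may differ in detail.

```ruby
require 'socket'

# Hypothetical helper mirroring the documented default: an explicit
# node.name wins, otherwise the machine's hostname is used.
def effective_node_name(settings)
  settings['node.name'] || Socket.gethostname
end

puts effective_node_name({ 'node.name' => 'prod-data-2' }) # prints prod-data-2
puts effective_node_name({})                               # prints this machine's hostname
```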
Network host setting
By default, Elasticsearch only binds to loopback addresses such as 127.0.0.1 and [::1]. This is sufficient to run a cluster of one or more nodes on a single server for development and testing, but a resilient production cluster must involve nodes on other servers. There are many network settings, but usually all you need to configure is network.host:
network.host: 192.168.1.10
When you provide a value for network.host, Elasticsearch assumes that you are moving from development mode to production mode, and upgrades a number of system startup checks from warnings to exceptions. See the differences between development and production modes.
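Whether a candidate network.host value would still keep a node on loopback addresses only can be checked with Ruby's standard ipaddr library. This sketch handles literal IP addresses only, not hostnames, and the helper name is hypothetical.

```ruby
require 'ipaddr'

# Hypothetical helper: true if every given address is a loopback address,
# meaning the node would only be reachable from the local machine.
def loopback_only?(addresses)
  addresses.all? { |addr| IPAddr.new(addr).loopback? }
end

puts loopback_only?(['127.0.0.1', '::1']) # prints true (the default binding)
puts loopback_only?(['192.168.1.10'])     # prints false (reachable from other servers)
```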
Discovery and cluster formation settings
Configure two important discovery and cluster formation settings before going to production so that nodes in the cluster can discover each other and elect a master node.
discovery.seed_hosts
Out of the box, without any network configuration, Elasticsearch will bind to the available loopback addresses and scan local ports 9300 to 9305 to connect with other nodes running on the same server. This behavior provides an auto-clustering experience without having to do any configuration.
When you want to form a cluster with nodes on other hosts, use the static discovery.seed_hosts setting. This setting provides a list of other nodes in the cluster that are master-eligible and likely to be live and contactable to seed the discovery process. This setting accepts a YAML sequence or array of the addresses of all the master-eligible nodes in the cluster. Each address can be either an IP address or a hostname that resolves to one or more IP addresses via DNS.
discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com
   - [0:0:0:0:0:ffff:c0a8:10c]:9301
In this example:
- The port is optional; if you omit it, the node uses its transport port.
- If a hostname resolves to multiple IP addresses, the node will attempt to discover other nodes at all resolved addresses.
- IPv6 addresses must be enclosed in square brackets.
If your master-eligible nodes do not have fixed names or addresses, use an alternative hosts provider to find their addresses dynamically.
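The address rules above (optional port, IPv6 literals in square brackets) can be sketched as a small Ruby parser. It is illustrative only: the fallback port of 9300 is an assumption based on the usual default transport port, the regular expressions are not Elasticsearch's own parsing logic, and no DNS resolution is performed.

```ruby
DEFAULT_TRANSPORT_PORT = 9300 # assumption: the usual default transport port

# Hypothetical parser for one discovery.seed_hosts entry. Returns [host, port].
# Unbracketed IPv6 literals are rejected, matching the rule above.
def parse_seed_host(entry)
  if (m = entry.match(/\A\[([0-9a-fA-F:.]+)\](?::(\d+))?\z/))
    host, port = m[1], m[2] # bracketed IPv6 address, optional port
  elsif (m = entry.match(/\A([^:\[\]]+)(?::(\d+))?\z/))
    host, port = m[1], m[2] # IPv4 address or hostname, optional port
  else
    raise ArgumentError, "unparseable seed host: #{entry.inspect}"
  end
  [host, port ? Integer(port) : DEFAULT_TRANSPORT_PORT]
end

entries = ['192.168.1.10:9300', '192.168.1.11', 'seeds.mydomain.com', '[0:0:0:0:0:ffff:c0a8:10c]:9301']
entries.each { |entry| p parse_seed_host(entry) }
```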
cluster.initial_master_nodes
When you start an Elasticsearch cluster for the first time, a cluster bootstrapping step determines the set of master-eligible nodes whose votes are counted in the first election. In development mode, with no discovery settings configured, this step is performed automatically by the nodes themselves.
Because auto-bootstrapping is inherently unsafe, when starting a new cluster in production mode, you must explicitly list the master-eligible nodes whose votes should be counted in the very first election. You set this list using the cluster.initial_master_nodes setting.
After the cluster forms successfully for the first time, remove the cluster.initial_master_nodes setting from each node’s configuration. Do not use this setting when restarting a cluster or adding a new node to an existing cluster.
discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com
   - [0:0:0:0:0:ffff:c0a8:10c]:9301
cluster.initial_master_nodes:
   - master-node-a
   - master-node-b
   - master-node-c
Identify the initial master nodes by their node names.
See bootstrapping a cluster and discovery and cluster formation settings.
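A simple cross-check before bootstrapping can catch entries in cluster.initial_master_nodes that do not match any master-eligible node's name. The Ruby sketch below compares the two lists exactly, including case; the helper and the node lists are hypothetical.

```ruby
require 'set'

# Hypothetical check: report entries in cluster.initial_master_nodes that do
# not exactly match the node.name of any known master-eligible node.
def unknown_initial_masters(initial_master_nodes, master_eligible_names)
  known = master_eligible_names.to_set
  initial_master_nodes.reject { |name| known.include?(name) }
end

initial  = %w[master-node-a master-node-b master-node-c]
eligible = %w[master-node-a master-node-b master-node-C] # note the case difference

p unknown_initial_masters(initial, eligible) # prints ["master-node-c"]
```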
Heap size settings
By default, Elasticsearch automatically sets the JVM heap size based on a node’s roles and total memory. We recommend the default sizing for most production environments.
If needed, you can override the default sizing by manually setting the JVM heap size.
JVM heap dump path setting
By default, Elasticsearch configures the JVM to dump the heap on out-of-memory exceptions to the default data directory. On RPM and Debian packages, the data directory is /var/lib/elasticsearch. On Linux, macOS, and Windows distributions, the data directory is located under the root of the Elasticsearch installation.
If this path is not suitable for receiving heap dumps, modify the -XX:HeapDumpPath=... entry in jvm.options:
- If you specify a directory, the JVM will generate a filename for the heap dump based on the PID of the running instance.
- If you specify a fixed filename instead of a directory, the file must not exist when the JVM needs to perform a heap dump on an out of memory exception. Otherwise, the heap dump will fail.
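The two cases above can be checked before starting the node. The Ruby sketch below classifies a -XX:HeapDumpPath value as a directory, a new fixed filename, or an existing file (the case where the dump would fail). The helper name is hypothetical, and the result only reflects the filesystem at the moment the check runs.

```ruby
require 'tmpdir'

# Hypothetical pre-flight check for a -XX:HeapDumpPath value.
def heap_dump_target(path)
  if File.directory?(path)
    :directory     # the JVM generates a PID-based filename inside it
  elsif File.exist?(path)
    :existing_file # a heap dump to this fixed filename would fail
  else
    :new_file      # fixed filename that does not exist yet
  end
end

Dir.mktmpdir do |dir|
  existing = File.join(dir, 'heap.hprof')
  File.write(existing, '')
  p heap_dump_target(dir)                         # prints :directory
  p heap_dump_target(existing)                    # prints :existing_file
  p heap_dump_target(File.join(dir, 'new.hprof')) # prints :new_file
end
```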
GC logging settings
By default, Elasticsearch enables garbage collection (GC) logs. These are configured in jvm.options and output to the same default location as the Elasticsearch logs. The default configuration rotates the logs every 64 MB and can consume up to 2 GB of disk space.
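The 2 GB ceiling follows from the rotation settings: file count multiplied by file size. The Ruby sketch below computes that product from an -Xlog option string, assuming 32 files of 64 MB, the same filecount and filesize values used in the gc.options example in this section. The helper name is hypothetical, and only k/m/g size suffixes are handled.

```ruby
# Hypothetical helper: worst-case disk usage of rotated GC logs, computed as
# filecount x filesize from an -Xlog option string (k/m/g suffixes only).
SIZE_UNITS = { '' => 1, 'k' => 1024, 'm' => 1024**2, 'g' => 1024**3 }.freeze

def gc_log_cap_bytes(xlog_option)
  count = xlog_option[/filecount=(\d+)/, 1]
  size, unit = xlog_option.match(/filesize=(\d+)([kmg]?)/i)&.captures
  raise ArgumentError, 'filecount and filesize are both required' unless count && size

  Integer(count) * Integer(size) * SIZE_UNITS.fetch(unit.downcase)
end

# Rotation settings matching the gc.options example in this section.
option = '-Xlog:gc*,gc+age=trace,safepoint:file=/opt/my-app/gc.log:utctime,level,pid,tags:filecount=32,filesize=64m'
puts gc_log_cap_bytes(option) / 1024**3 # prints 2 (32 files x 64 MB = 2 GB)
```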
You can reconfigure JVM logging using the command line options described in JEP 158: Unified JVM Logging. Unless you change the default jvm.options file directly, the Elasticsearch default configuration is applied in addition to your own settings. To disable the default configuration, first disable logging by supplying the -Xlog:disable option, then supply your own command line options. This disables all JVM logging, so be sure to review the available options and enable everything that you require.
To see further options not contained in the original JEP, see Enable Logging with the JVM Unified Logging Framework.
Examples
Change the default GC log output location to /opt/my-app/gc.log by creating $ES_HOME/config/jvm.options.d/gc.options with some sample options:
# Turn off all previous logging configurations
-Xlog:disable

# Default settings from JEP 158, but with `utctime` instead of `uptime` to match the next line
-Xlog:all=warning:stderr:utctime,level,tags

# Enable GC logging to a custom location with a variety of options
-Xlog:gc*,gc+age=trace,safepoint:file=/opt/my-app/gc.log:utctime,level,pid,tags:filecount=32,filesize=64m
Configure an Elasticsearch Docker container to send GC debug logs to standard error (stderr). This lets the container orchestrator handle the output. If using the ES_JAVA_OPTS environment variable, specify:
MY_OPTS="-Xlog:disable -Xlog:all=warning:stderr:utctime,level,tags -Xlog:gc=debug:stderr:utctime"
docker run -e ES_JAVA_OPTS="$MY_OPTS" # etc
Temporary directory settings
By default, Elasticsearch uses a private temporary directory that the startup script creates immediately below the system temporary directory.
On some Linux distributions, a system utility will clean files and directories from /tmp if they have not been recently accessed. This behavior can lead to the private temporary directory being removed while Elasticsearch is running if features that require the temporary directory are not used for a long time. Removing the private temporary directory causes problems if a feature that requires this directory is subsequently used.
If you install Elasticsearch using the .deb or .rpm packages and run it under systemd, the private temporary directory that Elasticsearch uses is excluded from periodic cleanup.
If you intend to run the .tar.gz distribution on Linux or macOS for an extended period, consider creating a dedicated temporary directory for Elasticsearch that is not under a path that will have old files and directories cleaned from it. This directory should have permissions set so that only the user that Elasticsearch runs as can access it. Then, set the $ES_TMPDIR environment variable to point to this directory before starting Elasticsearch.
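A minimal Ruby sketch of that setup: create a directory that only its owner can access, then set ES_TMPDIR so that processes launched from the same Ruby process inherit it. The helper name, the example location, and the launch command in the comments are assumptions; run it as the user Elasticsearch runs as.

```ruby
require 'fileutils'

# Hypothetical helper: create a private temporary directory for Elasticsearch
# and set ES_TMPDIR so that child processes started from here inherit it.
def prepare_es_tmpdir(path)
  FileUtils.mkdir_p(path)
  # Set the mode explicitly, because mkdir_p's permissions depend on the umask.
  File.chmod(0o700, path) # owner-only read/write/execute
  ENV['ES_TMPDIR'] = path
  path
end

# Example (location is an assumption; choose one outside trees such as /tmp
# that are cleaned periodically):
#   prepare_es_tmpdir('/var/lib/elasticsearch-tmp')
#   system('bin/elasticsearch') # run from $ES_HOME; the child sees ES_TMPDIR
```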
JVM fatal error log setting
By default, Elasticsearch configures the JVM to write fatal error logs to the default logging directory. On RPM and Debian packages, this directory is /var/log/elasticsearch. On Linux, macOS, and Windows distributions, the logs directory is located under the root of the Elasticsearch installation.
These are logs produced by the JVM when it encounters a fatal error, such as a segmentation fault. If this path is not suitable for receiving logs, modify the -XX:ErrorFile=... entry in jvm.options.
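If you manage jvm.options with scripts, that change amounts to rewriting one line. The Ruby sketch below replaces any line that starts with -XX:ErrorFile= and leaves all other lines untouched. The helper name, the sample file content, and the target path are hypothetical; %p in the path is the JVM's placeholder for the process ID.

```ruby
# Hypothetical helper: rewrite every line that starts with -XX:ErrorFile= in
# the text of a jvm.options file, leaving all other lines untouched.
def set_error_file(options_text, new_path)
  # The block form avoids treating backslashes in new_path as backreferences.
  options_text.gsub(/^-XX:ErrorFile=.*$/) { "-XX:ErrorFile=#{new_path}" }
end

sample = "-Xms1g\n-XX:ErrorFile=logs/hs_err_pid%p.log\n" # sample file content
puts set_error_file(sample, '/var/log/es-fatal/hs_err_pid%p.log')

# Applying it to a real file might look like this (path is an assumption):
#   text = File.read('config/jvm.options')
#   File.write('config/jvm.options', set_error_file(text, '/var/log/es-fatal/hs_err_pid%p.log'))
```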
Cluster backups
In a disaster, snapshots can prevent permanent data loss. Snapshot lifecycle management is the easiest way to take regular backups of your cluster. For more information, see Create a snapshot.
Taking a snapshot is the only reliable and supported way to back up a cluster. You cannot back up an Elasticsearch cluster by making copies of the data directories of its nodes. There are no supported methods to restore any data from a filesystem-level backup. If you try to restore a cluster from such a backup, it may fail with reports of corruption, missing files, or other data inconsistencies, or it may appear to have succeeded while having silently lost some of your data.
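As a starting point for regular backups, the Ruby sketch below builds the request body for a snapshot lifecycle management (SLM) policy that takes nightly snapshots of all indices. The schedule, repository name, policy ID, and retention values are assumptions, and the repository must already be registered. The commented-out call assumes the official elasticsearch-ruby client exposes the SLM put-lifecycle API as `client.slm.put_lifecycle`; check the API reference for your client version before relying on it.

```ruby
# Hypothetical helper: build the request body for an SLM policy that takes a
# nightly snapshot of all indices and prunes old snapshots.
def nightly_slm_policy(repository:)
  {
    schedule: '0 30 1 * * ?',       # cron expression: every day at 01:30
    name: '<nightly-snap-{now/d}>', # date-math snapshot name
    repository: repository,
    config: { indices: ['*'] },
    retention: { expire_after: '30d', min_count: 5, max_count: 50 }
  }
end

policy = nightly_slm_policy(repository: 'my_repository')
# With a configured client (assumption, see above):
#   client.slm.put_lifecycle(policy_id: 'nightly-snapshots', body: policy)
puts policy
```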