- Elasticsearch Guide: other versions:
- What is Elasticsearch?
- What’s new in 7.10
- Getting started with Elasticsearch
- Set up Elasticsearch
- Installing Elasticsearch
- Configuring Elasticsearch
- Setting JVM options
- Secure settings
- Auditing settings
- Circuit breaker settings
- Cluster-level shard allocation and routing settings
- Cross-cluster replication settings
- Discovery and cluster formation settings
- Field data cache settings
- HTTP
- Index lifecycle management settings
- Index management settings
- Index recovery settings
- Indexing buffer settings
- License settings
- Local gateway settings
- Logging
- Machine learning settings
- Monitoring settings
- Node
- Network settings
- Node query cache settings
- Search settings
- Security settings
- Shard request cache settings
- Snapshot lifecycle management settings
- Transforms settings
- Transport
- Thread pools
- Watcher settings
- Important Elasticsearch configuration
- Important System Configuration
- Bootstrap Checks
- Heap size check
- File descriptor check
- Memory lock check
- Maximum number of threads check
- Max file size check
- Maximum size virtual memory check
- Maximum map count check
- Client JVM check
- Use serial collector check
- System call filter check
- OnError and OnOutOfMemoryError checks
- Early-access check
- G1GC check
- All permission check
- Discovery configuration check
- Bootstrap Checks for X-Pack
- Starting Elasticsearch
- Stopping Elasticsearch
- Discovery and cluster formation
- Add and remove nodes in your cluster
- Full-cluster restart and rolling restart
- Remote clusters
- Set up X-Pack
- Configuring X-Pack Java Clients
- Plugins
- Upgrade Elasticsearch
- Index modules
- Mapping
- Text analysis
- Overview
- Concepts
- Configure text analysis
- Built-in analyzer reference
- Tokenizer reference
- Token filter reference
- Apostrophe
- ASCII folding
- CJK bigram
- CJK width
- Classic
- Common grams
- Conditional
- Decimal digit
- Delimited payload
- Dictionary decompounder
- Edge n-gram
- Elision
- Fingerprint
- Flatten graph
- Hunspell
- Hyphenation decompounder
- Keep types
- Keep words
- Keyword marker
- Keyword repeat
- KStem
- Length
- Limit token count
- Lowercase
- MinHash
- Multiplexer
- N-gram
- Normalization
- Pattern capture
- Pattern replace
- Phonetic
- Porter stem
- Predicate script
- Remove duplicates
- Reverse
- Shingle
- Snowball
- Stemmer
- Stemmer override
- Stop
- Synonym
- Synonym graph
- Trim
- Truncate
- Unique
- Uppercase
- Word delimiter
- Word delimiter graph
- Character filters reference
- Normalizers
- Index templates
- Data streams
- Ingest node
- Search your data
- Query DSL
- Aggregations
- Bucket aggregations
- Adjacency matrix
- Auto-interval date histogram
- Children
- Composite
- Date histogram
- Date range
- Diversified sampler
- Filter
- Filters
- Geo-distance
- Geohash grid
- Geotile grid
- Global
- Histogram
- IP range
- Missing
- Nested
- Parent
- Range
- Rare terms
- Reverse nested
- Sampler
- Significant terms
- Significant text
- Terms
- Variable width histogram
- Subtleties of bucketing range fields
- Metrics aggregations
- Pipeline aggregations
- Bucket aggregations
- EQL
- SQL access
- Overview
- Getting Started with SQL
- Conventions and Terminology
- Security
- SQL REST API
- SQL Translate API
- SQL CLI
- SQL JDBC
- SQL ODBC
- SQL Client Applications
- SQL Language
- Functions and Operators
- Comparison Operators
- Logical Operators
- Math Operators
- Cast Operators
- LIKE and RLIKE Operators
- Aggregate Functions
- Grouping Functions
- Date/Time and Interval Functions and Operators
- Full-Text Search Functions
- Mathematical Functions
- String Functions
- Type Conversion Functions
- Geo Functions
- Conditional Functions And Expressions
- System Functions
- Reserved keywords
- SQL Limitations
- Scripting
- Data management
- ILM: Manage the index lifecycle
- Overview
- Concepts
- Automate rollover
- Manage Filebeat time-based indices
- Index lifecycle actions
- Configure a lifecycle policy
- Migrate index allocation filters to node roles
- Resolve lifecycle policy execution errors
- Start and stop index lifecycle management
- Manage existing indices
- Skip rollover
- Restore a managed data stream or index
- Monitor a cluster
- Frozen indices
- Roll up or transform your data
- Set up a cluster for high availability
- Snapshot and restore
- Secure a cluster
- Overview
- Configuring security
- User authentication
- Built-in users
- Internal users
- Token-based authentication services
- Realms
- Realm chains
- Active Directory user authentication
- File-based user authentication
- LDAP user authentication
- Native user authentication
- OpenID Connect authentication
- PKI user authentication
- SAML authentication
- Kerberos authentication
- Integrating with other authentication systems
- Enabling anonymous access
- Controlling the user cache
- Configuring SAML single-sign-on on the Elastic Stack
- Configuring single sign-on to the Elastic Stack using OpenID Connect
- User authorization
- Built-in roles
- Defining roles
- Granting access to Stack Management features
- Security privileges
- Document level security
- Field level security
- Granting privileges for data streams and index aliases
- Mapping users and groups to roles
- Setting up field and document level security
- Submitting requests on behalf of other users
- Configuring authorization delegation
- Customizing roles and authorization
- Enabling audit logging
- Encrypting communications
- Restricting connections with IP filtering
- Cross cluster search, clients, and integrations
- Tutorial: Getting started with security
- Tutorial: Encrypting communications
- Troubleshooting
- Some settings are not returned via the nodes settings API
- Authorization exceptions
- Users command fails due to extra arguments
- Users are frequently locked out of Active Directory
- Certificate verification fails for curl on Mac
- SSLHandshakeException causes connections to fail
- Common SSL/TLS exceptions
- Common Kerberos exceptions
- Common SAML issues
- Internal Server Error in Kibana
- Setup-passwords command fails due to connection failure
- Failures due to relocation of the configuration files
- Limitations
- Watch for cluster and index events
- Command line tools
- How To
- Glossary of terms
- REST APIs
- API conventions
- Compact and aligned text (CAT) APIs
- cat aliases
- cat allocation
- cat anomaly detectors
- cat count
- cat data frame analytics
- cat datafeeds
- cat fielddata
- cat health
- cat indices
- cat master
- cat nodeattrs
- cat nodes
- cat pending tasks
- cat plugins
- cat recovery
- cat repositories
- cat segments
- cat shards
- cat snapshots
- cat task management
- cat templates
- cat thread pool
- cat trained model
- cat transforms
- Cluster APIs
- Cluster allocation explain
- Cluster get settings
- Cluster health
- Cluster reroute
- Cluster state
- Cluster stats
- Cluster update settings
- Nodes feature usage
- Nodes hot threads
- Nodes info
- Nodes reload secure settings
- Nodes stats
- Pending cluster tasks
- Remote cluster info
- Task management
- Voting configuration exclusions
- Cross-cluster replication APIs
- Data stream APIs
- Document APIs
- Enrich APIs
- Graph explore API
- Index APIs
- Add index alias
- Analyze
- Clear cache
- Clone index
- Close index
- Create index
- Delete index
- Delete index alias
- Delete component template
- Delete index template
- Delete index template (legacy)
- Flush
- Force merge
- Freeze index
- Get component template
- Get field mapping
- Get index
- Get index alias
- Get index settings
- Get index template
- Get index template (legacy)
- Get mapping
- Index alias exists
- Index exists
- Index recovery
- Index segments
- Index shard stores
- Index stats
- Index template exists (legacy)
- Open index
- Put index template
- Put index template (legacy)
- Put component template
- Put mapping
- Refresh
- Rollover index
- Shrink index
- Simulate index
- Simulate template
- Split index
- Synced flush
- Type exists
- Unfreeze index
- Update index alias
- Update index settings
- Resolve index
- List dangling indices
- Import dangling index
- Delete dangling index
- Index lifecycle management APIs
- Ingest APIs
- Info API
- Licensing APIs
- Machine learning anomaly detection APIs
- Add events to calendar
- Add jobs to calendar
- Close jobs
- Create jobs
- Create calendars
- Create datafeeds
- Create filters
- Delete calendars
- Delete datafeeds
- Delete events from calendar
- Delete filters
- Delete forecasts
- Delete jobs
- Delete jobs from calendar
- Delete model snapshots
- Delete expired data
- Estimate model memory
- Find file structure
- Flush jobs
- Forecast jobs
- Get buckets
- Get calendars
- Get categories
- Get datafeeds
- Get datafeed statistics
- Get influencers
- Get jobs
- Get job statistics
- Get machine learning info
- Get model snapshots
- Get overall buckets
- Get scheduled events
- Get filters
- Get records
- Open jobs
- Post data to jobs
- Preview datafeeds
- Revert model snapshots
- Set upgrade mode
- Start datafeeds
- Stop datafeeds
- Update datafeeds
- Update filters
- Update jobs
- Update model snapshots
- Machine learning data frame analytics APIs
- Create data frame analytics jobs
- Create trained models
- Update data frame analytics jobs
- Delete data frame analytics jobs
- Delete trained models
- Evaluate data frame analytics
- Explain data frame analytics
- Get data frame analytics jobs
- Get data frame analytics jobs stats
- Get trained models
- Get trained models stats
- Start data frame analytics jobs
- Stop data frame analytics jobs
- Migration APIs
- Reload search analyzers API
- Repositories metering APIs
- Rollup APIs
- Search APIs
- Searchable snapshots APIs
- Security APIs
- Authenticate
- Change passwords
- Clear cache
- Clear roles cache
- Clear privileges cache
- Clear API key cache
- Create API keys
- Create or update application privileges
- Create or update role mappings
- Create or update roles
- Create or update users
- Delegate PKI authentication
- Delete application privileges
- Delete role mappings
- Delete roles
- Delete users
- Disable users
- Enable users
- Get API key information
- Get application privileges
- Get builtin privileges
- Get role mappings
- Get roles
- Get token
- Get users
- Grant API keys
- Has privileges
- Invalidate API key
- Invalidate token
- OpenID Connect prepare authentication
- OpenID Connect authenticate
- OpenID Connect logout
- SAML prepare authentication
- SAML authenticate
- SAML logout
- SAML invalidate
- SSL certificate
- Snapshot and restore APIs
- Snapshot lifecycle management APIs
- Transform APIs
- Usage API
- Watcher APIs
- Definitions
- Migration guide
- Release notes
- Elasticsearch version 7.10.2
- Elasticsearch version 7.10.1
- Elasticsearch version 7.10.0
- Elasticsearch version 7.9.3
- Elasticsearch version 7.9.2
- Elasticsearch version 7.9.1
- Elasticsearch version 7.9.0
- Elasticsearch version 7.8.1
- Elasticsearch version 7.8.0
- Elasticsearch version 7.7.1
- Elasticsearch version 7.7.0
- Elasticsearch version 7.6.2
- Elasticsearch version 7.6.1
- Elasticsearch version 7.6.0
- Elasticsearch version 7.5.2
- Elasticsearch version 7.5.1
- Elasticsearch version 7.5.0
- Elasticsearch version 7.4.2
- Elasticsearch version 7.4.1
- Elasticsearch version 7.4.0
- Elasticsearch version 7.3.2
- Elasticsearch version 7.3.1
- Elasticsearch version 7.3.0
- Elasticsearch version 7.2.1
- Elasticsearch version 7.2.0
- Elasticsearch version 7.1.1
- Elasticsearch version 7.1.0
- Elasticsearch version 7.0.0
- Elasticsearch version 7.0.0-rc2
- Elasticsearch version 7.0.0-rc1
- Elasticsearch version 7.0.0-beta1
- Elasticsearch version 7.0.0-alpha2
- Elasticsearch version 7.0.0-alpha1
- Dependencies and versions
Important Elasticsearch configuration
editImportant Elasticsearch configuration
editElasticsearch requires very little configuration to get started, but there are a number of items which must be considered before using your cluster in production:
Our Elastic Cloud service configures these items automatically, making your cluster production-ready by default.
Path settings
editElasticsearch writes the data you index to indices and data streams to a data
directory. Elasticsearch writes its own application logs, which contain information about
cluster health and operations, to a logs
directory.
For macOS .tar.gz
, Linux .tar.gz
, and
Windows .zip
installations, data
and logs
are
subdirectories of $ES_HOME
by default. However, files in $ES_HOME
risk
deletion during an upgrade.
In production, we strongly recommend you set the path.data
and path.logs
in
elasticsearch.yml
to locations outside of $ES_HOME
.
Docker, Debian, RPM, macOS Homebrew,
and Windows .msi
installations write data and log to locations
outside of $ES_HOME
by default.
Supported path.data
and path.logs
values vary by platform:
Linux and macOS installations support Unix-style paths:
path: data: /var/data/elasticsearch logs: /var/log/elasticsearch
Windows installations support DOS paths with escaped backslashes:
path: data: "C:\\Elastic\\Elasticsearch\\data" logs: "C:\\Elastic\\Elasticsearch\\logs"
If needed, you can specify multiple paths in path.data
. Elasticsearch stores the node’s
data across all provided paths but keeps each shard’s data on the same path.
Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.
Linux and macOS installations support multiple Unix-style paths in path.data
:
path: data: - /mnt/elasticsearch_1 - /mnt/elasticsearch_2 - /mnt/elasticsearch_3
Windows installations support multiple DOS paths in path.data
:
path: data: - "C:\\Elastic\\Elasticsearch_1" - "E:\\Elastic\\Elasticsearch_1" - "F:\\Elastic\\Elasticsearch_3"
Cluster name setting
editA node can only join a cluster when it shares its cluster.name
with all the
other nodes in the cluster. The default name is elasticsearch
, but you should
change it to an appropriate name that describes the purpose of the cluster.
cluster.name: logging-prod
Do not reuse the same cluster names in different environments. Otherwise, nodes might join the wrong cluster.
Node name setting
editElasticsearch uses node.name
as a human-readable identifier for a
particular instance of Elasticsearch. This name is included in the response
of many APIs. The node name defaults to the hostname of the machine when
Elasticsearch starts, but can be configured explicitly in
elasticsearch.yml
:
node.name: prod-data-2
Network host setting
editBy default, Elasticsearch binds to loopback addresses only such as 127.0.0.1
and [::1]
. This binding is sufficient to run a single development node on a
server.
more than one node can be started from the same $ES_HOME
location on a single node. This setup can be useful for testing Elasticsearch’s
ability to form clusters, but it is not a configuration recommended for
production.
To form a cluster with nodes on other servers, your
node will need to bind to a non-loopback address. While there are many
network settings, usually all you need to configure is
network.host
:
network.host: 192.168.1.10
The network.host
setting also understands some special values such as
_local_
, _site_
, _global_
and modifiers like :ip4
and :ip6
. See
Special values for network.host
.
When you provide a custom setting for network.host
,
Elasticsearch assumes that you are moving from development mode to production
mode, and upgrades a number of system startup checks from warnings to
exceptions. See the differences between development and production modes.
Discovery and cluster formation settings
editConfigure two important discovery and cluster formation settings before going to production so that nodes in the cluster can discover each other and elect a master node.
discovery.seed_hosts
editOut of the box, without any network configuration, Elasticsearch will bind to
the available loopback addresses and scan local ports 9300
to 9305
to
connect with other nodes running on the same server. This behavior provides an
auto-clustering experience without having to do any configuration.
When you want to form a cluster with nodes on other hosts, use the
static discovery.seed_hosts
setting. This setting
provides a list of other nodes in the cluster
that are master-eligible and likely to be live and contactable to seed
the discovery process. This setting
accepts a YAML sequence or array of the addresses of all the master-eligible
nodes in the cluster. Each address can be either an IP address or a hostname
that resolves to one or more IP addresses via DNS.
discovery.seed_hosts: - 192.168.1.10:9300 - 192.168.1.11 - seeds.mydomain.com - [0:0:0:0:0:ffff:c0a8:10c]:9301
The port is optional and defaults to |
|
If a hostname resolves to multiple IP addresses, the node will attempt to discover other nodes at all resolved addresses. |
|
IPv6 addresses must be enclosed in square brackets. |
If your master-eligible nodes do not have fixed names or addresses, use an alternative hosts provider to find their addresses dynamically.
cluster.initial_master_nodes
editWhen you start an Elasticsearch cluster for the first time, a cluster bootstrapping step determines the set of master-eligible nodes whose votes are counted in the first election. In development mode, with no discovery settings configured, this step is performed automatically by the nodes themselves.
Because auto-bootstrapping is inherently
unsafe, when starting a new cluster in production
mode, you must explicitly list the master-eligible nodes whose votes should be
counted in the very first election. You set this list using the
cluster.initial_master_nodes
setting.
After the cluster forms successfully for the first time, remove the cluster.initial_master_nodes
setting from each nodes'
configuration. Do not use this setting when
restarting a cluster or adding a new node to an existing cluster.
discovery.seed_hosts: - 192.168.1.10:9300 - 192.168.1.11 - seeds.mydomain.com - [0:0:0:0:0:ffff:c0a8:10c]:9301 cluster.initial_master_nodes: - master-node-a - master-node-b - master-node-c
Identify the initial master nodes by their |
See bootstrapping a cluster and discovery and cluster formation settings.
Heap size settings
editBy default, Elasticsearch tells the JVM to use a heap with a minimum and maximum size of 1 GB. When moving to production, it is important to configure heap size to ensure that Elasticsearch has enough heap available.
Elasticsearch will assign the entire heap specified in
jvm.options via the Xms
(minimum heap size) and Xmx
(maximum
heap size) settings. These two settings must be equal to each other.
The value for these settings depends on the amount of RAM available on your server:
-
Set
Xmx
andXms
to no more than 50% of your physical RAM. Elasticsearch requires memory for purposes other than the JVM heap and it is important to leave space for this. For instance, Elasticsearch uses off-heap buffers for efficient network communication, relies on the operating system’s filesystem cache for efficient access to files, and the JVM itself requires some memory too. It is normal to observe the Elasticsearch process using more memory than the limit configured with theXmx
setting. -
Set
Xmx
andXms
to no more than the threshold that the JVM uses for compressed object pointers (compressed oops). The exact threshold varies but is near 32 GB. You can verify that you are under the threshold by looking for a line in the logs like the following:heap size [1.9gb], compressed ordinary object pointers [true]
-
Set
Xmx
andXms
to no more than the threshold for zero-based compressed oops. The exact threshold varies but 26 GB is safe on most systems and can be as large as 30 GB on some systems. You can verify that you are under this threshold by starting Elasticsearch with the JVM options-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode
and looking for a line like the following:heap address: 0x000000011be00000, size: 27648 MB, zero based Compressed Oops
This line shows that zero-based compressed oops are enabled. If zero-based compressed oops are not enabled, you’ll see a line like the following instead:
heap address: 0x0000000118400000, size: 28672 MB, Compressed Oops with base: 0x00000001183ff000
The more heap available to Elasticsearch, the more memory it can use for its internal caches, but the less memory it leaves available for the operating system to use for the filesystem cache. Also, larger heaps can cause longer garbage collection pauses.
Here is an example of how to set the heap size via a jvm.options.d/
file:
Using jvm.options.d
is the preferred method for configuring the heap size for
production deployments.
It is also possible to set the heap size via the ES_JAVA_OPTS
environment
variable. This is generally discouraged for production deployments but is useful
for testing because it overrides all other means of setting JVM options.
ES_JAVA_OPTS="-Xms2g -Xmx2g" ./bin/elasticsearch ES_JAVA_OPTS="-Xms4000m -Xmx4000m" ./bin/elasticsearch
Configuring the heap for the Windows service is different than the above. The values initially populated for the Windows service can be configured as above but are different after the service has been installed. Consult the Windows service documentation for additional details.
JVM heap dump path setting
editBy default, Elasticsearch configures the JVM to dump the heap on out of
memory exceptions to the default data directory. On RPM and
Debian packages, the data directory is /var/lib/elasticsearch
. On
Linux and MacOS and Windows distributions,
the data
directory is located under the root of the Elasticsearch installation.
If this path is not suitable for receiving heap dumps, modify the
-XX:HeapDumpPath=...
entry in jvm.options
:
- If you specify a directory, the JVM will generate a filename for the heap dump based on the PID of the running instance.
- If you specify a fixed filename instead of a directory, the file must not exist when the JVM needs to perform a heap dump on an out of memory exception. Otherwise, the heap dump will fail.
GC logging settings
editBy default, Elasticsearch enables garbage collection (GC) logs. These are configured in
jvm.options
and output to the same default location as
the Elasticsearch logs. The default configuration rotates the logs every 64 MB and
can consume up to 2 GB of disk space.
You can reconfigure JVM logging using the command line options described in
JEP 158: Unified JVM Logging. Unless you
change the default jvm.options
file directly, the Elasticsearch default
configuration is applied in addition to your own settings. To disable the
default configuration, first disable logging by supplying the
-Xlog:disable
option, then supply your own command line options. This
disables all JVM logging, so be sure to review the available options
and enable everything that you require.
To see further options not contained in the original JEP, see Enable Logging with the JVM Unified Logging Framework.
Examples
editChange the default GC log output location to /opt/my-app/gc.log
by
creating $ES_HOME/config/jvm.options.d/gc.options
with some sample
options:
# Turn off all previous logging configuratons -Xlog:disable # Default settings from JEP 158, but with `utctime` instead of `uptime` to match the next line -Xlog:all=warning:stderr:utctime,level,tags # Enable GC logging to a custom location with a variety of options -Xlog:gc*,gc+age=trace,safepoint:file=/opt/my-app/gc.log:utctime,pid,tags:filecount=32,filesize=64m
Configure an Elasticsearch Docker container to send GC debug logs to
standard error (stderr
). This lets the container orchestrator
handle the output. If using the ES_JAVA_OPTS
environment variable,
specify:
MY_OPTS="-Xlog:disable -Xlog:all=warning:stderr:utctime,level,tags -Xlog:gc=debug:stderr:utctime" docker run -e ES_JAVA_OPTS="$MY_OPTS" # etc
Temporary directory settings
editBy default, Elasticsearch uses a private temporary directory that the startup script creates immediately below the system temporary directory.
On some Linux distributions, a system utility will clean files and directories
from /tmp
if they have not been recently accessed. This behavior can lead to
the private temporary directory being removed while Elasticsearch is running if
features that require the temporary directory are not used for a long time.
Removing the private temporary directory causes problems if a feature that
requires this directory is subsequently used.
If you install Elasticsearch using the .deb
or .rpm
packages and run it
under systemd
, the private temporary directory that Elasticsearch uses
is excluded from periodic cleanup.
If you intend to run the .tar.gz
distribution on Linux or MacOS for
an extended period, consider creating a dedicated temporary
directory for Elasticsearch that is not under a path that will have old files
and directories cleaned from it. This directory should have permissions set
so that only the user that Elasticsearch runs as can access it. Then, set the
$ES_TMPDIR
environment variable to point to this directory before starting
Elasticsearch.
JVM fatal error log setting
editBy default, Elasticsearch configures the JVM to write fatal error logs
to the default logging directory. On RPM and Debian packages,
this directory is /var/log/elasticsearch
. On Linux and MacOS and Windows distributions, the logs
directory is located under the root of the Elasticsearch installation.
These are logs produced by the JVM when it encounters a fatal error, such as a
segmentation fault. If this path is not suitable for receiving logs,
modify the -XX:ErrorFile=...
entry in jvm.options
.
Cluster backups
editIn a disaster, snapshots can prevent permanent data loss. Snapshot lifecycle management is the easiest way to take regular backups of your cluster. For more information, see Back up a cluster.
You cannot back up an Elasticsearch cluster by simply copying the data directories of all of its nodes. Elasticsearch may be making changes to the contents of its data directories while it is running; copying its data directories cannot be expected to capture a consistent picture of their contents. If you try to restore a cluster from such a backup, it may fail and report corruption and/or missing files. Alternatively, it may appear to have succeeded though it silently lost some of its data. The only reliable way to back up a cluster is by using the snapshot and restore functionality.
On this page