Elasticsearch Guide: other versions:
What is Elasticsearch?
- Data in: documents and indices
- Information out: search and analyze
- Scalability and resilience
What’s new in 8.13
Quick start
Set up Elasticsearch
- Installing Elasticsearch
- Run Elasticsearch locally
- Configuring Elasticsearch
- Important system configuration
- Bootstrap Checks
- Bootstrap Checks for X-Pack
- Starting Elasticsearch
- Stopping Elasticsearch
- Discovery and cluster formation
- Add and remove nodes in your cluster
- Full-cluster restart and rolling restart
- Remote clusters
- Plugins
Upgrade Elasticsearch
- Archived settings
- Reading indices from older Elasticsearch versions
Index modules
- Analysis
- Index Shard Allocation
- Index blocks
- Mapper
- Merge
- Similarity module
- Slow Log
- Store
  - Preloading data into the file system cache
- Translog
- History retention
- Index Sorting
  - Use index sorting to speed up conjunctions
- Indexing pressure
Mapping
- Dynamic mapping
  - Dynamic field mapping
  - Dynamic templates
- Explicit mapping
- Runtime fields
- Field data types
  - Aggregate metric
  - Alias
  - Arrays
  - Binary
  - Boolean
  - Completion
  - Date
  - Date nanoseconds
  - Dense vector
  - Flattened
  - Geopoint
  - Geoshape
  - Histogram
  - IP
  - Join
  - Keyword
  - Nested
  - Numeric
  - Object
  - Percolator
  - Point
  - Range
  - Rank feature
  - Rank features
  - Search-as-you-type
  - Shape
  - Sparse vector
  - Text
  - Token count
  - Unsigned long
  - Version
- Metadata fields
- Mapping parameters
- Mapping limit settings
- Removal of mapping types
Text analysis
- Overview
- Concepts
- Configure text analysis
- Built-in analyzer reference
  - Fingerprint
  - Keyword
  - Language
  - Pattern
  - Simple
  - Standard
  - Stop
  - Whitespace
- Tokenizer reference
  - Character group
  - Classic
  - Edge n-gram
  - Keyword
  - Letter
  - Lowercase
  - N-gram
  - Path hierarchy
  - Pattern
  - Simple pattern
  - Simple pattern split
  - Standard
  - Thai
  - UAX URL email
  - Whitespace
- Token filter reference
- Character filters reference
- Normalizers
Index templates
- Simulate multi-component templates
- Config ignore_missing_component_templates
  - Usage example
Data streams
- Set up a data stream
- Use a data stream
- Modify a data stream
- Time series data stream (TSDS)
- Data stream lifecycle
Ingest pipelines
- Example: Parse logs
- Enrich your data
- Processor reference
  - Append
  - Attachment
  - Bytes
  - Circle
  - Community ID
  - Convert
  - CSV
  - Date
  - Date index name
  - Dissect
  - Dot expander
  - Drop
  - Enrich
  - Fail
  - Fingerprint
  - Foreach
  - Geo-grid
  - GeoIP
  - Grok
  - Gsub
  - HTML strip
  - Inference
  - Join
  - JSON
  - KV
  - Lowercase
  - Network direction
  - Pipeline
  - Redact
  - Registered domain
  - Remove
  - Rename
  - Reroute
  - Script
  - Set
  - Set security user
  - Sort
  - Split
  - Trim
  - Uppercase
  - URL decode
  - URI parts
  - User agent
- Ingest pipelines in Search
  - Inference processing
  - NLP tutorial
Aliases
Search your data
- The search API
- Search Applications
- kNN search
- Semantic search
  - Semantic search with ELSER
  - Semantic search with the inference API
- Learning To Rank
  - Deploy and manage LTR models
  - Search using LTR
- Search across clusters
- Search with synonyms
- Search analytics
Query DSL
- Query and filter context
- Compound queries
- Full text queries
- Geo queries
- Shape queries
  - Shape
- Joining queries
  - Nested
  - Has child
  - Has parent
  - Parent ID
- Match all
- Span queries
- Specialized queries
  - Distance feature
  - More like this
  - Percolate
  - Knn
  - Rank feature
  - Script
  - Script score
  - Wrapper
  - Pinned Query
  - Rule
  - Weighted tokens
- Term-level queries
  - Exists
  - Fuzzy
  - IDs
  - Prefix
  - Range
  - Regexp
  - Term
  - Terms
  - Terms set
  - Wildcard
- Text expansion
- minimum_should_match parameter
- rewrite parameter
- Regular expression syntax
Aggregations
- Bucket aggregations
- Metrics aggregations
  - Avg
  - Boxplot
  - Cardinality
  - Extended stats
  - Geo-bounds
  - Geo-centroid
  - Geo-line
  - Cartesian-bounds
  - Cartesian-centroid
  - Matrix stats
  - Max
  - Median absolute deviation
  - Min
  - Percentile ranks
  - Percentiles
  - Rate
  - Scripted metric
  - Stats
  - String stats
  - Sum
  - T-test
  - Top hits
  - Top metrics
  - Value count
  - Weighted avg
- Pipeline aggregations
Geospatial analysis
EQL
- Syntax reference
- Function reference
- Pipe reference
- Example: Detect threats with EQL
ES|QL
- Getting started
- ES|QL reference
- Using ES|QL
- Limitations
- Examples
SQL
- Overview
- Getting Started with SQL
- Conventions and Terminology
  - Mapping concepts across SQL and Elasticsearch
- Security
- SQL REST API
- SQL Translate API
- SQL CLI
- SQL JDBC
  - API usage
- SQL ODBC
  - Driver installation
  - Configuration
- SQL Client Applications
- SQL Language
- Functions and Operators
- Reserved keywords
- SQL Limitations
Scripting
- Painless scripting language
- How to write scripts
- Access fields in a document
- Common scripting use cases
  - Field extraction
- Accessing document fields and special variables
- Scripting and security
- Lucene expressions language
- Advanced scripts using script engines
Data management
- ILM: Manage the index lifecycle
- Tutorial: Customize built-in policies
- Tutorial: Automate rollover
- Index management in Kibana
- Overview
- Concepts
- Index lifecycle actions
  - Allocate
  - Delete
  - Force merge
  - Migrate
  - Read only
  - Rollover
  - Downsample
  - Searchable snapshot
  - Set priority
  - Shrink
  - Unfollow
  - Wait for snapshot
- Configure a lifecycle policy
- Migrate index allocation filters to node roles
- Troubleshooting index lifecycle management errors
- Start and stop index lifecycle management
- Manage existing indices
- Skip rollover
- Restore a managed data stream or index
- Data tiers
Autoscaling
- Autoscaling deciders
Monitor a cluster
- Overview
- How it works
- Monitoring in a production environment
- Collecting monitoring data with Elastic Agent
- Collecting monitoring data with Metricbeat
- Collecting log data with Filebeat
- Configuring data streams/indices for monitoring
- Legacy collection methods
Roll up or transform your data
- Rolling up historical data
- Transforming data
Set up a cluster for high availability
- Designing for resilience
  - Resilience in small clusters
  - Resilience in larger clusters
- Cross-cluster replication
Snapshot and restore
- Register a repository
- Create a snapshot
- Restore a snapshot
- Searchable snapshots
Secure the Elastic Stack
- Elasticsearch security principles
- Start the Elastic Stack with security enabled automatically
- Manually configure security
- Updating node security certificates
  - With the same CA
  - With a different CA
- User authentication
- User authorization
- Enable audit logging
- Restricting connections with IP filtering
- Securing clients and integrations
- Operator privileges
- Troubleshooting
- Limitations
Watcher
- Getting started with Watcher
- How Watcher works
- Encrypting sensitive data in Watcher
- Inputs
- Triggers
  - Schedule trigger
- Conditions
- Actions
- Transforms
- Managing watches
- Example watches
  - Watching the status of an Elasticsearch cluster
- Limitations
Command line tools
- elasticsearch-certgen
- elasticsearch-certutil
- elasticsearch-create-enrollment-token
- elasticsearch-croneval
- elasticsearch-keystore
- elasticsearch-node
- elasticsearch-reconfigure-node
- elasticsearch-reset-password
- elasticsearch-saml-metadata
- elasticsearch-service-tokens
- elasticsearch-setup-passwords
- elasticsearch-shard
- elasticsearch-syskeygen
- elasticsearch-users
How to
- General recommendations
- Recipes
- Tune for indexing speed
- Tune for search speed
- Tune approximate kNN search
- Tune for disk usage
- Size your shards
- Use Elasticsearch for time series data
Troubleshooting
- Fix common cluster issues
  - Watermark errors
  - Circuit breaker errors
  - High CPU usage
  - High JVM memory pressure
  - Red or yellow cluster health status
  - Rejected requests
  - Task queue backlog
  - Mapping explosion
  - Hot spotting
- Diagnose unassigned shards
- Add a missing tier to the system
- Allow Elasticsearch to allocate the data in the system
- Allow Elasticsearch to allocate the index
- Indices mix index allocation filters with data tiers node roles to move through data tiers
- Not enough nodes to allocate all shard replicas
- Total number of shards for an index on a single node exceeded
- Total number of shards per node has been reached
- Troubleshooting corruption
- Fix data nodes out of disk
  - Increase the disk capacity of data nodes
  - Decrease the disk usage of data nodes
- Fix master nodes out of disk
- Fix other role nodes out of disk
- Start index lifecycle management
- Start Snapshot Lifecycle Management
- Restore from snapshot
- Troubleshooting broken repositories
  - Diagnosing corrupted repositories
  - Diagnosing unknown repositories
  - Diagnosing invalid repositories
- Addressing repeated snapshot policy failures
- Troubleshooting an unstable cluster
- Troubleshooting discovery
- Troubleshooting monitoring
- Troubleshooting transforms
- Troubleshooting Watcher
- Troubleshooting searches
- Troubleshooting shards capacity health issues
REST APIs
- API conventions
- Common options
- REST API compatibility
- Autoscaling APIs
  - Create or update autoscaling policy
  - Get autoscaling capacity
  - Delete autoscaling policy
  - Get autoscaling policy
- Behavioral Analytics APIs
  - Put Analytics Collection
  - Delete Analytics Collection
  - List Analytics Collections
  - Post Analytics Collection Event
- Compact and aligned text (CAT) APIs
  - cat aliases
  - cat allocation
  - cat anomaly detectors
  - cat component templates
  - cat count
  - cat data frame analytics
  - cat datafeeds
  - cat fielddata
  - cat health
  - cat indices
  - cat master
  - cat nodeattrs
  - cat nodes
  - cat pending tasks
  - cat plugins
  - cat recovery
  - cat repositories
  - cat segments
  - cat shards
  - cat snapshots
  - cat task management
  - cat templates
  - cat thread pool
  - cat trained model
  - cat transforms
- Cluster APIs
  - Cluster allocation explain
  - Cluster get settings
  - Cluster health
  - Health
  - Cluster reroute
  - Cluster state
  - Cluster stats
  - Cluster update settings
  - Nodes feature usage
  - Nodes hot threads
  - Nodes info
  - Prevalidate node removal
  - Nodes reload secure settings
  - Nodes stats
  - Cluster Info
  - Pending cluster tasks
  - Remote cluster info
  - Task management
  - Voting configuration exclusions
  - Create or update desired nodes
  - Get desired nodes
  - Delete desired nodes
  - Get desired balance
  - Reset desired balance
- Cross-cluster replication APIs
  - Get CCR stats
  - Create follower
  - Pause follower
  - Resume follower
  - Unfollow
  - Forget follower
  - Get follower stats
  - Get follower info
  - Create auto-follow pattern
  - Delete auto-follow pattern
  - Get auto-follow pattern
  - Pause auto-follow pattern
  - Resume auto-follow pattern
- Connector APIs
  - Cancel connector sync job
  - Check in a connector
  - Check in connector sync job
  - Create connector
  - Create connector sync job
  - Delete connector
  - Delete connector sync job
  - Get connector
  - Get connector sync job
  - List connectors
  - List connector sync jobs
  - Set connector sync job error
  - Set connector sync job stats
  - Update connector API key id
  - Update connector configuration
  - Update connector error
  - Update connector filtering
  - Update connector index name
  - Update connector last sync stats
  - Update connector name and description
  - Update connector pipeline
  - Update connector scheduling
  - Update connector service type
  - Update connector status
- Data stream APIs
  - Create data stream
  - Delete data stream
  - Get data stream
  - Migrate to data stream
  - Data stream stats
  - Promote data stream
  - Modify data streams
  - Put Data Stream Lifecycle
  - Get Data Stream Lifecycle
  - Delete Data Stream Lifecycle
  - Explain Data Stream Lifecycle
  - Get Data Stream Lifecycle
  - Downsample
- Document APIs
  - Reading and Writing documents
  - Index
  - Get
  - Delete
  - Delete by query
  - Update
  - Update by query
  - Multi get
  - Bulk
  - Reindex
  - Term vectors
  - Multi term vectors
  - ?refresh
  - Optimistic concurrency control
- Enrich APIs
  - Create enrich policy
  - Delete enrich policy
  - Get enrich policy
  - Execute enrich policy
  - Enrich stats
- EQL APIs
  - Delete async EQL search
  - EQL search
  - Get async EQL search
  - Get async EQL search status
- ES|QL APIs
  - ES|QL query API
  - ES|QL async query API
  - ES|QL async query get API
  - ES|QL async query delete API
- Features APIs
  - Get features
  - Reset features
- Fleet APIs
  - Get global checkpoints
  - Fleet search
  - Fleet multi search
- Graph explore API
- Index APIs
  - Alias exists
  - Aliases
  - Analyze
  - Analyze index disk usage
  - Clear cache
  - Clone index
  - Close index
  - Create index
  - Create or update alias
  - Create or update component template
  - Create or update index template
  - Create or update index template (legacy)
  - Delete component template
  - Delete dangling index
  - Delete alias
  - Delete index
  - Delete index template
  - Delete index template (legacy)
  - Exists
  - Field usage stats
  - Flush
  - Force merge
  - Get alias
  - Get component template
  - Get field mapping
  - Get index
  - Get index settings
  - Get index template
  - Get index template (legacy)
  - Get mapping
  - Import dangling index
  - Index recovery
  - Index segments
  - Index shard stores
  - Index stats
  - Index template exists (legacy)
  - List dangling indices
  - Open index
  - Refresh
  - Resolve index
  - Resolve cluster
  - Rollover
  - Shrink index
  - Simulate index
  - Simulate template
  - Split index
  - Unfreeze index
  - Update index settings
  - Update mapping
- Index lifecycle management APIs
  - Create or update lifecycle policy
  - Get policy
  - Delete policy
  - Move to step
  - Remove policy
  - Retry policy
  - Get index lifecycle management status
  - Explain lifecycle
  - Start index lifecycle management
  - Stop index lifecycle management
  - Migrate indices, ILM policies, and legacy, composable and component templates to data tiers routing
- Inference APIs
  - Delete inference API
  - Get inference API
  - Perform inference API
  - Create inference API
- Info API
- Ingest APIs
  - Create or update pipeline
  - Delete pipeline
  - GeoIP stats
  - Get pipeline
  - Simulate pipeline
  - Simulate ingest
- Licensing APIs
  - Delete license
  - Get license
  - Get trial status
  - Start trial
  - Get basic status
  - Start basic
  - Update license
- Logstash APIs
  - Create or update Logstash pipeline
  - Delete Logstash pipeline
  - Get Logstash pipeline
- Machine learning APIs
  - Get machine learning info
  - Get machine learning memory stats
  - Set upgrade mode
- Machine learning anomaly detection APIs
  - Add events to calendar
  - Add jobs to calendar
  - Close jobs
  - Create jobs
  - Create calendars
  - Create datafeeds
  - Create filters
  - Delete calendars
  - Delete datafeeds
  - Delete events from calendar
  - Delete filters
  - Delete forecasts
  - Delete jobs
  - Delete jobs from calendar
  - Delete model snapshots
  - Delete expired data
  - Estimate model memory
  - Flush jobs
  - Forecast jobs
  - Get buckets
  - Get calendars
  - Get categories
  - Get datafeeds
  - Get datafeed statistics
  - Get influencers
  - Get jobs
  - Get job statistics
  - Get model snapshots
  - Get model snapshot upgrade statistics
  - Get overall buckets
  - Get scheduled events
  - Get filters
  - Get records
  - Open jobs
  - Post data to jobs
  - Preview datafeeds
  - Reset jobs
  - Revert model snapshots
  - Start datafeeds
  - Stop datafeeds
  - Update datafeeds
  - Update filters
  - Update jobs
  - Update model snapshots
  - Upgrade model snapshots
- Machine learning data frame analytics APIs
  - Create data frame analytics jobs
  - Delete data frame analytics jobs
  - Evaluate data frame analytics
  - Explain data frame analytics
  - Get data frame analytics jobs
  - Get data frame analytics jobs stats
  - Preview data frame analytics
  - Start data frame analytics jobs
  - Stop data frame analytics jobs
  - Update data frame analytics jobs
- Machine learning trained model APIs
  - Clear trained model deployment cache
  - Create or update trained model aliases
  - Create part of a trained model
  - Create trained models
  - Create trained model vocabulary
  - Delete trained model aliases
  - Delete trained models
  - Get trained models
  - Get trained models stats
  - Infer trained model
  - Start trained model deployment
  - Stop trained model deployment
  - Update trained model deployment
- Migration APIs
  - Deprecation info
  - Feature migration
- Node lifecycle APIs
  - Put shutdown API
  - Get shutdown API
  - Delete shutdown API
- Query rules APIs
  - Create or update query ruleset
  - Get query ruleset
  - List query rulesets
  - Delete query ruleset
- Reload search analyzers API
- Repositories metering APIs
  - Get repositories metering information
  - Clear repositories metering archive
- Rollup APIs
  - Create rollup jobs
  - Delete rollup jobs
  - Get job
  - Get rollup caps
  - Get rollup index caps
  - Rollup search
  - Start rollup jobs
  - Stop rollup jobs
- Root API
- Script APIs
  - Create or update stored script
  - Delete stored script
  - Get script contexts
  - Get script languages
  - Get stored script
- Search APIs
  - Search
  - Async search
  - Point in time
  - kNN search
  - Reciprocal rank fusion
  - Scroll
  - Clear scroll
  - Search template
  - Multi search template
  - Render search template
  - Search shards
  - Suggesters
  - Multi search
  - Count
  - Validate
  - Terms enum
  - Explain
  - Profile
  - Field capabilities
  - Ranking evaluation
  - Vector tile search
- Search Application APIs
  - Put Search Application
  - Get Search Application
  - List Search Applications
  - Delete Search Application
  - Search Application Search
  - Render Search Application Query
- Searchable snapshots APIs
  - Mount snapshot
  - Cache stats
  - Searchable snapshot statistics
  - Clear cache
- Security APIs
  - Authenticate
  - Change passwords
  - Clear cache
  - Clear roles cache
  - Clear privileges cache
  - Clear API key cache
  - Clear service account token caches
  - Create API keys
  - Create or update application privileges
  - Create or update role mappings
  - Create or update roles
  - Create or update users
  - Create service account tokens
  - Delegate PKI authentication
  - Delete application privileges
  - Delete role mappings
  - Delete roles
  - Delete service account token
  - Delete users
  - Disable users
  - Enable users
  - Enroll Kibana
  - Enroll node
  - Get API key information
  - Get application privileges
  - Get builtin privileges
  - Get role mappings
  - Get roles
  - Get service accounts
  - Get service account credentials
  - Get Security settings
  - Get token
  - Get user privileges
  - Get users
  - Grant API keys
  - Has privileges
  - Invalidate API key
  - Invalidate token
  - OpenID Connect prepare authentication
  - OpenID Connect authenticate
  - OpenID Connect logout
  - Query API key information
  - Query User
  - Update API key
  - Update Security settings
  - Bulk update API keys
  - SAML prepare authentication
  - SAML authenticate
  - SAML logout
  - SAML invalidate
  - SAML complete logout
  - SAML service provider metadata
  - SSL certificate
  - Activate user profile
  - Disable user profile
  - Enable user profile
  - Get user profiles
  - Suggest user profile
  - Update user profile data
  - Has privileges user profile
  - Create Cross-Cluster API key
  - Update Cross-Cluster API key
- Snapshot and restore APIs
  - Create or update snapshot repository
  - Verify snapshot repository
  - Repository analysis
  - Get snapshot repository
  - Delete snapshot repository
  - Clean up snapshot repository
  - Clone snapshot
  - Create snapshot
  - Get snapshot
  - Get snapshot status
  - Restore snapshot
  - Delete snapshot
- Snapshot lifecycle management APIs
  - Create or update policy
  - Get policy
  - Delete policy
  - Execute snapshot lifecycle policy
  - Execute snapshot retention policy
  - Get snapshot lifecycle management status
  - Get snapshot lifecycle stats
  - Start snapshot lifecycle management
  - Stop snapshot lifecycle management
- SQL APIs
  - Clear SQL cursor
  - Delete async SQL search
  - Get async SQL search
  - Get async SQL search status
  - SQL search
  - SQL translate
- Synonyms APIs
  - Create or update synonyms set
  - Get synonyms set
  - List synonyms sets
  - Delete synonyms set
  - Create or update synonym rule
  - Get synonym rule
  - Delete synonym rule
- Text structure APIs
  - Find structure API
  - Test Grok pattern
- Transform APIs
  - Create transform
  - Delete transform
  - Get transforms
  - Get transform statistics
  - Preview transform
  - Reset transform
  - Schedule now transform
  - Start transform
  - Stop transforms
  - Update transform
  - Upgrade transforms
- Usage API
- Watcher APIs
  - Ack watch
  - Activate watch
  - Deactivate watch
  - Delete watch
  - Execute watch
  - Get watch
  - Get Watcher stats
  - Query watches
  - Create or update watch
  - Update Watcher settings
  - Get Watcher settings
  - Start watch service
  - Stop watch service
- Definitions
  - Role mapping resources
Migration guide
- 8.13
- 8.12
- 8.11
- 8.10
- 8.9
- 8.8
- 8.7
- 8.6
- 8.5
- 8.4
- 8.3
- 8.2
- 8.1
- 8.0
  - Java time migration guide
  - Transient settings migration guide
Release notes
- Elasticsearch version 8.13.4
- Elasticsearch version 8.13.3
- Elasticsearch version 8.13.2
  - Bug fixes
- Elasticsearch version 8.13.1
  - Bug fixes
- Elasticsearch version 8.13.0
- Elasticsearch version 8.12.2
- Elasticsearch version 8.12.1
- Elasticsearch version 8.12.0
- Elasticsearch version 8.11.4
- Elasticsearch version 8.11.3
- Elasticsearch version 8.11.2
- Elasticsearch version 8.11.1
- Elasticsearch version 8.11.0
- Elasticsearch version 8.10.4
- Elasticsearch version 8.10.3
- Elasticsearch version 8.10.2
- Elasticsearch version 8.10.1
- Elasticsearch version 8.10.0
- Elasticsearch version 8.9.2
- Elasticsearch version 8.9.1
- Elasticsearch version 8.9.0
- Elasticsearch version 8.8.2
- Elasticsearch version 8.8.1
- Elasticsearch version 8.8.0
- Elasticsearch version 8.7.1
- Elasticsearch version 8.7.0
- Elasticsearch version 8.6.2
- Elasticsearch version 8.6.1
- Elasticsearch version 8.6.0
- Elasticsearch version 8.5.3
- Elasticsearch version 8.5.2
- Elasticsearch version 8.5.1
- Elasticsearch version 8.5.0
- Elasticsearch version 8.4.3
- Elasticsearch version 8.4.2
- Elasticsearch version 8.4.1
- Elasticsearch version 8.4.0
- Elasticsearch version 8.3.3
- Elasticsearch version 8.3.2
- Elasticsearch version 8.3.1
- Elasticsearch version 8.3.0
- Elasticsearch version 8.2.3
- Elasticsearch version 8.2.2
- Elasticsearch version 8.2.1
- Elasticsearch version 8.2.0
- Elasticsearch version 8.1.3
- Elasticsearch version 8.1.2
- Elasticsearch version 8.1.1
- Elasticsearch version 8.1.0
- Elasticsearch version 8.0.1
- Elasticsearch version 8.0.0
- Elasticsearch version 8.0.0-rc2
- Elasticsearch version 8.0.0-rc1
- Elasticsearch version 8.0.0-beta1
- Elasticsearch version 8.0.0-alpha2
- Elasticsearch version 8.0.0-alpha1
Dependencies and versions

IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

« Configuring Elasticsearch Secure settings »

› › ›

Important Elasticsearch configuration

edit

Important Elasticsearch configuration

edit

Elasticsearch requires very little configuration to get started, but there are a number of items which must be considered before using your cluster in production:

Our Elastic Cloud service configures these items automatically, making your cluster production-ready by default.

Path settings

edit

Elasticsearch writes the data you index to indices and data streams to a data directory. Elasticsearch writes its own application logs, which contain information about cluster health and operations, to a logs directory.

For macOS .tar.gz, Linux .tar.gz, and Windows .zip installations, data and logs are subdirectories of $ES_HOME by default. However, files in $ES_HOME risk deletion during an upgrade.

In production, we strongly recommend you set the path.data and path.logs in elasticsearch.yml to locations outside of $ES_HOME. Docker, Debian, and RPM installations write data and log to locations outside of $ES_HOME by default.

Supported path.data and path.logs values vary by platform:

Linux and macOS installations support Unix-style paths:

path:
  data: /var/data/elasticsearch
  logs: /var/log/elasticsearch

Windows installations support DOS paths with escaped backslashes:

path:
  data: "C:\\Elastic\\Elasticsearch\\data"
  logs: "C:\\Elastic\\Elasticsearch\\logs"

Don’t modify anything within the data directory or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the data directory, then Elasticsearch may fail, reporting corruption or other data inconsistencies, or may appear to work correctly having silently lost some of your data. Don’t attempt to take filesystem backups of the data directory; there is no supported way to restore such a backup. Instead, use Snapshot and restore to take backups safely. Don’t run virus scanners on the data directory. A virus scanner can prevent Elasticsearch from working correctly and may modify the contents of the data directory. The data directory contains no executables so a virus scan will only find false positives.

Multiple data paths

edit

Deprecated in 7.13.0.

If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.

Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.

Linux and macOS installations support multiple Unix-style paths in path.data:

path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3

Windows installations support multiple DOS paths in path.data:

path:
  data:
    - "C:\\Elastic\\Elasticsearch_1"
    - "E:\\Elastic\\Elasticsearch_1"
    - "F:\\Elastic\\Elasticsearch_3"

Migrate from multiple data paths

edit

Support for multiple data paths was deprecated in 7.13 and will be removed in a future release.

As an alternative to multiple data paths, you can create a filesystem which spans multiple disks with a hardware virtualisation layer such as RAID, or a software virtualisation layer such as Logical Volume Manager (LVM) on Linux or Storage Spaces on Windows. If you wish to use multiple data paths on a single machine then you must run one node for each data path.

If you currently use multiple data paths in a highly available cluster then you can migrate to a setup that uses a single path for each node without downtime using a process similar to a rolling restart: shut each node down in turn and replace it with one or more nodes each configured to use a single data path. In more detail, for each node that currently has multiple data paths you should follow the following process. In principle you can perform this migration during a rolling upgrade to 8.0, but we recommend migrating to a single-data-path setup before starting to upgrade.

Take a snapshot to protect your data in case of disaster.

Optionally, migrate the data away from the target node by using an allocation filter:

response = client.cluster.put_settings(
  body: {
    persistent: {
      'cluster.routing.allocation.exclude._name' => 'target-node-name'
    }
  }
)
puts response

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "target-node-name"
  }
}

Copy as curl Try in Elastic

You can use the cat allocation API to track progress of this data migration. If some shards do not migrate then the cluster allocation explain API will help you to determine why.

Follow the steps in the rolling restart process up to and including shutting the target node down.
Ensure your cluster health is yellow or green, so that there is a copy of every shard assigned to at least one of the other nodes in your cluster.

If applicable, remove the allocation filter applied in the earlier step.

response = client.cluster.put_settings(
  body: {
    persistent: {
      'cluster.routing.allocation.exclude._name' => nil
    }
  }
)
puts response

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": null
  }
}

Copy as curl Try in Elastic

Discard the data held by the stopped node by deleting the contents of its data paths.
Reconfigure your storage. For instance, combine your disks into a single filesystem using LVM or Storage Spaces. Ensure that your reconfigured storage has sufficient space for the data that it will hold.
Reconfigure your node by adjusting the path.data setting in its elasticsearch.yml file. If needed, install more nodes each with their own path.data setting pointing at a separate data path.
Start the new nodes and follow the rest of the rolling restart process for them.
Ensure your cluster health is green, so that every shard has been assigned.

You can alternatively add some number of single-data-path nodes to your cluster, migrate all your data over to these new nodes using allocation filters, and then remove the old nodes from the cluster. This approach will temporarily double the size of your cluster so it will only work if you have the capacity to expand your cluster like this.

If you currently use multiple data paths but your cluster is not highly available then you can migrate to a non-deprecated configuration by taking a snapshot, creating a new cluster with the desired configuration and restoring the snapshot into it.

Cluster name setting

edit

A node can only join a cluster when it shares its cluster.name with all the other nodes in the cluster. The default name is elasticsearch, but you should change it to an appropriate name that describes the purpose of the cluster.

cluster.name: logging-prod

Do not reuse the same cluster names in different environments. Otherwise, nodes might join the wrong cluster.

Changing the name of a cluster requires a full cluster restart.

Node name setting

edit

Elasticsearch uses node.name as a human-readable identifier for a particular instance of Elasticsearch. This name is included in the response of many APIs. The node name defaults to the hostname of the machine when Elasticsearch starts, but can be configured explicitly in elasticsearch.yml:

node.name: prod-data-2

Network host setting

edit

By default, Elasticsearch only binds to loopback addresses such as 127.0.0.1 and [::1]. This is sufficient to run a cluster of one or more nodes on a single server for development and testing, but a resilient production cluster must involve nodes on other servers. There are many network settings but usually all you need to configure is network.host:

network.host: 192.168.1.10

When you provide a value for network.host, Elasticsearch assumes that you are moving from development mode to production mode, and upgrades a number of system startup checks from warnings to exceptions. See the differences between development and production modes.

Discovery and cluster formation settings

edit

Configure two important discovery and cluster formation settings before going to production so that nodes in the cluster can discover each other and elect a master node.

`discovery.seed_hosts`

edit

Out of the box, without any network configuration, Elasticsearch will bind to the available loopback addresses and scan local ports 9300 to 9305 to connect with other nodes running on the same server. This behavior provides an auto-clustering experience without having to do any configuration.

When you want to form a cluster with nodes on other hosts, use the static discovery.seed_hosts setting. This setting provides a list of other nodes in the cluster that are master-eligible and likely to be live and contactable to seed the discovery process. This setting accepts a YAML sequence or array of the addresses of all the master-eligible nodes in the cluster. Each address can be either an IP address or a hostname that resolves to one or more IP addresses via DNS.

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11 
   - seeds.mydomain.com 
   - [0:0:0:0:0:ffff:c0a8:10c]:9301

	The port is optional and defaults to `9300`, but can be overridden.
	If a hostname resolves to multiple IP addresses, the node will attempt to discover other nodes at all resolved addresses.
	IPv6 addresses must be enclosed in square brackets.

If your master-eligible nodes do not have fixed names or addresses, use an alternative hosts provider to find their addresses dynamically.

`cluster.initial_master_nodes`

edit

When you start an Elasticsearch cluster for the first time, a cluster bootstrapping step determines the set of master-eligible nodes whose votes are counted in the first election. In development mode, with no discovery settings configured, this step is performed automatically by the nodes themselves.

Because auto-bootstrapping is inherently unsafe, when starting a new cluster in production mode, you must explicitly list the master-eligible nodes whose votes should be counted in the very first election. You set this list using the cluster.initial_master_nodes setting.

After the cluster forms successfully for the first time, remove the cluster.initial_master_nodes setting from each node’s configuration. Do not use this setting when restarting a cluster or adding a new node to an existing cluster.

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com
   - [0:0:0:0:0:ffff:c0a8:10c]:9301
cluster.initial_master_nodes: 
   - master-node-a
   - master-node-b
   - master-node-c

Identify the initial master nodes by their node.name, which defaults to their hostname. Ensure that the value in cluster.initial_master_nodes matches the node.name exactly. If you use a fully-qualified domain name (FQDN) such as master-node-a.example.com for your node names, then you must use the FQDN in this list. Conversely, if node.name is a bare hostname without any trailing qualifiers, you must also omit the trailing qualifiers in cluster.initial_master_nodes.

See bootstrapping a cluster and discovery and cluster formation settings.

Heap size settings

edit

By default, Elasticsearch automatically sets the JVM heap size based on a node’s roles and total memory. We recommend the default sizing for most production environments.

If needed, you can override the default sizing by manually setting the JVM heap size.

JVM heap dump path setting

edit

By default, Elasticsearch configures the JVM to dump the heap on out of memory exceptions to the default data directory. On RPM and Debian packages, the data directory is /var/lib/elasticsearch. On Linux and MacOS and Windows distributions, the data directory is located under the root of the Elasticsearch installation.

If this path is not suitable for receiving heap dumps, modify the -XX:HeapDumpPath=... entry in jvm.options:

If you specify a directory, the JVM will generate a filename for the heap dump based on the PID of the running instance.
If you specify a fixed filename instead of a directory, the file must not exist when the JVM needs to perform a heap dump on an out of memory exception. Otherwise, the heap dump will fail.

GC logging settings

edit

By default, Elasticsearch enables garbage collection (GC) logs. These are configured in jvm.options and output to the same default location as the Elasticsearch logs. The default configuration rotates the logs every 64 MB and can consume up to 2 GB of disk space.

You can reconfigure JVM logging using the command line options described in JEP 158: Unified JVM Logging. Unless you change the default jvm.options file directly, the Elasticsearch default configuration is applied in addition to your own settings. To disable the default configuration, first disable logging by supplying the -Xlog:disable option, then supply your own command line options. This disables all JVM logging, so be sure to review the available options and enable everything that you require.

To see further options not contained in the original JEP, see Enable Logging with the JVM Unified Logging Framework.

Examples

edit

Change the default GC log output location to /opt/my-app/gc.log by creating $ES_HOME/config/jvm.options.d/gc.options with some sample options:

# Turn off all previous logging configuratons
-Xlog:disable

# Default settings from JEP 158, but with `utctime` instead of `uptime` to match the next line
-Xlog:all=warning:stderr:utctime,level,tags

# Enable GC logging to a custom location with a variety of options
-Xlog:gc*,gc+age=trace,safepoint:file=/opt/my-app/gc.log:utctime,level,pid,tags:filecount=32,filesize=64m

Configure an Elasticsearch Docker container to send GC debug logs to standard error (stderr). This lets the container orchestrator handle the output. If using the ES_JAVA_OPTS environment variable, specify:

MY_OPTS="-Xlog:disable -Xlog:all=warning:stderr:utctime,level,tags -Xlog:gc=debug:stderr:utctime"
docker run -e ES_JAVA_OPTS="$MY_OPTS" # etc

Temporary directory settings

edit

By default, Elasticsearch uses a private temporary directory that the startup script creates immediately below the system temporary directory.

On some Linux distributions, a system utility will clean files and directories from /tmp if they have not been recently accessed. This behavior can lead to the private temporary directory being removed while Elasticsearch is running if features that require the temporary directory are not used for a long time. Removing the private temporary directory causes problems if a feature that requires this directory is subsequently used.

If you install Elasticsearch using the .deb or .rpm packages and run it under systemd, the private temporary directory that Elasticsearch uses is excluded from periodic cleanup.

If you intend to run the .tar.gz distribution on Linux or MacOS for an extended period, consider creating a dedicated temporary directory for Elasticsearch that is not under a path that will have old files and directories cleaned from it. This directory should have permissions set so that only the user that Elasticsearch runs as can access it. Then, set the $ES_TMPDIR environment variable to point to this directory before starting Elasticsearch.

JVM fatal error log setting

edit

By default, Elasticsearch configures the JVM to write fatal error logs to the default logging directory. On RPM and Debian packages, this directory is /var/log/elasticsearch. On Linux and MacOS and Windows distributions, the logs directory is located under the root of the Elasticsearch installation.

These are logs produced by the JVM when it encounters a fatal error, such as a segmentation fault. If this path is not suitable for receiving logs, modify the -XX:ErrorFile=... entry in jvm.options.

Cluster backups

edit

In a disaster, snapshots can prevent permanent data loss. Snapshot lifecycle management is the easiest way to take regular backups of your cluster. For more information, see Create a snapshot.

Taking a snapshot is the only reliable and supported way to back up a cluster. You cannot back up an Elasticsearch cluster by making copies of the data directories of its nodes. There are no supported methods to restore any data from a filesystem-level backup. If you try to restore a cluster from such a backup, it may fail with reports of corruption or missing files or other data inconsistencies, or it may appear to have succeeded having silently lost some of your data.

« Configuring Elasticsearch Secure settings »

On this page

Path settings
Multiple data paths
Migrate from multiple data paths
Cluster name setting
Node name setting
Network host setting
Discovery and cluster formation settings
discovery.seed_hosts
cluster.initial_master_nodes
Heap size settings
JVM heap dump path setting
GC logging settings
Examples
Temporary directory settings
JVM fatal error log setting
Cluster backups

Was this helpful?

Feedback

The Search AI Company

Generative AI

Search

Security

Observability

By solution

Industries

Important Elasticsearch configuration

Important Elasticsearch configuration

Path settings

Multiple data paths

Migrate from multiple data paths

Cluster name setting

Node name setting

Network host setting

Discovery and cluster formation settings

discovery.seed_hosts

cluster.initial_master_nodes

Heap size settings

JVM heap dump path setting

GC logging settings

Examples

Temporary directory settings

JVM fatal error log setting

Cluster backups

Follow us

About us

Join us

Partners

Trust & Security

Investor relations

Excellence Awards

`discovery.seed_hosts`

`cluster.initial_master_nodes`