WARNING: Version 1.7 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Configuration
Environment Variables
The startup scripts pass a built-in set of JAVA_OPTS to the JVM they start. The most important settings are -Xmx, which controls the maximum memory allowed for the process, and -Xms, which controls the minimum memory allocated to it (in general, the more memory allocated to the process, the better).
In most cases it is better to leave the default JAVA_OPTS as they are and use the ES_JAVA_OPTS environment variable to set or change JVM settings or arguments.
The ES_HEAP_SIZE environment variable sets the heap memory allocated to the Elasticsearch Java process. It assigns the same value to both the minimum and the maximum, though these can be set explicitly (not recommended) with ES_MIN_MEM (defaults to 256m) and ES_MAX_MEM (defaults to 1g).
It is recommended to set the min and max memory to the same value, and to enable mlockall.
System Configuration
File Descriptors
Make sure to increase the number of open file descriptors allowed on the machine (or for the user running Elasticsearch). Setting it to 32k or even 64k is recommended.
To test how many files the process can open, start it with -Des.max-open-files set to true. This prints the number of files the process can open on startup.
Alternatively, you can retrieve the max_file_descriptors for each node using the Nodes Info API:
curl localhost:9200/_nodes/process?pretty
Virtual memory
Elasticsearch uses a hybrid mmapfs / niofs directory by default to store its indices. The default operating system limits on mmap counts are likely to be too low, which may result in out of memory exceptions. On Linux, you can increase the limits by running the following command as root:
sysctl -w vm.max_map_count=262144
To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf.
If you installed Elasticsearch using a package (.deb, .rpm), this setting will be changed automatically. To verify, run sysctl vm.max_map_count.
Memory Settings
Most operating systems try to use as much memory as possible for file system caches and eagerly swap out unused application memory, possibly resulting in the Elasticsearch process being swapped. Swapping is very bad for performance and for node stability, so it should be avoided at all costs.
There are three options:
- Disable swap
  The simplest option is to completely disable swap. Usually Elasticsearch is the only service running on a box, and its memory usage is controlled by the ES_HEAP_SIZE environment variable. There should be no need to have swap enabled.
  On Linux systems, you can disable swap temporarily by running:
  sudo swapoff -a
  To disable it permanently, you will need to edit the /etc/fstab file and comment out any lines that contain the word swap.
  On Windows, the equivalent can be achieved by disabling the paging file entirely via System Properties → Advanced → Performance → Advanced → Virtual memory.
- Configure swappiness
  The second option is to ensure that the sysctl value vm.swappiness is set to 0. This reduces the kernel's tendency to swap and should not lead to swapping under normal circumstances, while still allowing the whole system to swap in emergency conditions.
  From kernel version 3.5-rc1 and above, a swappiness of 0 will cause the OOM killer to kill the process instead of allowing swapping. You will need to set swappiness to 1 to still allow swapping in emergencies.
- mlockall
  The third option is to use mlockall on Linux/Unix systems, or VirtualLock on Windows, to try to lock the process address space into RAM, preventing any Elasticsearch memory from being swapped out. This can be done by adding this line to the config/elasticsearch.yml file:
  bootstrap.mlockall: true
  After starting Elasticsearch, you can see whether this setting was applied successfully by checking the value of mlockall in the output from this request:
  curl http://localhost:9200/_nodes/process?pretty
  If you see that mlockall is false, then the mlockall request has failed. The most probable reason, on Linux/Unix systems, is that the user running Elasticsearch doesn't have permission to lock memory. This can be granted by running ulimit -l unlimited as root before starting Elasticsearch.
  Another possible reason why mlockall can fail is that the temporary directory (usually /tmp) is mounted with the noexec option. This can be solved by specifying a new temp directory, by starting Elasticsearch with:
  ./bin/elasticsearch -Djna.tmpdir=/path/to/new/dir
  mlockall might cause the JVM or shell session to exit if it tries to allocate more memory than is available!
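The Nodes Info request mentioned above reports both mlockall and max_file_descriptors, so a small script can flag nodes that need attention. The following is a minimal sketch rather than an official tool. It assumes the response nests those two fields under nodes, then a node ID, then process. The function name, the 32000 threshold (one reading of "32k"), and the sample response are hypothetical.

```python
def check_nodes(nodes_info, min_file_descriptors=32000):
    """Return one warning string per problem found in a parsed Nodes Info response."""
    warnings = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        name = node.get("name", node_id)
        process = node.get("process", {})
        # A missing or false mlockall value means memory was not locked.
        if not process.get("mlockall", False):
            warnings.append(f"{name}: mlockall is not enabled")
        fds = process.get("max_file_descriptors")
        if fds is not None and fds < min_file_descriptors:
            warnings.append(f"{name}: max_file_descriptors is {fds}")
    return warnings


# Hypothetical response, trimmed to the fields used above.
sample = {
    "nodes": {
        "node-a": {"name": "es-1", "process": {"mlockall": True, "max_file_descriptors": 65535}},
        "node-b": {"name": "es-2", "process": {"mlockall": False, "max_file_descriptors": 4096}},
    }
}

for warning in check_nodes(sample):
    print(warning)
```

In practice the dictionary would come from parsing the JSON returned by the curl request shown above, for example with json.loads.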
Elasticsearch Settings
Elasticsearch configuration files can be found under the ES_HOME/config folder. The folder comes with two files: elasticsearch.yml, for configuring the different Elasticsearch modules, and logging.yml, for configuring Elasticsearch logging.
The configuration format is YAML. Here is an example of changing the address all network based modules will use to bind and publish to:
network:
    host: 10.0.0.4
Paths
In production use, you will almost certainly want to change paths for data and log files:
path:
    logs: /var/log/elasticsearch
    data: /var/data/elasticsearch
Cluster name
Also, don't forget to give your production cluster a name, which is used to discover and auto-join other nodes:
cluster:
    name: <NAME OF YOUR CLUSTER>
Make sure that you don't reuse the same cluster name in different environments, otherwise you might end up with nodes joining the wrong cluster. For instance, you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters.
Node name
You may also want to change the default node name for each node to something like the display hostname. By default, Elasticsearch will randomly pick a Marvel character name from a list of around 3000 names when your node starts up.
node:
    name: <NAME OF YOUR NODE>
The hostname of the machine is provided in the environment variable HOSTNAME. If you run only a single Elasticsearch node for that cluster on your machine, you can set the node name to the hostname using the ${...} notation:
node:
    name: ${HOSTNAME}
Internally, all settings are collapsed into "namespaced" settings. For example, the above gets collapsed into node.name. This means that it is easy to support other configuration formats, for example JSON. If JSON is the preferred configuration format, simply rename the elasticsearch.yml file to elasticsearch.json and add:
Configuration styles
{
    "network" : {
        "host" : "10.0.0.4"
    }
}
It also means that it is easy to provide the settings externally, either using ES_JAVA_OPTS or as parameters to the elasticsearch command, for example:
$ elasticsearch -Des.network.host=10.0.0.4
Another option is to use the es.default. prefix instead of the es. prefix, which means the default setting will be used only if it is not explicitly set in the configuration file.
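The collapsing of nested settings into namespaced keys, and the way es.default. values yield to the configuration file, can be illustrated with a short sketch. This is not Elasticsearch's own code. The function names are hypothetical, and the final step, where es. properties override the configuration file, is an assumption that the text above does not state.

```python
def flatten(settings, prefix=""):
    """Collapse nested dicts into dotted keys: {"node": {"name": "x"}} -> {"node.name": "x"}."""
    flat = {}
    for key, value in settings.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    return flat


def effective_settings(file_settings, system_properties):
    """Merge es.default.* properties, the configuration file, and es.* properties."""
    defaults, overrides = {}, {}
    for key, value in system_properties.items():
        # Check the longer prefix first, since es.default. also starts with es.
        if key.startswith("es.default."):
            defaults[key[len("es.default."):]] = value
        elif key.startswith("es."):
            overrides[key[len("es."):]] = value
    merged = dict(defaults)                # used only when nothing else sets the key
    merged.update(flatten(file_settings))  # the configuration file wins over es.default.*
    merged.update(overrides)               # assumption: es.* wins over the file
    return merged


file_settings = {"network": {"host": "10.0.0.4"}}
properties = {
    "es.default.network.host": "127.0.0.1",
    "es.default.node.name": "fallback",
    "es.cluster.name": "logging-dev",
}
result = effective_settings(file_settings, properties)
```

Here network.host keeps the value from the file, node.name falls back to the es.default. value because the file does not set it, and cluster.name comes from the es. property.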
Another option is to use the ${...} notation within the configuration file, which will resolve to an environment setting, for example:
{
    "network" : {
        "host" : "${ES_NET_HOST}"
    }
}
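To make the ${...} substitution concrete, here is a minimal sketch of how such placeholders could be resolved against a mapping of environment values. It is an illustration only. The function name is hypothetical, and raising an error for an unknown name is an assumption, since the text does not say how Elasticsearch treats unresolved placeholders. Special values such as ${prompt.text} are not handled here.

```python
import re

# Matches ${NAME} and captures NAME.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_placeholders(value, env):
    """Replace every ${NAME} in value with env[NAME]."""
    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly instead of leaving the placeholder in place.
            raise KeyError(f"no value for placeholder ${{{name}}}")
        return env[name]
    return PLACEHOLDER.sub(replace, value)


host = resolve_placeholders("${ES_NET_HOST}", {"ES_NET_HOST": "10.0.0.4"})
```

Passing os.environ as the second argument would resolve placeholders against the real process environment.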
Additionally, for settings that you do not wish to store in the configuration file, you can use the value ${prompt.text} or ${prompt.secret} and start Elasticsearch in the foreground. ${prompt.secret} has echoing disabled so that the value entered will not be shown in your terminal; ${prompt.text} will allow you to see the value as you type it in. For example:
node:
    name: ${prompt.text}
On execution of the elasticsearch command, you will be prompted to enter the actual value like so:
Enter value for [node.name]:
Elasticsearch will not start if ${prompt.text} or ${prompt.secret} is used in the settings and the process is run as a service or in the background.
The location of the configuration file can be set externally using a system property:
$ elasticsearch -Des.config=/path/to/config/file
Index Settings
Indices created within the cluster can provide their own settings. For example, the following creates an index with memory based storage instead of the default file system based one (the format can be either YAML or JSON):
$ curl -XPUT http://localhost:9200/kimchy/ -d \
'
index :
    store:
        type: memory
'
Index level settings can be set on the node level as well. For example, the following can be set within the elasticsearch.yml file:
index :
    store:
        type: memory
This means that every index created on a node started with this configuration will store the index in memory unless the index explicitly sets a different store type. In other words, any index level setting overrides what is set in the node configuration. Of course, the above can also be set as a "collapsed" setting, for example:
$ elasticsearch -Des.index.store.type=memory
All of the index level configuration can be found within each index module.
Logging
Elasticsearch uses an internal logging abstraction and comes, out of the box, with log4j. It tries to simplify log4j configuration by using YAML to configure it, and the logging configuration file is config/logging.yml. The JSON and properties formats are also supported. Multiple configuration files can be loaded, in which case they will be merged, as long as they start with the logging. prefix and end with one of the supported suffixes (.yml, .yaml, .json or .properties).
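The file name rule above (a logging. prefix plus one of the supported suffixes) can be expressed as a small predicate. This is a sketch of the rule as stated here, not Elasticsearch's actual loader, and the function name is hypothetical.

```python
SUPPORTED_SUFFIXES = (".yml", ".yaml", ".json", ".properties")


def is_logging_config(filename):
    """Return True if filename starts with "logging." and ends with a supported suffix."""
    return filename.startswith("logging.") and filename.endswith(SUPPORTED_SUFFIXES)


candidates = ["logging.yml", "logging.custom.json", "elasticsearch.yml", "logging.txt"]
selected = [name for name in candidates if is_logging_config(name)]
```

With these hypothetical candidates, only logging.yml and logging.custom.json are selected.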
The logger section contains the Java packages and their corresponding log levels; the org.elasticsearch prefix may be omitted. The appender section contains the destinations for the logs. Extensive information on how to customize logging, and on all the supported appenders, can be found in the log4j documentation.
Additional appenders and other logging classes provided by log4j-extras are also available out of the box.