WARNING: Version 1.6 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Scripting
The scripting module lets you use scripts to evaluate custom expressions. For example, a script can return "script fields" as part of a search request, or compute a custom score for a query.
The scripting module uses Groovy (with some extensions) as its default scripting language, because it is fast and simple to use.
Change in Groovy casting behaviour
The Groovy upgrade in Elasticsearch 1.6.1 fixed a casting bug, which may change the behaviour of existing scripts that relied on this bug. For instance, before the upgrade, the following script would have returned a long value:
(long) _value / 1000 + 1
After the upgrade, the above script returns a float value. To return a long value, rewrite the script as follows:
(long) (_value / 1000 + 1)
See GROOVY-4421 and GROOVY-5185 for more information.
Groovy dynamic scripting disabled by default from v1.4.3
Elasticsearch versions 1.3.0-1.3.7 and 1.4.0-1.4.2 have a vulnerability in the Groovy scripting engine. The vulnerability allows an attacker to construct Groovy scripts that escape the sandbox and execute shell commands as the user running the Elasticsearch Java VM.
If you are running a vulnerable version of Elasticsearch, you should either upgrade to at least v1.3.8 or v1.4.3, or disable dynamic Groovy scripts by adding this setting to the config/elasticsearch.yml file on all nodes in the cluster:
script.groovy.sandbox.enabled: false
This will turn off the Groovy sandbox, thus preventing dynamic Groovy scripts from being accepted as part of a request or retrieved from the special .scripts index. You will still be able to use Groovy scripts stored in files in the config/scripts/ directory on every node.
To convert an inline script to a file, take this simple script as an example:
GET /_search
{
  "script_fields": {
    "my_field": {
      "script": "1 + my_var",
      "params": {
        "my_var": 2
      }
    }
  }
}
Save the contents of the script as a file called config/scripts/my_script.groovy on every data node in the cluster:
1 + my_var
Now you can access the script by file name (without the extension):
GET /_search
{
  "script_fields": {
    "my_field": {
      "script_file": "my_script",
      "params": {
        "my_var": 2
      }
    }
  }
}
Additional lang plugins allow scripts to be executed in other languages. Wherever a script parameter can be used, a lang parameter (on the same level) can be provided to define the language of the script. The following scripting languages are supported:
Language | Sandboxed | Required plugin |
---|---|---|
groovy | no | built-in |
expression | yes | built-in |
mustache | yes | built-in |
mvel | no | |
javascript | no | |
python | no | |
To increase security, Elasticsearch does not allow you to specify scripts for non-sandboxed languages with a request. Instead, scripts must be placed in the scripts directory inside the configuration directory (the directory where elasticsearch.yml is). Scripts placed in this directory are picked up automatically and made available for use. Once a script has been placed in this directory, it can be referenced by name. For example, a script called calculate-score.groovy can be referenced in a request like this:
$ tree config
config
├── elasticsearch.yml
├── logging.yml
└── scripts
    └── calculate-score.groovy
$ cat config/scripts/calculate-score.groovy
log(_score * 2) + my_modifier
curl -XPOST localhost:9200/_search -d '{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "lang": "groovy",
            "script_file": "calculate-score",
            "params": {
              "my_modifier": 8
            }
          }
        }
      ]
    }
  }
}'
The name of the script is derived from the hierarchy of directories it exists under, and the file name without the lang extension. For example, a script placed under config/scripts/group1/group2/test.py will be named group1_group2_test.
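The naming rule can be illustrated with a few lines of Java. This is only a sketch of the rule as stated above, not Elasticsearch's own code: the class name ScriptNameSketch and the method scriptName are hypothetical, and the path is assumed to be given relative to config/scripts with forward slashes.

```java
// Hypothetical helper illustrating the documented naming rule; not part of Elasticsearch.
final class ScriptNameSketch {

    // Assumes relativePath is relative to config/scripts and uses '/' as separator.
    static String scriptName(String relativePath) {
        String[] parts = relativePath.split("/");
        String fileName = parts[parts.length - 1];
        // Drop the lang extension (everything from the last dot) from the file name.
        int dot = fileName.lastIndexOf('.');
        if (dot > 0) {
            parts[parts.length - 1] = fileName.substring(0, dot);
        }
        // Join the directory hierarchy and the bare file name with underscores.
        return String.join("_", parts);
    }

    public static void main(String[] args) {
        System.out.println(scriptName("group1/group2/test.py"));   // group1_group2_test
        System.out.println(scriptName("calculate-score.groovy"));  // calculate-score
    }
}
```

The resulting name is the value you would pass as script_file in a request.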
Indexed Scripts
Elasticsearch allows you to store scripts in an internal index known as .scripts and reference them by id. Indexed scripts are managed through REST endpoints.
Requests to the scripts endpoint look like:
/_scripts/{lang}/{id}
Where the lang part is the language the script is in and the id part is the id of the script. In the .scripts index the type of the document will be set to the lang.
curl -XPOST localhost:9200/_scripts/groovy/indexedCalculateScore -d '{
  "script": "log(_score * 2) + my_modifier"
}'
This will create a document with id indexedCalculateScore and type groovy in the .scripts index. The type of the document is the language used by the script.
This script can be accessed at query time by appending _id to the script parameter and passing the script id, so script becomes script_id:
curl -XPOST localhost:9200/_search -d '{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "script_id": "indexedCalculateScore",
            "lang": "groovy",
            "params": {
              "my_modifier": 8
            }
          }
        }
      ]
    }
  }
}'
The script can be viewed by:
curl -XGET localhost:9200/_scripts/groovy/indexedCalculateScore
This is rendered as:
'{ "script": "log(_score * 2) + my_modifier" }'
Indexed scripts can be deleted by:
curl -XDELETE localhost:9200/_scripts/groovy/indexedCalculateScore
Enabling dynamic scripting
We recommend running Elasticsearch behind an application or proxy, which protects Elasticsearch from the outside world. If users are allowed to run inline scripts (even in a search request) or indexed scripts, then they have the same access to your box as the user that Elasticsearch is running as. For this reason dynamic scripting is allowed only for sandboxed languages by default.
First, you should not run Elasticsearch as the root user, as this would allow a script to access or do anything on your server, without limitations. Second, you should not expose Elasticsearch directly to users, but instead have a proxy application in between. If you do intend to expose Elasticsearch directly to your users, then you have to decide whether you trust them enough to run scripts on your box or not.
Deprecated in 1.6.0. The script.disable_dynamic setting is deprecated in favour of the fine-grained settings described below.
Added in 1.6.0. Fine-grained script settings replace the script.disable_dynamic setting.
Scripts can be enabled based on their source, for every script engine, through the following settings, which need to be added to the config/elasticsearch.yml file on every node.
script.inline: on
script.indexed: on
While this still allows execution of named scripts provided in the config, or native Java scripts registered through plugins, it also allows users to run arbitrary scripts via the API. Instead of sending the name of the file as the script, the body of the script can be sent in the request or retrieved from the .scripts index if previously stored.
There are three possible configuration values for any of the fine-grained script settings:
Value | Description |
---|---|
off | scripting is turned off completely, in the context of the setting being set. |
on | scripting is turned on, in the context of the setting being set. |
sandbox | scripts may be executed only for languages that are sandboxed |
The default values are the following:
script.inline: sandbox
script.indexed: sandbox
script.file: on
Global scripting settings affect the mustache scripting language. Search templates internally use the mustache language, and will still be enabled by default as the mustache engine is sandboxed, but they will be enabled/disabled according to fine-grained settings specified in elasticsearch.yml.
Added in 1.6.0. mustache scripts were previously always on, regardless of whether dynamic scripts were enabled or not.
It is also possible to control which operations can execute scripts. The supported operations are:
Value | Description |
---|---|
aggs | Aggregations (wherever they may be used) |
mapping | Mappings (script transform feature) |
search | Search api, Percolator api and Suggester api (e.g. filters, script_fields) |
update | Update api |
plugin | Any plugin that makes use of scripts under the generic plugin category |
Plugins can also define custom operations that they use scripts for, instead of using the generic plugin category. Those operations can be referred to in the following form: ${pluginName}_${operation}.
The following example disables scripting for update and mapping operations, regardless of the script source, for any engine. Scripts can still be executed from sandboxed languages as part of aggregations, search and plugins execution though, as the above defaults still get applied.
script.update: off
script.mapping: off
Generic settings are applied in order: operation-based settings take precedence over source-based ones. Language-specific settings are supported too. They need to be prefixed with script.engine.<engine> and take precedence over any other generic settings.
script.engine.groovy.file.aggs: on
script.engine.groovy.file.mapping: on
script.engine.groovy.file.search: on
script.engine.groovy.file.update: on
script.engine.groovy.file.plugin: on
script.engine.groovy.indexed.aggs: on
script.engine.groovy.indexed.mapping: off
script.engine.groovy.indexed.search: on
script.engine.groovy.indexed.update: off
script.engine.groovy.indexed.plugin: off
script.engine.groovy.inline.aggs: on
script.engine.groovy.inline.mapping: off
script.engine.groovy.inline.search: off
script.engine.groovy.inline.update: off
script.engine.groovy.inline.plugin: off
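The precedence rules above can be pictured as a lookup that tries the most specific setting first. The following Java sketch only models the rules as described in this section. It is not how Elasticsearch implements them: the class ScriptSettingsSketch and the methods resolveMode and isAllowed are hypothetical names, and the fallback values are the documented defaults (file: on, inline and indexed: sandbox).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the documented precedence; not Elasticsearch's implementation.
final class ScriptSettingsSketch {

    // Returns "on", "off" or "sandbox" for one engine/source/operation combination.
    static String resolveMode(Map<String, String> settings,
                              String engine, String source, String operation) {
        // 1. Language-specific settings take precedence over any generic setting.
        String engineKey = "script.engine." + engine + "." + source + "." + operation;
        if (settings.containsKey(engineKey)) {
            return settings.get(engineKey);
        }
        // 2. Operation-based generic settings take precedence over source-based ones.
        String operationKey = "script." + operation;
        if (settings.containsKey(operationKey)) {
            return settings.get(operationKey);
        }
        // 3. Source-based generic settings.
        String sourceKey = "script." + source;
        if (settings.containsKey(sourceKey)) {
            return settings.get(sourceKey);
        }
        // 4. Documented defaults: file scripts are on, inline and indexed are sandbox.
        return "file".equals(source) ? "on" : "sandbox";
    }

    // Decides whether a script may run under the given mode.
    static boolean isAllowed(String mode, boolean sandboxedLanguage) {
        switch (mode) {
            case "on":
                return true;
            case "sandbox":
                return sandboxedLanguage;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("script.update", "off");
        settings.put("script.engine.groovy.inline.aggs", "on");

        System.out.println(resolveMode(settings, "groovy", "file", "update"));       // off
        System.out.println(resolveMode(settings, "groovy", "inline", "aggs"));       // on
        System.out.println(resolveMode(settings, "expression", "inline", "search")); // sandbox
    }
}
```

In this model, script.update: off blocks even file-based Groovy scripts for the update operation, while the engine-specific inline.aggs setting re-enables one narrow combination.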
Default Scripting Language
The default scripting language (assuming no lang parameter is provided) is groovy. To change it, set script.default_lang to the appropriate language.
Groovy Sandboxing
Deprecated in 1.6.0. Groovy sandboxing has been disabled by default since 1.4.3 because it has proved to be ineffective. It will be removed completely in 2.0.0.
Elasticsearch sandboxes Groovy scripts that are compiled and executed in order to ensure they don’t perform unwanted actions. There are a number of options that can be used for configuring this sandbox:
- script.groovy.sandbox.receiver_whitelist: Comma-separated list of string classes for objects that may have methods invoked.
- script.groovy.sandbox.package_whitelist: Comma-separated list of packages under which new objects may be constructed.
- script.groovy.sandbox.class_whitelist: Comma-separated list of classes that are allowed to be constructed.
- script.groovy.sandbox.method_blacklist: Comma-separated list of methods that are never allowed to be invoked, regardless of target object.
- script.groovy.sandbox.enabled: Flag to enable the sandbox (defaults to false, meaning the sandbox is disabled).
When specifying whitelist or blacklist settings for the Groovy sandbox, each setting replaces the existing list rather than adding to it.
Automatic Script Reloading
The config/scripts directory is scanned periodically for changes. New and changed scripts are reloaded, and deleted scripts are removed from the preloaded scripts cache. The reload frequency can be specified using the watcher.interval setting, which defaults to 60s.
To disable script reloading completely, set script.auto_reload_enabled to false.
Native (Java) Scripts
Even though groovy is pretty fast, Elasticsearch also allows native Java-based scripts to be registered for faster execution.
To provide a native script, implement a NativeScriptFactory that constructs the script to be executed. There are two main types: one that extends AbstractExecutableScript and one that extends AbstractSearchScript (probably the one most users will extend, with additional helper classes in AbstractLongSearchScript, AbstractDoubleSearchScript, and AbstractFloatSearchScript).
Scripts can be registered through settings; for example, script.native.my.type set to sample.MyNativeScriptFactory will register a script named my. Another option is to access ScriptModule in a plugin and call registerScript on it.
To execute the script, specify native as the lang and the name of the script as the script.
Note, the scripts need to be in the classpath of elasticsearch. One simple way to do it is to create a directory under plugins (choose a descriptive name), and place the jar / classes files there. They will be automatically loaded.
Lucene Expressions Scripts
This feature is experimental and subject to change in future versions.
Lucene’s expressions module provides a mechanism to compile a javascript expression to bytecode. This allows very fast execution, as if you had written a native script. Expression scripts can be used in script_score, script_fields, sort scripts and numeric aggregation scripts.
See the expressions module documentation for details on what operators and functions are available.
The following variables are available in expression scripts:
- Single valued document fields, e.g. doc['myfield'].value
- Parameters passed into the script, e.g. mymodifier
- The current document’s score, _score (only available when used in a script_score)
There are a few limitations relative to other script languages:
- Only numeric fields may be accessed
- Stored fields are not available
- If a field is sparse (only some documents contain a value), documents missing the field will have a value of 0
Score
In all scripts that can be used in aggregations, the current document’s score is accessible in _score.
Computing scores based on terms in scripts
See the advanced scripting documentation.
Document Fields
Most scripting revolves around the use of specific document field data. The doc['field_name'] notation can be used to access specific field data within a document (the document in question is usually determined by the context in which the script is used). Document fields are very fast to access since they end up being loaded into memory (all the relevant field values/tokens are loaded to memory). Note, however, that the doc[...] notation only allows for simple valued fields (it can’t return a JSON object) and makes sense only on non-analyzed or single term based fields.
The following data can be extracted from a field:
Expression | Description |
---|---|
| The native value of the field. For example, if it is a short type, it will be short. |
| The native array values of the field. For example, if it is a short type, it will be short[]. Remember, a field can have several values within a single doc. Returns an empty array if the field has no values. |
| A boolean indicating if the field has no values within the doc. |
| A boolean indicating that the field has several values within the corpus. |
| The latitude of a geo point type. |
| The longitude of a geo point type. |
| The latitudes of a geo point type. |
| The longitudes of a geo point type. |
| The |
| The |
| The |
| The |
| The |
| The |
| The |
| The |
| The |
| The |
| The |
| The |
| The distance factor of this geo point field from the provided lat/lon. |
| The distance factor of this geo point field from the provided lat/lon with a default value. |
| The |
| The |
| The |
Stored Fields
Stored fields can also be accessed when executing a script. Note, they are much slower to access compared with document fields, as they are not loaded into memory. They can be simply accessed using _fields['my_field_name'].value or _fields['my_field_name'].values.
Accessing the score of a document within a script
When using scripting for calculating the score of a document (for instance, with the function_score query), you can access the score using the _score variable inside of a Groovy script.
Source Field
The source field can also be accessed when executing a script. The source field is loaded per doc, parsed, and then provided to the script for evaluation. The _source forms the context under which the source field can be accessed, for example _source.obj2.obj1.field3.
Accessing _source is much slower than using doc, but the data is not loaded into memory. For a single field access, _fields may be faster than using _source due to the extra overhead of potentially parsing large documents. However, _source may be faster if you access multiple fields or if the source has already been loaded for other purposes.
Groovy Built In Functions
There are several built-in functions that can be used within scripts. They include:
Function | Description |
---|---|
| Returns the trigonometric sine of an angle. |
| Returns the trigonometric cosine of an angle. |
| Returns the trigonometric tangent of an angle. |
| Returns the arc sine of a value. |
| Returns the arc cosine of a value. |
| Returns the arc tangent of a value. |
| Converts an angle measured in degrees to an approximately equivalent angle measured in radians. |
| Converts an angle measured in radians to an approximately equivalent angle measured in degrees. |
| Returns Euler’s number e raised to the power of value. |
| Returns the natural logarithm (base e) of a value. |
| Returns the base 10 logarithm of a value. |
| Returns the correctly rounded positive square root of a value. |
| Returns the cube root of a double value. |
| Computes the remainder operation on two arguments as prescribed by the IEEE 754 standard. |
| Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer. |
| Returns the largest (closest to positive infinity) value that is less than or equal to the argument and is equal to a mathematical integer. |
| Returns the value that is closest in value to the argument and is equal to a mathematical integer. |
| Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta). |
| Returns the value of the first argument raised to the power of the second argument. |
| Returns the closest int to the argument. |
| Returns a random double value. |
| Returns the absolute value of a value. |
| Returns the greater of two values. |
| Returns the smaller of two values. |
| Returns the size of an ulp of the argument. |
| Returns the signum function of the argument. |
| Returns the hyperbolic sine of a value. |
| Returns the hyperbolic cosine of a value. |
| Returns the hyperbolic tangent of a value. |
| Returns sqrt(x^2 + y^2) without intermediate overflow or underflow. |
Arithmetic precision in MVEL
When dividing two numbers in MVEL-based scripts, the engine adheres to the default behaviour of Java. If you divide two integers (you might have configured the fields as integer in the mapping), the result will also be an integer. So if a calculation like 1/num happens in your script and num is an integer with the value 8, the result is 0 even though you were expecting 0.125. You may need to enforce precision by explicitly using a double, as in 1.0/num, to get the expected result.
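Since the text above says MVEL follows Java's default behaviour here, the effect can be reproduced in plain Java. This is a Java illustration rather than an MVEL script, and the class and method names (IntegerDivisionSketch, integerQuotient, doubleQuotient) are hypothetical.

```java
// Plain Java illustration of the integer division pitfall; not an MVEL script.
final class IntegerDivisionSketch {

    // Both operands are ints, so the fractional part is discarded.
    static int integerQuotient(int num) {
        return 1 / num;
    }

    // Making one operand a double forces floating-point division.
    static double doubleQuotient(int num) {
        return 1.0 / num;
    }

    public static void main(String[] args) {
        int num = 8;
        System.out.println(integerQuotient(num)); // 0
        System.out.println(doubleQuotient(num));  // 0.125
    }
}
```

Writing the literal as 1.0 is enough, because Java promotes the other operand to double before dividing.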