Scripting

edit

The scripting module allows to use scripts in order to evaluate custom expressions. For example, scripts can be used to return "script fields" as part of a search request, or can be used to evaluate a custom score for a query and so on.

The scripting module uses by default groovy (previously mvel in 1.3.x and earlier) as the scripting language with some extensions. Groovy is used since it is extremely fast and very simple to use.

Groovy dynamic scripting off by default from v1.4.3

Groovy dynamic scripting is off by default, preventing dynamic Groovy scripts from being accepted as part of a request or retrieved from the special .scripts index. You will still be able to use Groovy scripts stored in files in the config/scripts/ directory on every node.

To convert an inline script to a file, take this simple script as an example:

GET /_search
{
  "script_fields": {
    "my_field": {
      "script": {
        "inline": "1 + my_var",
        "params": {
          "my_var": 2
        }
      }
    }
  }
}

Save the contents of the inline field as a file called config/scripts/my_script.groovy on every data node in the cluster:

1 + my_var

Now you can access the script by file name (without the extension):

GET /_search
{
  "script_fields": {
    "my_field": {
      "script": {
        "file": "my_script",
        "params": {
          "my_var": 2
        }
      }
    }
  }
}

Additional lang plugins are provided to allow to execute scripts in different languages. All places where a script can be used, a lang parameter can be provided to define the language of the script. The following are the supported scripting languages:

Language Sandboxed Required plugin

groovy

no

built-in

expression

yes

built-in

mustache

yes

built-in

javascript

no

elasticsearch-lang-javascript

python

no

elasticsearch-lang-python

To increase security, Elasticsearch does not allow you to specify scripts for non-sandboxed languages with a request. Instead, scripts must be placed in the scripts directory inside the configuration directory (the directory where elasticsearch.yml is). The default location of this scripts directory can be changed by setting path.scripts in elasticsearch.yml. Scripts placed into this directory will automatically be picked up and be available to be used. Once a script has been placed in this directory, it can be referenced by name. For example, a script called calculate-score.groovy can be referenced in a request like this:

$ tree config
config
├── elasticsearch.yml
├── logging.yml
└── scripts
    └── calculate-score.groovy
$ cat config/scripts/calculate-score.groovy
log(_score * 2) + my_modifier
curl -XPOST localhost:9200/_search -d '{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "groovy",
              "file": "calculate-score",
              "params": {
                "my_modifier": 8
              }
            }
          }
        }
      ]
    }
  }
}'

The name of the script is derived from the hierarchy of directories it exists under, and the file name without the lang extension. For example, a script placed under config/scripts/group1/group2/test.py will be named group1_group2_test.

Indexed Scripts

edit

Elasticsearch allows you to store scripts in an internal index known as .scripts and reference them by id. There are REST endpoints to manage indexed scripts as follows:

Requests to the scripts endpoint look like :

/_scripts/{lang}/{id}

Where the lang part is the language the script is in and the id part is the id of the script. In the .scripts index the type of the document will be set to the lang.

curl -XPOST localhost:9200/_scripts/groovy/indexedCalculateScore -d '{
     "script": "log(_score * 2) + my_modifier"
}'

This will create a document with id: indexedCalculateScore and type: groovy in the .scripts index. The type of the document is the language used by the script.

This script can be accessed at query time by using the id script parameter and passing the script id:

curl -XPOST localhost:9200/_search -d '{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "id": "indexedCalculateScore",
              "lang" : "groovy",
              "params": {
                "my_modifier": 8
              }
            }
          }
        }
      ]
    }
  }
}'

The script can be viewed by:

curl -XGET localhost:9200/_scripts/groovy/indexedCalculateScore

This is rendered as:

'{
     "script": "log(_score * 2) + my_modifier"
}'

Indexed scripts can be deleted by:

curl -XDELETE localhost:9200/_scripts/groovy/indexedCalculateScore

Enabling dynamic scripting

edit

We recommend running Elasticsearch behind an application or proxy, which protects Elasticsearch from the outside world. If users are allowed to run inline scripts (even in a search request) or indexed scripts, then they have the same access to your box as the user that Elasticsearch is running as. For this reason dynamic scripting is allowed only for sandboxed languages by default.

First, you should not run Elasticsearch as the root user, as this would allow a script to access or do anything on your server, without limitations. Second, you should not expose Elasticsearch directly to users, but instead have a proxy application inbetween. If you do intend to expose Elasticsearch directly to your users, then you have to decide whether you trust them enough to run scripts on your box or not.

It is possible to enable scripts based on their source, for every script engine, through the following settings that need to be added to the config/elasticsearch.yml file on every node.

script.inline: true
script.indexed: true

While this still allows execution of named scripts provided in the config, or native Java scripts registered through plugins, it also allows users to run arbitrary scripts via the API. Instead of sending the name of the file as the script, the body of the script can be sent instead or retrieved from the .scripts indexed if previously stored.

There are three possible configuration values for any of the fine-grained script settings:

Value Description

false

scripting is turned off completely, in the context of the setting being set.

true

scripting is turned on, in the context of the setting being set.

sandbox

scripts may be executed only for languages that are sandboxed

The default values are the following:

script.inline: sandbox
script.indexed: sandbox
script.file: true

Global scripting settings affect the mustache scripting language. Search templates internally use the mustache language, and will still be enabled by default as the mustache engine is sandboxed, but they will be enabled/disabled according to fine-grained settings specified in elasticsearch.yml.

It is also possible to control which operations can execute scripts. The supported operations are:

Value Description

aggs

Aggregations (wherever they may be used)

search

Search api, Percolator api and Suggester api (e.g filters, script_fields)

update

Update api

plugin

Any plugin that makes use of scripts under the generic plugin category

Plugins can also define custom operations that they use scripts for instead of using the generic plugin category. Those operations can be referred to in the following form: ${pluginName}_${operation}.

The following example disables scripting for update and mapping operations, regardless of the script source, for any engine. Scripts can still be executed from sandboxed languages as part of aggregations, search and plugins execution though, as the above defaults still get applied.

script.update: false
script.mapping: false

Generic settings get applied in order, operation based ones have precedence over source based ones. Language specific settings are supported too. They need to be prefixed with the script.engine.<engine> prefix and have precedence over any other generic settings.

script.engine.groovy.file.aggs: true
script.engine.groovy.file.mapping: true
script.engine.groovy.file.search: true
script.engine.groovy.file.update: true
script.engine.groovy.file.plugin: true
script.engine.groovy.indexed.aggs: true
script.engine.groovy.indexed.mapping: false
script.engine.groovy.indexed.search: true
script.engine.groovy.indexed.update: false
script.engine.groovy.indexed.plugin: false
script.engine.groovy.inline.aggs: true
script.engine.groovy.inline.mapping: false
script.engine.groovy.inline.search: false
script.engine.groovy.inline.update: false
script.engine.groovy.inline.plugin: false

Default Scripting Language

edit

The default scripting language (assuming no lang parameter is provided) is groovy. In order to change it, set the script.default_lang to the appropriate language.

Automatic Script Reloading

edit

The config/scripts directory is scanned periodically for changes. New and changed scripts are reloaded and deleted script are removed from preloaded scripts cache. The reload frequency can be specified using resource.reload.interval setting, which defaults to 60s. To disable script reloading completely set script.auto_reload_enabled to false.

Native (Java) Scripts

edit

Sometimes groovy and expressions aren’t enough. For those times you can implement a native script.

The best way to implement a native script is to write a plugin and install it. The plugin documentation has more information on how to write a plugin so that Elasticsearch will properly load it.

To register the actual script you’ll need to implement NativeScriptFactory to construct the script. The actual script will extend either AbstractExecutableScript or AbstractSearchScript. The second one is likely the most useful and has several helpful subclasses you can extend like AbstractLongSearchScript, AbstractDoubleSearchScript, and AbstractFloatSearchScript. Finally, your plugin should register the native script by declaring the onModule(ScriptModule) method.

If you squashed the whole thing into one class it’d look like:

public class MyNativeScriptPlugin extends Plugin {
    @Override
    public String name() {
        return "my-native-script";
    }
    @Override
    public String description() {
        return "my native script that does something great";
    }
    public void onModule(ScriptModule scriptModule) {
        scriptModule.registerScript("my_script", MyNativeScriptFactory.class);
    }

    public static class MyNativeScriptFactory implements NativeScriptFactory {
        @Override
        public ExecutableScript newScript(@Nullable Map<String, Object> params) {
            return new MyNativeScript();
        }
        @Override
        public boolean needsScores() {
            return false;
        }
    }

    public static class MyNativeScript extends AbstractFloatSearchScript {
        @Override
        public float runAsFloat() {
            float a = (float) source().get("a");
            float b = (float) source().get("b");
            return a * b;
        }
    }
}

You can execute the script by specifying its lang as native, and the name of the script as the inline:

curl -XPOST localhost:9200/_search -d '{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
                "inline": "my_script",
                "lang" : "native"
            }
          }
        }
      ]
    }
  }
}'

Lucene Expressions Scripts

edit

The Lucene expressions module is undergoing significant development and the exposed functionality is likely to change in the future

Lucene’s expressions module provides a mechanism to compile a javascript expression to bytecode. This allows very fast execution, as if you had written a native script. Expression scripts can be used in script_score, script_fields, sort scripts and numeric aggregation scripts.

See the expressions module documentation for details on what operators and functions are available.

Variables in expression scripts are available to access:

  • Single valued document fields, e.g. doc['myfield'].value
  • Single valued document fields can also be accessed without .value e.g. doc['myfield']
  • Parameters passed into the script, e.g. mymodifier
  • The current document’s score, _score (only available when used in a script_score)

Variables in expression scripts that are of type date may use the following member methods:

  • getYear()
  • getMonth()
  • getDayOfMonth()
  • getHourOfDay()
  • getMinutes()
  • getSeconds()

The following example shows the difference in years between the date fields date0 and date1:

doc['date1'].getYear() - doc['date0'].getYear()

There are a few limitations relative to other script languages:

  • Only numeric fields may be accessed
  • Stored fields are not available
  • If a field is sparse (only some documents contain a value), documents missing the field will have a value of 0

Score

edit

In all scripts that can be used in aggregations, the current document’s score is accessible in _score.

Computing scores based on terms in scripts

edit

see advanced scripting documentation

Document Fields

edit

Most scripting revolve around the use of specific document fields data. The doc['field_name'] can be used to access specific field data within a document (the document in question is usually derived by the context the script is used). Document fields are very fast to access since they end up being loaded into memory (all the relevant field values/tokens are loaded to memory). Note, however, that the doc[...] notation only allows for simple valued fields (can’t return a json object from it) and makes sense only on non-analyzed or single term based fields.

The following data can be extracted from a field:

Expression Description

doc['field_name'].value

The native value of the field. For example, if its a short type, it will be short.

doc['field_name'].values

The native array values of the field. For example, if its a short type, it will be short[]. Remember, a field can have several values within a single doc. Returns an empty array if the field has no values.

doc['field_name'].empty

A boolean indicating if the field has no values within the doc.

doc['field_name'].multiValued

A boolean indicating that the field has several values within the corpus.

doc['field_name'].lat

The latitude of a geo point type.

doc['field_name'].lon

The longitude of a geo point type.

doc['field_name'].lats

The latitudes of a geo point type.

doc['field_name'].lons

The longitudes of a geo point type.

doc['field_name'].distance(lat, lon)

The plane distance (in meters) of this geo point field from the provided lat/lon.

doc['field_name'].distanceWithDefault(lat, lon, default)

The plane distance (in meters) of this geo point field from the provided lat/lon with a default value.

doc['field_name'].distanceInMiles(lat, lon)

The plane distance (in miles) of this geo point field from the provided lat/lon.

doc['field_name'].distanceInMilesWithDefault(lat, lon, default)

The plane distance (in miles) of this geo point field from the provided lat/lon with a default value.

doc['field_name'].distanceInKm(lat, lon)

The plane distance (in km) of this geo point field from the provided lat/lon.

doc['field_name'].distanceInKmWithDefault(lat, lon, default)

The plane distance (in km) of this geo point field from the provided lat/lon with a default value.

doc['field_name'].arcDistance(lat, lon)

The arc distance (in meters) of this geo point field from the provided lat/lon.

doc['field_name'].arcDistanceWithDefault(lat, lon, default)

The arc distance (in meters) of this geo point field from the provided lat/lon with a default value.

doc['field_name'].arcDistanceInMiles(lat, lon)

The arc distance (in miles) of this geo point field from the provided lat/lon.

doc['field_name'].arcDistanceInMilesWithDefault(lat, lon, default)

The arc distance (in miles) of this geo point field from the provided lat/lon with a default value.

doc['field_name'].arcDistanceInKm(lat, lon)

The arc distance (in km) of this geo point field from the provided lat/lon.

doc['field_name'].arcDistanceInKmWithDefault(lat, lon, default)

The arc distance (in km) of this geo point field from the provided lat/lon with a default value.

doc['field_name'].factorDistance(lat, lon)

The distance factor of this geo point field from the provided lat/lon.

doc['field_name'].factorDistance(lat, lon, default)

The distance factor of this geo point field from the provided lat/lon with a default value.

doc['field_name'].geohashDistance(geohash)

The arc distance (in meters) of this geo point field from the provided geohash.

doc['field_name'].geohashDistanceInKm(geohash)

The arc distance (in km) of this geo point field from the provided geohash.

doc['field_name'].geohashDistanceInMiles(geohash)

The arc distance (in miles) of this geo point field from the provided geohash.

Stored Fields

edit

Stored fields can also be accessed when executing a script. Note, they are much slower to access compared with document fields, as they are not loaded into memory. They can be simply accessed using _fields['my_field_name'].value or _fields['my_field_name'].values.

Accessing the score of a document within a script

edit

When using scripting for calculating the score of a document (for instance, with the function_score query), you can access the score using the _score variable inside of a Groovy script.

Source Field

edit

The source field can also be accessed when executing a script. The source field is loaded per doc, parsed, and then provided to the script for evaluation. The _source forms the context under which the source field can be accessed, for example _source.obj2.obj1.field3.

Accessing _source is much slower compared to using doc but the data is not loaded into memory. For a single field access _fields may be faster than using _source due to the extra overhead of potentially parsing large documents. However, _source may be faster if you access multiple fields or if the source has already been loaded for other purposes.

Groovy Built In Functions

edit

There are several built in functions that can be used within scripts. They include:

Function Description

sin(a)

Returns the trigonometric sine of an angle.

cos(a)

Returns the trigonometric cosine of an angle.

tan(a)

Returns the trigonometric tangent of an angle.

asin(a)

Returns the arc sine of a value.

acos(a)

Returns the arc cosine of a value.

atan(a)

Returns the arc tangent of a value.

toRadians(angdeg)

Converts an angle measured in degrees to an approximately equivalent angle measured in radians

toDegrees(angrad)

Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

exp(a)

Returns Euler’s number e raised to the power of value.

log(a)

Returns the natural logarithm (base e) of a value.

log10(a)

Returns the base 10 logarithm of a value.

sqrt(a)

Returns the correctly rounded positive square root of a value.

cbrt(a)

Returns the cube root of a double value.

IEEEremainder(f1, f2)

Computes the remainder operation on two arguments as prescribed by the IEEE 754 standard.

ceil(a)

Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer.

floor(a)

Returns the largest (closest to positive infinity) value that is less than or equal to the argument and is equal to a mathematical integer.

rint(a)

Returns the value that is closest in value to the argument and is equal to a mathematical integer.

atan2(y, x)

Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r,theta).

pow(a, b)

Returns the value of the first argument raised to the power of the second argument.

round(a)

Returns the closest int to the argument.

random()

Returns a random double value.

abs(a)

Returns the absolute value of a value.

max(a, b)

Returns the greater of two values.

min(a, b)

Returns the smaller of two values.

ulp(d)

Returns the size of an ulp of the argument.

signum(d)

Returns the signum function of the argument.

sinh(x)

Returns the hyperbolic sine of a value.

cosh(x)

Returns the hyperbolic cosine of a value.

tanh(x)

Returns the hyperbolic tangent of a value.

hypot(x, y)

Returns sqrt(x2 + y2) without intermediate overflow or underflow.