Create an Elasticsearch query rule

Required role

The Editor role or higher is required to create Elasticsearch query rules. To learn more, refer to Assign user roles and privileges.

The Elasticsearch query rule type runs a user-configured query, compares the number of matches to a configured threshold, and schedules actions to run when the threshold condition is met.

  1. To access this page, from your project go to Alerts.
  2. Click Manage Rules → Create rule.
  3. Under Select rule type, select Elasticsearch query.

An Elasticsearch query rule can be defined using Elasticsearch Query Domain Specific Language (DSL), Elasticsearch Query Language (ES|QL), Kibana Query Language (KQL), or Lucene.

Define the conditions

When you create an Elasticsearch query rule, your choice of query type affects the information you must provide. For example:

Define the condition to detect
  1. Define your query

    If you use query DSL, you must select an index and time field, then provide your query. Only the query, fields, _source, and runtime_mappings fields are used; other DSL fields are ignored. For example:

    {
      "query":{
        "match_all" : {}
      }
    }

    If you use KQL or Lucene, you must specify a data view, then define a text-based query. For example, http.request.referrer: "https://example.com".

    If you use ES|QL, you must provide a source command followed by an optional series of processing commands, separated by pipe characters (|). For example:

    FROM kibana_sample_data_logs
    | STATS total_bytes = SUM(bytes) BY host
    | WHERE total_bytes > 200000
    | SORT total_bytes DESC
    | LIMIT 10
  2. If you use query DSL, KQL, or Lucene, set the group and threshold.

    When
    Specify how to calculate the value that is compared to the threshold. The value is calculated by aggregating a numeric field within the time window. The aggregation options are: count, average, sum, min, and max. When you use count, the number of matching documents is the value and no aggregation field is needed.
    Over or Grouped Over
    Specify whether the aggregation is applied over all documents or split into groups using up to four grouping fields. If you use grouping, the rule runs a terms or multi terms aggregation and creates an alert for each unique set of values that meets the condition. To limit the number of alerts on high cardinality fields, you must specify the number of groups to check against the threshold. Only the top groups are checked.
    Threshold
    Defines a threshold value and a comparison operator (is above, is above or equals, is below, is below or equals, or is between). The value calculated by the aggregation is compared to this threshold.
  3. Set the time window, which defines how far back to search for documents.
  4. If you use query DSL, KQL, or Lucene, set the number of documents to send to the configured actions when the threshold condition is met.
  5. If you use query DSL, KQL, or Lucene, choose whether to avoid alert duplication by excluding matches from the previous run. This option is not available when you use a grouping field.
  6. Set the check interval, which defines how often to evaluate the rule conditions. To avoid gaps in detection, this interval should generally be shorter than the time window.
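
The When, Over or Grouped Over, and Threshold settings in step 2 can be pictured with a small Python sketch. It is an illustration only, not the rule's implementation. The function name, its parameters, and the sample documents are hypothetical. It also assumes that "top groups" means the groups with the most documents, and it leaves out the is between operator.

```python
import operator
from collections import defaultdict
from statistics import mean

# Aggregation options listed under "When".
AGGREGATIONS = {"count": len, "average": mean, "sum": sum, "min": min, "max": max}

# Comparison operators listed under "Threshold" ("is between" is omitted here).
COMPARATORS = {
    "is above": operator.gt,
    "is above or equals": operator.ge,
    "is below": operator.lt,
    "is below or equals": operator.le,
}


def groups_meeting_threshold(docs, agg, comparator, threshold,
                             agg_field=None, group_by=(), top_n=None):
    """Return {group key: aggregated value} for groups that meet the threshold.

    Hypothetical helper: `docs` is a list of flat dicts already restricted
    to the time window. Empty input yields no groups in this sketch.
    """
    if len(group_by) > 4:
        raise ValueError("up to four grouping fields are allowed")

    # Split documents into groups; with no grouping fields, all documents
    # share the single key ().
    buckets = defaultdict(list)
    for doc in docs:
        buckets[tuple(doc[f] for f in group_by)].append(doc)

    # Assumption: "top" groups are those with the most documents.
    ranked = sorted(buckets.items(), key=lambda kv: len(kv[1]), reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]

    met = {}
    for key, members in ranked:
        # count uses the documents themselves; other aggregations need a field.
        values = members if agg == "count" else [d[agg_field] for d in members]
        value = AGGREGATIONS[agg](values)
        if COMPARATORS[comparator](value, threshold):
            met[key] = value
    return met


sample = [
    {"host": "a", "bytes": 100},
    {"host": "a", "bytes": 300},
    {"host": "b", "bytes": 50},
]
per_host = groups_meeting_threshold(
    sample, "sum", "is above", 200, agg_field="bytes", group_by=("host",)
)
overall = groups_meeting_threshold(sample, "count", "is above or equals", 3)
```

In this sketch, each key in the returned dictionary corresponds to one alert, which mirrors the "one alert per unique set of values" behavior described for grouping.
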
Test your query

Use the Test query feature to verify that your query is valid.

If you use query DSL, KQL, or Lucene, the query runs against the selected indices using the configured time window. The number of documents that match the query is displayed. For example:

Test Elasticsearch query returns number of matches when valid

If you use an ES|QL query, a table is displayed. For example:

Test ES|QL query returns a table when valid

If the query is not valid, an error occurs.

Add actions

You can optionally send notifications when the rule conditions are met and when they are no longer met. In particular, this rule type supports:

  • alert summaries
  • actions that run when the query is matched
  • recovery actions that run when the rule conditions are no longer met

For each action, you must choose a connector, which provides connection information for a service or third party integration.

Connector types

Connectors provide a central place to store connection information for services and integrations with third party systems. The following connectors are available when defining actions for alerting rules:

Some connector types are paid commercial features, while others are free. For a comparison of the Elastic subscription levels, go to the subscription page.

For more information on creating connectors, refer to Connectors.

Action frequency

After you select a connector, you must set the action frequency. You can choose to create a Summary of alerts on each check interval or on a custom interval. For example, you can send email notifications that summarize the new, ongoing, and recovered alerts at a custom interval:

UI for defining alert summary action in an Elasticsearch query rule

Alternatively, you can set the action frequency to For each alert and specify the conditions each alert must meet for the action to run.

With the Run when menu you can choose how often the action runs (at each check interval, only when the alert status changes, or at a custom action interval). You must also choose an action group, which indicates whether the action runs when the query is matched or when the alert is recovered. Each connector supports a specific set of actions for each action group. For example:

UI for defining a recovery action

You can further refine the conditions under which actions run by specifying that actions only run when they match a KQL query or when an alert occurs within a specific time frame.

Action variables

Use the default notification message or customize it. You can add more context to the message by clicking the Add variable icon and selecting from a list of available variables.

Action variables list

The following variables are specific to this rule type. You can also specify variables common to all rules.

context.conditions
A string that describes the threshold condition. Example: count greater than 4.
context.date
The date, in ISO format, that the rule met the condition. Example: 2022-02-03T20:29:27.732Z.
context.hits

The most recent documents that matched the query. Using the Mustache template array syntax, you can iterate over these hits to get values from the Elasticsearch documents into your actions.

For example, the message in an email connector action might contain:

Elasticsearch query rule '{{rule.name}}' is active:

{{#context.hits}}
Document with {{_id}} and hostname {{_source.host.name}} has
{{_source.system.memory.actual.free}} bytes of memory free
{{/context.hits}}

The documents returned by context.hits include the _source field. If the query uses the fields parameter of the Elasticsearch search API, each document also includes a fields field, which you can use to access any runtime fields defined by the runtime_mappings parameter. For example:

{{#context.hits}}
timestamp: {{_source.@timestamp}}
day of the week: {{fields.day_of_week}} 
{{/context.hits}}

The fields parameter here is used to access the day_of_week runtime field.

Because the fields response always returns an array of values for each field, use the Mustache template array syntax to iterate over these values in your actions. For example:

{{#context.hits}}
Labels:
{{#fields.labels}}
- {{.}}
{{/fields.labels}}
{{/context.hits}}
context.link
A link to Discover that shows the records that triggered the alert.
context.message
A message for the alert. Example: rule 'my es-query' is active: - Value: 2 - Conditions Met: Number of matching documents is greater than 1 over 5m - Timestamp: 2022-02-03T20:29:27.732Z
context.title
A title for the alert. Example: rule term match alert query matched.
context.value
The value that met the threshold condition.
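
To make the shape of context.hits more concrete, the following Python sketch mimics what the nested Mustache sections in the context.hits examples do. It is not how the templates are rendered. The sample hit, its labels values, and the render_hits helper are hypothetical. The point it shows is that values under _source are read directly, while every entry under fields is a list to iterate over.

```python
def render_hits(hits):
    """Mimic the nested {{#context.hits}} / {{#fields.labels}} sections."""
    lines = []
    for hit in hits:  # {{#context.hits}} ... {{/context.hits}}
        lines.append(f"timestamp: {hit['_source']['@timestamp']}")
        lines.append("Labels:")
        # Each entry under "fields" is an array, even for a single value.
        for label in hit.get("fields", {}).get("labels", []):
            lines.append(f"- {label}")  # {{.}} is the current array item
    return "\n".join(lines)


# Hypothetical hit, shaped like a search response hit with a `fields` entry.
sample_hits = [
    {
        "_id": "1",
        "_source": {"@timestamp": "2022-02-03T20:29:27.732Z"},
        "fields": {"labels": ["prod", "web"]},
    }
]
rendered = render_hits(sample_hits)
```
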
Handling multiple matches of the same document

By default, Exclude matches from previous run is turned on and the rule checks for duplication of document matches across multiple runs. If you configure the rule with a check interval smaller than the time window and a document matches a query in multiple runs, it is alerted on only once.

The rule uses the timestamp of the matches to avoid alerting on the same match multiple times. The timestamp of the latest match is used for evaluating the rule conditions when the rule runs. Only matches between the latest timestamp from the previous run and the current run are considered.

Suppose you have a rule configured to run every minute. The rule uses a time window of 1 hour and checks if there are more than 99 matches for the query. The Elasticsearch query rule type does the following:

Run 1 (0:00)

Rule finds 113 matches in the last hour: 113 > 99

Rule is active and user is alerted.

Run 2 (0:01)

Rule finds 127 matches in the last hour. 105 of the matches are duplicates that were already alerted on, so you actually have 22 matches: 22 ≤ 99

No alert.

Run 3 (0:02)

Rule finds 159 matches in the last hour. 88 of the matches are duplicates that were already alerted on, so you actually have 71 matches: 71 ≤ 99

No alert.

Run 4 (0:03)

Rule finds 190 matches in the last hour. 71 of the matches are duplicates that were already alerted on, so you actually have 119 matches: 119 > 99

Rule is active and user is alerted.
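
The timestamp-based exclusion described in this section can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the rule's actual code. The evaluate_run function and the synthetic millisecond timestamps are hypothetical, chosen so that each run sees the same number of new matches as in the four runs above (113, 22, 71, 119).

```python
WINDOW_MS = 60 * 60 * 1000   # time window: 1 hour
INTERVAL_MS = 60 * 1000      # check interval: 1 minute
THRESHOLD = 99               # alert when new matches > 99


def evaluate_run(timestamps, now, last_seen):
    """Count matches newer than the latest match seen in earlier runs.

    Returns (new_match_count, is_active, updated_last_seen).
    """
    lower = now - WINDOW_MS
    if last_seen is not None:
        # Skip everything up to the latest match from the previous run.
        lower = max(lower, last_seen)
    new = [t for t in timestamps if lower < t <= now]
    latest = max(new) if new else last_seen
    return len(new), len(new) > THRESHOLD, latest


# Synthetic data: each batch of documents arrives just before its run.
run_times = [0, INTERVAL_MS, 2 * INTERVAL_MS, 3 * INTERVAL_MS]
new_per_run = [113, 22, 71, 119]
all_timestamps = [
    now - i for now, n in zip(run_times, new_per_run) for i in range(n)
]

results = []
last_seen = None
for now in run_times:
    count, active, last_seen = evaluate_run(all_timestamps, now, last_seen)
    results.append((count, active))

for (count, active), now in zip(results, run_times):
    print(f"run at +{now // INTERVAL_MS}m: {count} new matches, active={active}")
```

With this synthetic data, only the first and fourth runs exceed the threshold, which matches the runs that alert in the example above.
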