IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Grok filter plugin

Plugin version: v3.4.2
Released on: 2017-06-23
Changelog

Getting Help

For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic supported plugins, please consult the Elastic Support Matrix.

Description

Parse arbitrary text and structure it.

Grok is currently the best way in Logstash to parse crappy unstructured log data into something structured and queryable.

This tool is perfect for syslog logs, Apache and other web server logs, MySQL logs, and in general any log format that is written for humans rather than for computer consumption.

Logstash ships with about 120 patterns by default. You can find them here: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns. You can add your own trivially (see the patterns_dir setting).

If you need help building patterns to match your logs, you will find the http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/ applications quite useful!

Grok Basics

Grok works by combining text patterns into something that matches your logs.

The syntax for a grok pattern is %{SYNTAX:SEMANTIC}

The SYNTAX is the name of the pattern that will match your text. For example, 3.44 will be matched by the NUMBER pattern and 55.3.244.1 will be matched by the IP pattern. The syntax is how you match.

The SEMANTIC is the identifier you give to the piece of text being matched. For example, 3.44 could be the duration of an event, so you could call it simply duration. Further, a string 55.3.244.1 might identify the client making a request.

For the above example, your grok filter would look something like this:

%{NUMBER:duration} %{IP:client}

Optionally, you can add a data type conversion to your grok pattern. By default, all semantics are saved as strings. To convert a semantic’s data type, for example from a string to an integer, suffix it with the target data type. For example, %{NUMBER:num:int} converts the num semantic from a string to an integer. Currently the only supported conversions are int and float.

Examples: With that idea of a syntax and semantic, we can pull out useful fields from a sample log like this fictional HTTP request log:

    55.3.244.1 GET /index.html 15824 0.043

The pattern for this could be:

    %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

For a more realistic example, let’s read these logs from a file:

    input {
      file {
        path => "/var/log/http.log"
      }
    }
    filter {
      grok {
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
      }
    }

After the grok filter, the event will have a few extra fields in it:

client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
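
Note that bytes and duration in the result above are stored as strings. As a minimal sketch, the same filter could use the int and float conversion suffixes described earlier so that both fields are stored as numbers:

    filter {
      grok {
        # :int and :float convert the captured text; client, method and request stay strings
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int} %{NUMBER:duration:float}" }
      }
    }

With this variant, bytes would be the integer 15824 and duration the float 0.043.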

Regular Expressions

Grok sits on top of regular expressions, so any regular expressions are valid in grok as well. The regular expression library is Oniguruma, and you can see the full supported regexp syntax on the Oniguruma site.
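
As a rough sketch of mixing the two, the pattern below combines ordinary regular expression syntax with grok patterns. The log line format it targets (for example 55.3.244.1   POST /login, with a variable amount of whitespace after the address) is a made-up assumption for illustration:

    filter {
      grok {
        # ^, \s+, (?:GET|POST) and $ are plain regular expression syntax;
        # %{IP:client} and %{URIPATHPARAM:request} are grok patterns.
        match => { "message" => "^%{IP:client}\s+(?:GET|POST) %{URIPATHPARAM:request}$" }
      }
    }

Here the HTTP method is matched but not captured, because the group is a non-capturing one.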

Custom Patterns

Sometimes Logstash doesn’t have a pattern you need. For this, you have a few options.

First, you can use the Oniguruma syntax for named capture which will let you match a piece of text and save it as a field:

    (?<field_name>the pattern here)

For example, postfix logs have a queue id that is a 10- or 11-character hexadecimal value. You can capture that easily like this:

    (?<queue_id>[0-9A-F]{10,11})
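
Used inside a filter, the named capture can sit alongside regular grok patterns. A minimal sketch, assuming a postfix syslog line in which the queue id follows the syslog header and is terminated by a colon:

    filter {
      grok {
        # the (?<queue_id>...) named capture is used directly, with no pattern file needed
        match => { "message" => "%{SYSLOGBASE} (?<queue_id>[0-9A-F]{10,11}): %{GREEDYDATA:syslog_message}" }
      }
    }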

Alternately, you can create a custom patterns file.

Create a directory called patterns with a file in it called postfix (the file name doesn’t matter, but name it meaningfully for yourself).
In that file, write the pattern you need as the pattern name, a space, then the regexp for that pattern.

For example, doing the postfix queue id example as above:

    # contents of ./patterns/postfix:
    POSTFIX_QUEUEID [0-9A-F]{10,11}

Then use the patterns_dir setting in this plugin to tell Logstash where your custom patterns directory is. Here’s a full example with a sample log:

    Jan  1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<20130101142543.5828399CCAF@mailserver14.example.com>

    filter {
      grok {
        patterns_dir => ["./patterns"]
        match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
      }
    }

The above will match and result in the following fields:

timestamp: Jan 1 06:25:43
logsource: mailserver14
program: postfix/cleanup
pid: 21403
queue_id: BEF25A72965
syslog_message: message-id=<20130101142543.5828399CCAF@mailserver14.example.com>

The timestamp, logsource, program, and pid fields come from the SYSLOGBASE pattern which itself is defined by other patterns.

Another option is to define patterns inline in the filter using pattern_definitions. This is mostly for convenience and allows you to define a pattern that can be used just in that filter. Patterns newly defined in pattern_definitions will not be available outside of that particular grok filter.
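
As a minimal sketch, the postfix queue id pattern from the example above could be defined inline instead of in a patterns file:

    filter {
      grok {
        # POSTFIX_QUEUEID exists only inside this grok filter
        pattern_definitions => { "POSTFIX_QUEUEID" => "[0-9A-F]{10,11}" }
        match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
      }
    }

This avoids the need for patterns_dir when the pattern is only used in one place.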

Grok Filter Configuration Options

This plugin supports the following configuration options plus the Common Options described later.

Setting	Input type	Required
`break_on_match`	boolean	No
`keep_empty_captures`	boolean	No
`match`	hash	No
`named_captures_only`	boolean	No
`overwrite`	array	No
`pattern_definitions`	hash	No
`patterns_dir`	array	No
`patterns_files_glob`	string	No
`tag_on_failure`	array	No
`tag_on_timeout`	string	No
`timeout_millis`	number	No

Also see Common Options for a list of options supported by all filter plugins.

`break_on_match`

Value type is boolean
Default value is true

Break on first match. The first successful match by grok will result in the filter being finished. If you want grok to try all patterns (maybe you are parsing different things), then set this to false.
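
A minimal sketch of trying every pattern, assuming a hypothetical log line such as Duration: 3.44 Speed: 12 that contains both pieces of information:

    filter {
      grok {
        # with the default (true), grok would stop after the first pattern matched
        break_on_match => false
        match => { "message" => [ "Duration: %{NUMBER:duration}", "Speed: %{NUMBER:speed}" ] }
      }
    }

With break_on_match set to false, both duration and speed should be extracted from that line. With the default, only duration would be.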

`keep_empty_captures`

Value type is boolean
Default value is false

If true, keep empty captures as event fields.
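
As a sketch of when this matters, assume a hypothetical log line such as user= action=login, where the user value is missing:

    filter {
      grok {
        keep_empty_captures => true
        # [^ ]* can match zero characters, so the user capture may be empty
        match => { "message" => "user=(?<user>[^ ]*) action=%{WORD:action}" }
      }
    }

With the default setting the event would be expected to have no user field at all for that line. With keep_empty_captures enabled, the empty capture should be kept as a field.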

`match`

Value type is hash
Default value is {}

A hash that maps each field to the pattern (or patterns) to match against it, in the form field => pattern.

For example:

    filter {
      grok { match => { "message" => "Duration: %{NUMBER:duration}" } }
    }

If you need to match multiple patterns against a single field, the value can be an array of patterns:

    filter {
      grok { match => { "message" => [ "Duration: %{NUMBER:duration}", "Speed: %{NUMBER:speed}" ] } }
    }

`named_captures_only`

Value type is boolean
Default value is true

If true, only store named captures from grok.
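
A minimal sketch of the difference, assuming a hypothetical log line such as 55.3.244.1 GET:

    filter {
      grok {
        # %{IP} has no SEMANTIC, so by default its match is not stored on the event
        named_captures_only => false
        match => { "message" => "%{IP} %{WORD:method}" }
      }
    }

With the default (true), only the method field would be added. With named_captures_only set to false, the text matched by the unnamed %{IP} pattern is expected to be stored as well, under a field named after the pattern. The exact set of extra fields may also include captures from the patterns that IP is built from, so it is worth checking the output before relying on it.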

`overwrite`

Value type is array
Default value is []

The fields to overwrite.

This allows you to overwrite a value in a field that already exists.

For example, if you have a syslog line in the message field, you can overwrite the message field with part of the match like so:

    filter {
      grok {
        match => { "message" => "%{SYSLOGBASE} %{DATA:message}" }
        overwrite => [ "message" ]
      }
    }

In this case, a line like May 29 16:37:11 sadness logger: hello world will be parsed and hello world will overwrite the original message.

`pattern_definitions`

Value type is hash
Default value is {}

A hash of pattern-name and pattern tuples defining custom patterns to be used by the current filter. Patterns matching existing names will override the pre-existing definition. Think of this as inline patterns available just for this definition of grok.

`patterns_dir`

Value type is array
Default value is []

Logstash ships by default with a bunch of patterns, so you don’t necessarily need to define this yourself unless you are adding additional patterns. You can point to multiple pattern directories using this setting. Note that Grok will read all files in the directory matching the patterns_files_glob and assume each one is a pattern file (including any tilde backup files).

    patterns_dir => ["/opt/logstash/patterns", "/opt/logstash/extra_patterns"]

Pattern files are plain text with format:

    NAME PATTERN

For example:

    NUMBER \d+

The patterns are loaded when the pipeline is created.

`patterns_files_glob`

Value type is string
Default value is "*"

Glob pattern used to select the pattern files in the directories specified by patterns_dir.
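
As a sketch, a narrower glob can keep editor backup files from being loaded as patterns. The .grok file extension here is an assumed naming convention, not something the plugin requires:

    filter {
      grok {
        patterns_dir => ["./patterns"]
        # only files ending in .grok are read; a backup such as postfix.grok~ is skipped
        patterns_files_glob => "*.grok"
        match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
      }
    }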

`tag_on_failure`

Value type is array
Default value is ["_grokparsefailure"]

Append values to the tags field when there has been no successful match.
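
A minimal sketch of using a custom failure tag to route unparsed events somewhere else. The tag name and the output file path are made up for illustration:

    filter {
      grok {
        match => { "message" => "Duration: %{NUMBER:duration}" }
        # replaces the default _grokparsefailure tag for this filter
        tag_on_failure => ["_duration_parse_failed"]
      }
    }
    output {
      if "_duration_parse_failed" in [tags] {
        # events grok could not parse go to a separate file for inspection
        file { path => "/var/log/logstash/unparsed.log" }
      } else {
        stdout { codec => rubydebug }
      }
    }

Giving each grok filter its own failure tag can make it easier to tell which filter failed when a pipeline has several.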

`tag_on_timeout`

Value type is string
Default value is "_groktimeout"

Tag to apply if a grok regexp times out.

`timeout_millis`

Value type is number
Default value is 30000

Attempt to terminate regexps after this amount of time, in milliseconds. This applies per pattern if multiple patterns are applied. This will never time out early, but may take a little longer to time out. The actual timeout is approximate, based on a 250ms quantization. Set to 0 to disable timeouts.
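
As a sketch, a shorter timeout can be combined with a custom timeout tag. The 5000 ms value and the tag name are arbitrary choices for illustration:

    filter {
      grok {
        match => { "message" => "%{SYSLOGBASE} %{GREEDYDATA:syslog_message}" }
        # give up on a pattern after roughly 5 seconds instead of the default 30
        timeout_millis => 5000
        tag_on_timeout => "_slow_grok"
      }
    }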

Common Options

The following configuration options are supported by all filter plugins:

Setting	Input type	Required
`add_field`	hash	No
`add_tag`	array	No
`enable_metric`	boolean	No
`id`	string	No
`periodic_flush`	boolean	No
`remove_field`	array	No
`remove_tag`	array	No

`add_field`

Value type is hash
Default value is {}

If this filter is successful, add any arbitrary fields to this event. Field names can be dynamic and include parts of the event using the %{field} syntax.

Example:

    filter {
      PLUGIN_NAME {
        add_field => { "foo_%{somefield}" => "Hello world, from %{host}" }
      }
    }

    # You can also add multiple fields at once:
    filter {
      PLUGIN_NAME {
        add_field => {
          "foo_%{somefield}" => "Hello world, from %{host}"
          "new_field" => "new_static_value"
        }
      }
    }

If the event has field "somefield" == "hello", this filter, on success, would add the field foo_hello, with the value above and the %{host} piece replaced with that value from the event. The second example would also add a hardcoded field.

`add_tag`

Value type is array
Default value is []

If this filter is successful, add arbitrary tags to the event. Tags can be dynamic and include parts of the event using the %{field} syntax.

Example:

    filter {
      PLUGIN_NAME {
        add_tag => [ "foo_%{somefield}" ]
      }
    }

    # You can also add multiple tags at once:
    filter {
      PLUGIN_NAME {
        add_tag => [ "foo_%{somefield}", "taggedy_tag"]
      }
    }

If the event has field "somefield" == "hello" this filter, on success, would add a tag foo_hello (and the second example would of course add a taggedy_tag tag).

`enable_metric`

Value type is boolean
Default value is true

Disable or enable metric logging for this specific plugin instance. By default, we record all the metrics we can, but you can disable metrics collection for a specific plugin.

`id`

Value type is string
There is no default value for this setting.

Add a unique ID to the plugin instance. This ID is used for tracking information for a specific configuration of the plugin.

    output {
      stdout {
        id => "ABC"
      }
    }

If you don’t explicitly set this variable, Logstash will generate a unique name.
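
The same setting works on a filter. A minimal sketch for grok, where the ID value is a made-up name:

    filter {
      grok {
        # a descriptive ID helps tell this filter apart from others when tracking it
        id => "http_access_grok"
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
      }
    }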

`periodic_flush`

Value type is boolean
Default value is false

Call the filter flush method at regular intervals. Optional.

`remove_field`

Value type is array
Default value is []

If this filter is successful, remove arbitrary fields from this event. Field names can be dynamic and include parts of the event using the %{field} syntax.

Example:

    filter {
      PLUGIN_NAME {
        remove_field => [ "foo_%{somefield}" ]
      }
    }

    # You can also remove multiple fields at once:
    filter {
      PLUGIN_NAME {
        remove_field => [ "foo_%{somefield}", "my_extraneous_field" ]
      }
    }

If the event has field "somefield" == "hello" this filter, on success, would remove the field with name foo_hello if it is present. The second example would remove an additional, non-dynamic field.

`remove_tag`

Value type is array
Default value is []

If this filter is successful, remove arbitrary tags from the event. Tags can be dynamic and include parts of the event using the %{field} syntax.

Example:

    filter {
      PLUGIN_NAME {
        remove_tag => [ "foo_%{somefield}" ]
      }
    }

    # You can also remove multiple tags at once:
    filter {
      PLUGIN_NAME {
        remove_tag => [ "foo_%{somefield}", "sad_unwanted_tag"]
      }
    }

If the event has field "somefield" == "hello" this filter, on success, would remove the tag foo_hello if it is present. The second example would remove a sad, unwanted tag as well.
