Developer guide: creating a new module

edit

Developer guide: creating a new module

edit

This guide will walk you through creating a new Filebeat module.

All Filebeat modules currently live in the main Beats repository. To clone the repository and build Filebeat (which you will need for testing), please follow the general Beats CONTRIBUTING guide.

Overview

edit

Each Filebeat module is composed of one or more "filesets". We usually create a module for each service that we support (nginx for Nginx, mysql for Mysql, and so on) and a fileset for each type of log that the service creates. For example, the Nginx module has access and error filesets. You can contribute a new module (with at least one fileset), or a new fileset for an existing module.

Creating a new fileset

edit

Regardless of whether you are creating a fileset in a new or existing module, the procedure is similar. Run the following command in the filebeat folder:

make create-fileset

You’ll be prompted to enter a module and fileset name. Only use characters [a-z] and, if required, underscores (_). No other characters are allowed. For the module name, you can us either a new module name or an existing module name. If the module doesn’t exist, it will be created.

In this guide we use {fileset} and {module} as placeholders for the fileset and module names. You need to replace these with the actual names you entered when your created the module and fileset.

After running the make create-fileset command, you’ll find the fileset, along with its generated files, under module/{module}/{fileset}. This directory contains the following files:

module/{module}/{fileset}
├── manifest.yml
├── config
│   └── {fileset}.yml
├── ingest
│   └── pipeline.json
├── _meta
│   └── fields.yml
└── test

Let’s look at these files one by one.

manifest.yml

edit

The manifest.yml is the control file for the module, where variables are defined and the other files are referenced. It is a YAML file, but in many places in the file, you can use built-in or defined variables by using the {{.variable}} syntax.

The var section of the file defines the fileset variables and their default values. The module variables can be referenced in other configuration files, and their value can be overridden at runtime by the Filebeat configuration.

As the fileset creator, you can use any names for the variables you define. Each variable must have a default value. So in it’s simplest form, this is how you can define a new variable:

var:
  - name: pipeline
    default: with_plugins

Most fileset should have a paths variable defined, which sets the default paths where the log files are located:

var:
  - name: paths
    default:
      - /example/test.log*
    os.darwin:
      - /usr/local/example/test.log*
      - /example/test.log*
    os.windows:
      - c:/programdata/example/logs/test.log*

There’s quite a lot going on in this file, so let’s break it down:

  • The name of the variable is paths and the default value is an array with one element: "/example/test.log*".
  • Note that variable values don’t have to be strings. They can be also numbers, objects, or as shown in this example, arrays.
  • We will use the paths variable to set the prospector paths setting, so "glob" values can be used here.
  • Besides the default value, the file defines values for particular operating systems: a default for darwin/OS X/macOS systems and a default for Windows systems. These are introduced via the os.darwin and os.windows keywords. The values under these keys become the default for the variable, if Filebeat is executed on the respective OS.

Besides the variable definition, the manifest.yml file also contains references to the ingest pipeline and prospector configuration to use (see next sections):

ingest_pipeline: ingest/pipeline.json
prospector: config/testfileset.yml

These should point to the respective files from the fileset.

Note that when evaluating the contents of these files, the variables are expanded, which enables you to select one file or the other depending on the value of a variable. For example:

ingest_pipeline: ingest/{{.pipeline}}.json

This example selects the ingest pipeline file based on the value of the pipeline variable. For the pipeline variable shown earlier, the path would resolve to ingest/with_plugins.json (assuming the variable value isn’t overridden at runtime.)

config/*.yml

edit

The config/ folder contains template files that generate Filebeat prospector configurations. The Filebeat prospectors are primarily responsible for tailing files, filtering, and multi-line stitching, so that’s what you configure in the template files.

A typical example looks like this:

input_type: log
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$"]

You’ll find this example in the template file that gets generated automatically when you run make create-fileset. In this example, the paths variable is used to construct the paths list for the paths option.

Any template files that you add to the config/ folder need to generate a valid Filebeat prospector configuration in YAML format. The options accepted by the prospector configuration are documented in this section.

The template files use the templating language defined by the Golang standard library.

Here is another example that also configures multiline stitching:

input_type: log
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$"]
multiline:
  pattern: "^# User@Host: "
  negate: true
  match: after

Although you can add multiple configuration files under the config/ folder, only the file indicated by the manifest.yml file will be loaded. You can use variables to dynamically switch between configurations.

ingest/*.json

edit

The ingest/ folder contains Elasticsearch Ingest Node pipeline configurations. The Ingest Node pipelines are responsible for parsing the log lines and doing other manipulations on the data.

The files in this folder are JSON documents representing pipeline definitions. Just like with the config/ folder, you can define multiple pipelines, but a single one is loaded at runtime based on the information from manifest.yml.

The generator creates a JSON object similar to this one:

{
  "description": "Pipeline for parsing {module} {fileset} logs",
  "processors": [
    ],
  "on_failure" : [{
    "set" : {
      "field" : "error",
      "value" : "{{ _ingest.on_failure_message }}"
    }
  }]
}

From here, you would typically add processors to the processors array to do the actual parsing. For details on how to use ingest node processors, see the ingest node documentation. In particular, you will likely find the Grok processor to be useful for parsing. Here is an example for parsing the Nginx access logs.

{
  "grok": {
    "field": "message",
    "patterns":[
      "%{IPORHOST:nginx.access.remote_ip} - %{DATA:nginx.access.user_name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{WORD:nginx.access.method} %{DATA:nginx.access.url} HTTP/%{NUMBER:nginx.access.http_version}\" %{NUMBER:nginx.access.response_code} %{NUMBER:nginx.access.body_sent.bytes} \"%{DATA:nginx.access.referrer}\" \"%{DATA:nginx.access.agent}\""
      ],
    "ignore_missing": true
  }
}

Note that you should follow the convention of naming of fields prefixed with the module and fileset name: {module}.{fileset}.field, e.g. nginx.access.remote_ip. Also, please review our field naming conventions.

While developing the pipeline definition, we recommend making use of the Simulate Pipeline API for testing and quick iteration.

_meta/fields.yml

edit

The fields.yml file contains the top-level structure for the fields in your fileset. It is used as the source of truth for:

  • the generated Elasticsearch mapping template
  • the generated Kibana index pattern
  • the generated documentation for the exported fields

Besides the fields.yml file in the fileset, there is also a fields.yml file at the module level, placed under module/{module}/_meta/fields.yml, which should contain the fields defined at the module level, and the description of the module itself. In most cases, you should add the fields at the fileset level.

test

edit

In the test/ directory, you should place sample log files generated by the service. We have integration tests, automatically executed by CI, that will run Filebeat on each of the log files under the test/ folder and check that there are no parsing errors and that all fields are documented.

In addition, assuming you have a test.log file, you can add a test.log-expected.json file in the same directory that contains the expected documents as they are found via an Elasticsearch search. In this case, the integration tests will automatically check that the result is the same on each run.

Module-level files

edit

Besides the files in the fileset folder, there is also data that needs to be filled at the module level.

_meta/docs.asciidoc

edit

This file contains module-specific documentation. You should include information about which versions of the service were tested and the variables that are defined in each fileset.

_meta/fields.yml

edit

The module level fields.yml contains descriptions for the module-level fields. Please review and update the title and the descriptions in this file. The title is used as a title in the docs, so it’s best to capitalize it.

_meta/kibana

edit

This folder contains the sample Kibana dashboards for this module. To create them, you can build them visually in Kibana and then run the following command:

$ cd filebeat/module/{module}/
python ../../../dev-tools/export_dashboards.py --regex {module} --dir _meta/kibana

Where the --regex parameter should match the dashboard you want to export.

You can find more details about the process of creating and exporting the Kibana dashboards by reading this guide.