Tips and best practices

We are adding more tips and best practices, so please check back soon. If you have something to add, please open an issue or a pull request in the Logstash repository.

Also check out the Logstash discussion forum.

Command line

Shell commands on Windows OS

Command line examples often show single quotes. On Windows systems, replace a single quote ' with a double quote ".

Example

Instead of:

bin/logstash -e 'input { stdin { } } output { stdout {} }'

Use this format on Windows systems:

bin\logstash -e "input { stdin { } } output { stdout {} }"

Pipelines

Pipeline management

You can manage pipelines in a Logstash instance using either local pipeline configurations or centralized pipeline management in Kibana.

After you configure Logstash to use centralized pipeline management, you can no longer specify local pipeline configurations. The pipelines.yml file and settings such as path.config and config.string are inactive when centralized pipeline management is enabled.
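
Centralized pipeline management is turned on in logstash.yml. The following is a minimal sketch of the relevant settings; the host, credentials, and pipeline ID are placeholders, not values from this document:

# logstash.yml (sketch): host, credentials, and pipeline ID are placeholders
xpack.management.enabled: true
xpack.management.pipeline.id: ["my-centrally-managed-pipeline"]
xpack.management.elasticsearch.hosts: ["https://es-host:9200"]
xpack.management.elasticsearch.username: "logstash_admin_user"
xpack.management.elasticsearch.password: "changeme"

With settings like these in place, Logstash fetches the pipeline definitions listed in xpack.management.pipeline.id from Elasticsearch and ignores pipelines.yml, path.config, and config.string.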

Tips using filters

Check to see if a boolean field exists

You can use the mutate filter to check whether a boolean field exists.

Logstash supports [@metadata] fields: fields that are not visible to output plugins and live only in the filtering stage. You can combine [@metadata] fields with the mutate filter to check whether a field exists.

filter {
  mutate {
    # we use a "temporal" field with a predefined arbitrary known value that
    # lives only in filtering stage.
    add_field => { "[@metadata][test_field_check]" => "a null value" }

    # Copy the field of interest into that temporary field.
    # If the field doesn't exist, the copy is not executed.
    copy => { "test_field" => "[@metadata][test_field_check]" }
  }


  # If [test_field] didn't exist, the temporary field still holds
  # the initial arbitrary value.
  if [@metadata][test_field_check] == "a null value" {
    # logic to execute when [test_field] did not exist
    mutate { add_field => { "field_did_not_exist" => true }}
  } else {
    # logic to execute when [test_field] existed
    mutate { add_field => { "field_did_exist" => true }}
  }
}
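
To verify what the [@metadata] fields contain while testing a pipeline like this, one option is the rubydebug codec with metadata enabled. A minimal sketch, separate from the example above:

output {
  # Print events including [@metadata] fields, which are hidden from output by default.
  stdout { codec => rubydebug { metadata => true } }
}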

Kafka

Kafka settings

Partitions per topic

"How many partitions should I use per topic?"

Use at least the number of Logstash nodes multiplied by the number of consumer threads per node. For example, 4 Logstash nodes each running 4 consumer threads call for at least 16 partitions.

Better yet, use a multiple of that number. Increasing the number of partitions for an existing topic is extremely complicated, and partitions have very low overhead, so using 5 to 10 times the number of partitions suggested above is generally fine, as long as the overall partition count does not exceed 2000.

Err on the side of over-partitioning, but try to keep the total under 1000 partitions overall.

Consumer threads

"How many consumer threads should I configure?"

Lower values tend to be more efficient and have less memory overhead. Try a value of 1, then iterate your way up. The value should generally be lower than the number of pipeline workers. Values larger than 4 rarely result in a performance improvement.
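
The thread count is set with the consumer_threads option on the kafka input plugin. A minimal sketch, with placeholder broker, topic, and group ID values:

input {
  kafka {
    bootstrap_servers => "localhost:9092"  # placeholder broker address
    topics => ["logs"]                     # placeholder topic name
    group_id => "logstash"                 # consumers in the same group share the partitions
    consumer_threads => 1                  # start at 1 and increase only if needed
  }
}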

Kafka input and persistent queue (PQ)

Kafka offset commits

"Does Kafka Input commit offsets only after the event has been safely persisted to the PQ?"

"Does Kafa Input commit offsets only for events that have passed the pipeline fully?"

No, we can’t make that guarantee. Offsets are committed to Kafka periodically. If writes to the PQ are slow or blocked, offsets for events that haven’t safely reached the PQ can be committed.