JVM settings

Configure JVM settings in the jvm.options settings file. JVM settings can also be set via the LS_JAVA_OPTS environment variable.

This file contains a line-delimited list of JVM arguments following a special syntax:

  • lines consisting of whitespace only are ignored
  • lines beginning with # are treated as comments and are ignored

    # this is a comment
  • lines beginning with a - are treated as a JVM option that applies independent of the version of the JVM

    -Xmx2g
  • lines beginning with a number followed by a : followed by a - are treated as a JVM option that applies only if the version of the JVM matches the number

    8:-Xmx2g
  • lines beginning with a number followed by a - followed by a : are treated as a JVM option that applies only if the version of the JVM is greater than or equal to the number

    8-:-Xmx2g
  • lines beginning with a number followed by a - followed by a number followed by a : are treated as a JVM option that applies only if the version of the JVM falls in the inclusive range of the two numbers

    8-9:-Xmx2g
  • all other lines are rejected
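
Putting these rules together, a jvm.options file might contain entries like the following (the values and flags are illustrative, not recommendations):

# fixed 2 GB heap on any JVM version
-Xms2g
-Xmx2g

# use the CMS collector only on Java 8
8:-XX:+UseConcMarkSweepGC

# print GC details only on Java 8 through 9
8-9:-XX:+PrintGCDetails

# enable string deduplication on Java 11 and later
11-:-XX:+UseStringDeduplication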

Setting the memory size

The memory of the JVM running Logstash can be divided into two zones: heap and off-heap memory. The heap refers to the Java heap, which contains all the Java objects created by Logstash during its operation; see Setting the JVM heap size for guidance on how to size it. Memory that is not part of the heap is called off-heap memory. It consists of memory that can be used and controlled by Logstash, generally thread stacks, direct memory, and memory-mapped pages; check Setting the off-heap size for comprehensive descriptions. Part of the off-heap space is used by the JVM itself and contains the data structures needed to run the virtual machine. This memory can't be controlled by Logstash, and its settings are rarely customized.

Setting the JVM heap size

Here are some tips for adjusting the JVM heap size:

  • The recommended heap size for typical ingestion scenarios should be no less than 4GB and no more than 8GB.
  • CPU utilization can increase unnecessarily if the heap size is too low, resulting in the JVM constantly garbage collecting. You can check for this issue by doubling the heap size to see if performance improves.
  • Do not increase the heap size past the amount of physical memory. Some memory must be left to run the OS and other processes. As a general guideline for most installations, don’t exceed 50-75% of physical memory. The more memory you have, the higher percentage you can use.
  • Set the minimum (Xms) and maximum (Xmx) heap allocation size to the same value to prevent the heap from resizing at runtime, which is a very costly process. See the example following this list.
  • You can make more accurate measurements of the JVM heap by using either the jmap command line utility distributed with Java or by using VisualVM. For more info, see Profiling the heap.
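
For example, to pin the heap at 4 GB (a starting point consistent with the 4GB-8GB guidance above; adjust for your workload), add these lines to jvm.options:

-Xms4g
-Xmx4g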

Setting the off-heap size

The operating system, persistent queue mmap pages, direct memory, and other processes require memory in addition to memory allocated to heap size.

Internal JVM data structures, thread stacks, memory-mapped files, and direct memory for input/output (IO) operations are all part of the off-heap JVM memory. Memory-mapped files are not part of the Logstash process's off-heap memory, but they consume RAM when pages are loaded from disk. These mapped files speed up access to Persistent Queue pages, a performance improvement (or trade-off) that reduces expensive disk operations such as read, write, and seek. Some network I/O operations also use in-process direct memory to avoid, for example, copying buffers between network sockets. Input plugins such as Elastic Agent, Beats, TCP, and HTTP inputs use direct memory. The thread stack zone contains the stack frames for each Java thread created by the JVM; each frame holds the local arguments passed during method calls. See Setting the JVM stack size if the stack size needs to be adapted to your processing needs.

Plugins, depending on their type (inputs, filters, and outputs), have different thread models. Every input plugin runs in its own thread and can potentially spawn others. For example, each JDBC input plugin launches a scheduler thread. Netty-based plugins like the TCP, Beats, or HTTP inputs spawn a thread pool with 2 * number_of_cores threads. Output plugins may also start helper threads, such as a connection management thread for each Elasticsearch output instance. Every pipeline also has its own thread responsible for managing the pipeline lifecycle.
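
As a rough illustration, on an 8-core machine a pipeline with a single Beats input would run the input's own thread, a Netty pool of 2 * 8 = 16 threads, the pipeline lifecycle thread, and however many worker threads you have configured; the core count is only an assumption for the example.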

To summarize, there are three categories of memory usage, two of which can be limited by the JVM while the third relies on available free memory:

Memory Type        | Configured using        | Used by
JVM Heap           | -Xmx                    | any normal object allocation
JVM direct memory  | -XX:MaxDirectMemorySize | Beats, TCP, and HTTP inputs
Native memory      | N/A                     | Persistent Queue pages, thread stacks

Keep these memory requirements in mind as you calculate your ideal memory allocation.

Upcoming changes to Buffer Allocation and Troubleshooting Out of Memory errors

Plugins such as the Elastic Agent, Beats, TCP, and HTTP inputs currently default to using direct memory, as it tends to provide better performance, especially when interacting with the network stack. Under heavy load, namely a large number of connections and large messages, the direct memory space can be exhausted, leading to Out of Memory (OOM) errors in off-heap space.

An off-heap OOM is difficult to debug, so Logstash provides a pipeline.buffer.type setting in logstash.yml that lets you control where to allocate memory buffers for plugins that use them. It is currently set to direct by default, but you can change it to heap to use Java heap space instead, which will become the default in the future. When set to heap, buffer allocations used by plugins are configured to prefer the Java heap, although direct memory allocations may still be necessary depending on the plugin.

When the setting is heap, Logstash will produce a heap dump in the event of an out-of-memory error to facilitate debugging.

Note that Java heap sizing requirements are impacted by this change, since allocations that previously resided in direct memory use the heap instead.

Performance-wise there shouldn't be a noticeable impact: although direct memory IO is faster, Logstash Event objects produced by these plugins end up being allocated on the Java heap anyway, incurring the cost of copying from direct memory to heap memory regardless of the setting.

  • When you set pipeline.buffer.type to heap, consider increasing the Java heap by the amount of memory that had been reserved for direct space, as shown in the example below.
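
For example, to opt in to heap buffers, set this in logstash.yml:

pipeline.buffer.type: heap

Then raise the heap in jvm.options to absorb the allocations that previously went to direct memory (the 8 GB value below is purely illustrative; size it for your workload):

-Xms8g
-Xmx8g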

Memory sizing

Total JVM memory allocation must be estimated and is controlled indirectly through the Java heap and direct memory settings. By default, the JVM's off-heap direct memory limit is the same as the heap size. Check out Beats input memory usage. Consider setting -XX:MaxDirectMemorySize to half of the heap size, or to any value that can accommodate the load you expect these plugins to handle.
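
For example, with a 4 GB heap, capping direct memory at half of the heap could look like this in jvm.options (the values are a sketch; tune them for the connection and message volume you expect):

-Xms4g
-Xmx4g
-XX:MaxDirectMemorySize=2g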

As you make your capacity calculations, keep in mind that the JVM can’t consume the total amount of the host’s memory available, as the Operating System and other processes will require memory too.

For a Logstash instance with persistent queue (PQ) enabled on multiple pipelines, we could estimate memory consumption using:

pipelines number * (pipeline threads * stack size + 2 * PQ page size) + direct memory + Java heap

Each Persistent Queue requires that at least the head and tail pages are present and accessible in memory. The default page size is 64 MB, so each PQ requires at least 128 MB of memory, which can be a significant source of memory consumption per pipeline. Note that the amount of memory used by memory-mapped files can't be capped with an upper bound.

The default stack size depends on the JVM used, but it can be customized with the -Xss setting.

By default, the direct memory space is as large as the Java heap, but it can be customized with the -XX:MaxDirectMemorySize setting.

Example

Consider a Logstash instance running 10 pipelines with simple input and output plugins that don't start additional threads. Each pipeline has 1 pipeline thread, 1 input plugin thread, and 12 workers, summing up to 14 threads. Keep in mind that, by default, the JVM allocates direct memory equal to the memory allocated for the Java heap.

The calculation results in:

  • native memory: 1.4 GB (derived from 10 * (14 * 1 MB + 128 MB))
  • direct memory: 4 GB
  • Java heap: 4 GB
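
Plugging these numbers into the formula above gives a rough total (an estimate, not an exact measurement):

10 * (14 * 1 MB + 128 MB) + 4 GB + 4 GB ≈ 1.4 GB + 4 GB + 4 GB ≈ 9.4 GB

plus whatever memory the operating system and any other processes need.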

Setting the JVM stack size

Large configurations may require additional JVM stack memory. If you see a stack overflow error, try increasing the JVM stack size. Add an entry similar to this one in the jvm.options settings file:

-Xss4M

Note that the default stack size is different per platform and per OS flavor. You can find out what the default is by running:

java -XX:+PrintFlagsFinal -version | grep ThreadStackSize

Depending on the default stack size, start by multiplying by 4x, then 8x, and then 16x until the overflow error resolves.
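
For example, if the reported default is 1024 KB (1 MB), you might try -Xss4M first, then -Xss8M, and then -Xss16M until the overflow error stops.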

Using LS_JAVA_OPTS

The LS_JAVA_OPTS environment variable can also be used to override JVM settings in the jvm.options settings file. The content of this variable is additive to the options configured in the jvm.options file, and it overrides any settings that exist in both places.

For example, to launch a Logstash instance with a different locale:

LS_JAVA_OPTS="-Duser.country=DE -Duser.language=de" bin/logstash -e 'input { stdin { codec => json } }'
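
Similarly, because LS_JAVA_OPTS takes precedence over jvm.options, you could temporarily override the heap size at launch (the 2 GB value and the stdin pipeline are only an illustration):

LS_JAVA_OPTS="-Xms2g -Xmx2g" bin/logstash -e 'input { stdin { } }'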