IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Kafka input plugin

Plugin version: v8.3.1
Released on: 2018-12-19
Changelog

For other versions, see the Versioned plugin docs.

Installation

For plugins not bundled by default, install the plugin by running `bin/logstash-plugin install logstash-input-kafka`. See Working with plugins for more details.

Getting Help

For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic supported plugins, please consult the Elastic Support Matrix.

Description

This input will read events from a Kafka topic.

This plugin uses Kafka Client 2.1.0. For broker compatibility, see the official Kafka compatibility reference. If the linked compatibility wiki is not up-to-date, please contact Kafka support/community to confirm compatibility.

If you require features not yet available in this plugin (including client version upgrades), please file an issue with details about what you need.

This input supports connecting to Kafka over:

SSL (requires plugin version 3.0.0 or later)
Kerberos SASL (requires plugin version 5.1.0 or later)

By default, security is disabled, but it can be turned on as needed.

The Logstash Kafka consumer handles group management and uses the default offset management strategy using Kafka topics.

Logstash instances by default form a single logical group to subscribe to Kafka topics. Each Logstash Kafka consumer can run multiple threads to increase read throughput. Alternatively, you could run multiple Logstash instances with the same group_id to spread the load across physical machines. Messages in a topic will be distributed to all Logstash instances with the same group_id.

Ideally, you should have as many threads as there are partitions for a perfect balance. More threads than partitions means that some threads will be idle.

For more information see http://kafka.apache.org/documentation.html#theconsumer

Kafka consumer configuration: http://kafka.apache.org/documentation.html#consumerconfigs
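
As a rough illustration of the consumer group behavior described above, the following sketch runs four consumer threads in one Logstash instance. The topic name and the partition count are assumptions made for the example; group_id is shown with its default value.

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    # Hypothetical topic, assumed to have 4 partitions
    topics => ["example-topic"]
    # Instances sharing this group_id divide the partitions among themselves
    group_id => "logstash"
    # One thread per partition; any extra threads would sit idle
    consumer_threads => 4
  }
}
```

If a second Logstash instance were started with the same group_id, the partitions would be divided between the two instances, so a lower consumer_threads value per instance would likely be enough.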

Metadata fields

The following metadata from the Kafka broker is added under the [@metadata] field:

[@metadata][kafka][topic]: Original Kafka topic from where the message was consumed.
[@metadata][kafka][consumer_group]: Consumer group
[@metadata][kafka][partition]: Partition info for this message.
[@metadata][kafka][offset]: Original record offset for this message.
[@metadata][kafka][key]: Record key, if any.
[@metadata][kafka][timestamp]: Timestamp in the Record. Depending on your broker configuration, this can be either when the record was created (default) or when it was received by the broker. See more about property log.message.timestamp.type at https://kafka.apache.org/10/documentation.html#brokerconfigs

Please note that @metadata fields are not part of any of your events at output time. If you need this information to be inserted into your original event, you’ll have to use the mutate filter to manually copy the required fields into your event.
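
A minimal sketch of copying metadata into the event with the mutate filter might look like the following. The topic name and the destination field names ([kafka][topic], [kafka][offset]) are assumptions chosen for illustration, and decorate_events is enabled on the assumption that the metadata is only present when that option is set.

```
input {
  kafka {
    # Hypothetical topic name
    topics => ["example-topic"]
    decorate_events => true
  }
}

filter {
  mutate {
    # Copy selected @metadata fields into regular event fields
    copy => {
      "[@metadata][kafka][topic]"  => "[kafka][topic]"
      "[@metadata][kafka][offset]" => "[kafka][offset]"
    }
  }
}
```

Only the fields listed in the copy hash are carried to the output; the remaining @metadata fields stay internal to the pipeline.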

Kafka Input Configuration Options

This plugin supports these configuration options plus the Common Options described later.

Some of these options map to a Kafka option. See the Kafka documentation at https://kafka.apache.org/documentation for more details.

Setting	Input type	Required
`auto_commit_interval_ms`	string	No
`auto_offset_reset`	string	No
`bootstrap_servers`	string	No
`check_crcs`	string	No
`client_id`	string	No
`connections_max_idle_ms`	string	No
`consumer_threads`	number	No
`decorate_events`	boolean	No
`enable_auto_commit`	string	No
`exclude_internal_topics`	string	No
`fetch_max_bytes`	string	No
`fetch_max_wait_ms`	string	No
`fetch_min_bytes`	string	No
`group_id`	string	No
`heartbeat_interval_ms`	string	No
`jaas_path`	a valid filesystem path	No
`kerberos_config`	a valid filesystem path	No
`key_deserializer_class`	string	No
`max_partition_fetch_bytes`	string	No
`max_poll_interval_ms`	string	No
`max_poll_records`	string	No
`metadata_max_age_ms`	string	No
`partition_assignment_strategy`	string	No
`poll_timeout_ms`	number	No
`receive_buffer_bytes`	string	No
`reconnect_backoff_ms`	string	No
`request_timeout_ms`	string	No
`retry_backoff_ms`	string	No
`sasl_kerberos_service_name`	string	No
`sasl_mechanism`	string	No
`security_protocol`	string, one of `["PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL"]`	No
`send_buffer_bytes`	string	No
`session_timeout_ms`	string	No
`ssl_endpoint_identification_algorithm`	string	No
`ssl_key_password`	password	No
`ssl_keystore_location`	a valid filesystem path	No
`ssl_keystore_password`	password	No
`ssl_keystore_type`	string	No
`ssl_truststore_location`	a valid filesystem path	No
`ssl_truststore_password`	password	No
`ssl_truststore_type`	string	No
`topics`	array	No
`topics_pattern`	string	No
`value_deserializer_class`	string	No

Also see Common Options for a list of options supported by all input plugins.

`auto_commit_interval_ms`

Value type is string
Default value is "5000"

The frequency in milliseconds that the consumer offsets are committed to Kafka.

`auto_offset_reset`

Value type is string
There is no default value for this setting.

What to do when there is no initial offset in Kafka or if an offset is out of range:

earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer’s group
anything else: throw exception to the consumer.
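
As an example of the values listed above, the following sketch asks a consumer group with no committed offset to start from the beginning of the topic. The topic and group names are hypothetical.

```
input {
  kafka {
    # Hypothetical topic and group names
    topics => ["example-topic"]
    group_id => "example-group"
    # Applies only when the group has no valid committed offset
    auto_offset_reset => "earliest"
  }
}
```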

`bootstrap_servers`

Value type is string
Default value is "localhost:9092"

A list of URLs of Kafka instances to use for establishing the initial connection to the cluster. This list should be in the form of host1:port1,host2:port2. These URLs are only used for the initial connection to discover the full cluster membership (which may change dynamically), so this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
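
For instance, a configuration listing two brokers could be sketched as follows. The host names are placeholders, not real servers.

```
input {
  kafka {
    # Placeholder hosts; either one is enough to discover the whole cluster
    bootstrap_servers => "kafka1.example.com:9092,kafka2.example.com:9092"
    # Hypothetical topic name
    topics => ["example-topic"]
  }
}
```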

`check_crcs`

Value type is string
There is no default value for this setting.

Automatically check the CRC32 of the records consumed. This ensures no on-the-wire or on-disk corruption to the messages occurred. This check adds some overhead, so it may be disabled in cases seeking extreme performance.

`client_id`

Value type is string
Default value is "logstash"

The id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included.

`connections_max_idle_ms`

Value type is string
There is no default value for this setting.

Close idle connections after the number of milliseconds specified by this config.

`consumer_threads`

Value type is number
Default value is 1

Ideally, you should have as many threads as there are partitions for a perfect balance. More threads than partitions means that some threads will be idle.

`decorate_events`

Value type is boolean
Default value is false

Option to add Kafka metadata, such as the topic and message size, to the event. This will add a field named kafka to the Logstash event containing the following attributes:

topic: The topic this message is associated with
consumer_group: The consumer group used to read in this event
partition: The partition this message is associated with
offset: The offset from the partition this message is associated with
key: A ByteBuffer containing the message key

`enable_auto_commit`

Value type is string
Default value is "true"

If true, periodically commit to Kafka the offsets of messages already returned by the consumer. This committed offset will be used when the process fails as the position from which the consumption will begin.
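
A sketch of how this option might be combined with auto_commit_interval_ms is shown below. The one-second interval and the topic name are arbitrary values picked for the example, not recommendations.

```
input {
  kafka {
    # Hypothetical topic name
    topics => ["example-topic"]
    # Both options take string values in this plugin version
    enable_auto_commit => "true"
    # Commit offsets every 1000 ms instead of the default "5000"
    auto_commit_interval_ms => "1000"
  }
}
```

A shorter interval should reduce the number of events re-read after a failure, at the cost of more frequent commits to Kafka.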

`exclude_internal_topics`

Value type is string
There is no default value for this setting.

Whether records from internal topics (such as offsets) should be exposed to the consumer. If set to true the only way to receive records from an internal topic is subscribing to it.

`fetch_max_bytes`

Value type is string
There is no default value for this setting.

The maximum amount of data the server should return for a fetch request. This is not an absolute maximum; if the first message in the first non-empty partition of the fetch is larger than this value, the message will still be returned to ensure that the consumer can make progress.

`fetch_max_wait_ms`

Value type is string
There is no default value for this setting.

The maximum amount of time the server will block before answering the fetch request if there isn’t sufficient data to immediately satisfy fetch_min_bytes. This should be less than or equal to the timeout used in poll_timeout_ms.

`fetch_min_bytes`

Value type is string
There is no default value for this setting.

The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request.
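
The fetch settings interact with poll_timeout_ms, which the following sketch tries to show. All numeric values and the topic name are assumptions for illustration only.

```
input {
  kafka {
    # Hypothetical topic name
    topics => ["example-topic"]
    # Ask the broker to wait until at least 1 KiB is available ...
    fetch_min_bytes => "1024"
    # ... but for no longer than 100 ms
    fetch_max_wait_ms => "100"
    # Kept greater than or equal to fetch_max_wait_ms
    poll_timeout_ms => 100
  }
}
```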

`group_id`

Value type is string
Default value is "logstash"

The identifier of the group this consumer belongs to. A consumer group is a single logical subscriber that happens to be made up of multiple processors. Messages in a topic will be distributed to all Logstash instances with the same group_id.

`heartbeat_interval_ms`

Value type is string
There is no default value for this setting.

The expected time between heartbeats to the consumer coordinator. Heartbeats are used to ensure that the consumer’s session stays active and to facilitate rebalancing when new consumers join or leave the group. The value must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances.
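
To make the one-third guideline concrete, the sketch below pairs a 3000 ms heartbeat with a 10000 ms session timeout (3000 is below 10000 / 3, which is roughly 3333). These numbers and the topic name are example values, not plugin defaults.

```
input {
  kafka {
    # Hypothetical topic name
    topics => ["example-topic"]
    # Session timeout of 10 seconds
    session_timeout_ms => "10000"
    # Heartbeat no higher than one third of the session timeout
    heartbeat_interval_ms => "3000"
  }
}
```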

`jaas_path`

Value type is path
There is no default value for this setting.

The Java Authentication and Authorization Service (JAAS) API supplies user authentication and authorization services for Kafka. This setting provides the path to the JAAS file. Sample JAAS file for Kafka client:

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka";
  };

Please note that specifying jaas_path and kerberos_config in the config file will add these to the global JVM system properties. This means if you have multiple Kafka inputs, all of them would be sharing the same jaas_path and kerberos_config. If this is not desirable, you would have to run separate instances of Logstash on different JVM instances.
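
A Kerberos-enabled input could be sketched as follows. The file paths and topic name are hypothetical, and the service name "kafka" simply mirrors the serviceName in the sample JAAS file above.

```
input {
  kafka {
    # Hypothetical topic name
    topics => ["example-topic"]
    security_protocol => "SASL_PLAINTEXT"
    sasl_kerberos_service_name => "kafka"
    # Hypothetical paths; both become global JVM system properties
    jaas_path => "/etc/logstash/kafka_client_jaas.conf"
    kerberos_config => "/etc/krb5.conf"
  }
}
```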

`kerberos_config`

Value type is path
There is no default value for this setting.

Optional path to the Kerberos config file. This is krb5.conf style as detailed in https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html

`key_deserializer_class`

Value type is string
Default value is "org.apache.kafka.common.serialization.StringDeserializer"

Java class used to deserialize the record’s key.

`max_partition_fetch_bytes`

Value type is string
There is no default value for this setting.

The maximum amount of data per-partition the server will return. The maximum total memory used for a request will be #partitions * max.partition.fetch.bytes. This size must be at least as large as the maximum message size the server allows or else it is possible for the producer to send messages larger than the consumer can fetch. If that happens, the consumer can get stuck trying to fetch a large message on a certain partition.
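
The memory bound mentioned above can be worked through with assumed numbers: a consumer assigned 12 partitions with a 1 MiB per-partition limit could buffer up to 12 * 1048576 = 12582912 bytes (12 MiB) for a single request. The partition count, the limit, and the topic name are example values only.

```
input {
  kafka {
    # Hypothetical topic, assumed to have 12 partitions
    topics => ["example-topic"]
    # 1 MiB per partition; worst case per request:
    # 12 partitions * 1048576 bytes = 12582912 bytes
    max_partition_fetch_bytes => "1048576"
  }
}
```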

`max_poll_interval_ms`

Value type is string
There is no default value for this setting.

The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member. The value of the configuration request_timeout_ms must always be larger than max_poll_interval_ms.

`max_poll_records`

Value type is string
There is no default value for this setting.

The maximum number of records returned in a single call to poll().

`metadata_max_age_ms`

Value type is string
There is no default value for this setting.

The period of time, in milliseconds, after which a metadata refresh is forced, even if no partition leadership changes have been seen, to proactively discover any new brokers or partitions.

`partition_assignment_strategy`

Value type is string
There is no default value for this setting.

The class name of the partition assignment strategy that the client uses to distribute partition ownership amongst consumer instances. Maps to the Kafka partition.assignment.strategy setting, which defaults to org.apache.kafka.clients.consumer.RangeAssignor.
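
As an illustration, the sketch below selects the round-robin assignor that ships with the Kafka client instead of the default range assignor. Whether this strategy suits a given deployment is an assumption left to the reader; the topic name is hypothetical.

```
input {
  kafka {
    # Hypothetical topic name
    topics => ["example-topic"]
    # Fully qualified class name of the assignment strategy
    partition_assignment_strategy => "org.apache.kafka.clients.consumer.RoundRobinAssignor"
  }
}
```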

`poll_timeout_ms`

Value type is number
Default value is 100

Time, in milliseconds, that the Kafka consumer will wait to receive new messages from topics.

`receive_buffer_bytes`

Value type is string
There is no default value for this setting.

The size of the TCP receive buffer (SO_RCVBUF) to use when reading data.

`reconnect_backoff_ms`

Value type is string
There is no default value for this setting.

The amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all requests sent by the consumer to the broker.

`request_timeout_ms`

Value type is string
There is no default value for this setting.

The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted.

`retry_backoff_ms`

Value type is string
There is no default value for this setting.

The amount of time to wait before attempting to retry a failed fetch request to a given topic partition. This avoids repeated fetching-and-failing in a tight loop.

`sasl_kerberos_service_name`

Value type is string
There is no default value for this setting.

The Kerberos principal name that the Kafka broker runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config.

`sasl_mechanism`

Value type is string
Default value is "GSSAPI"

SASL mechanism used for client connections. This may be any mechanism for which a security provider is available. GSSAPI is the default mechanism.

`security_protocol`

Value can be any of: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL
Default value is "PLAINTEXT"

Security protocol to use, which can be one of PLAINTEXT, SSL, SASL_PLAINTEXT, or SASL_SSL.

`send_buffer_bytes`

Value type is string
There is no default value for this setting.

The size of the TCP send buffer (SO_SNDBUF) to use when sending data.

`session_timeout_ms`

Value type is string
There is no default value for this setting.

The timeout after which, if the consumer does not invoke a poll, the consumer is marked dead and a rebalance operation is triggered for the group identified by group_id.

`ssl_endpoint_identification_algorithm`

Value type is string
Default value is "https"

The endpoint identification algorithm. Defaults to "https". Set to an empty string ("") to disable endpoint verification.

`ssl_key_password`

Value type is password
There is no default value for this setting.

The password of the private key in the key store file.

`ssl_keystore_location`

Value type is path
There is no default value for this setting.

If client authentication is required, this setting stores the keystore path.

`ssl_keystore_password`

Value type is password
There is no default value for this setting.

If client authentication is required, this setting stores the keystore password.

`ssl_keystore_type`

Value type is string
There is no default value for this setting.

The keystore type.

`ssl_truststore_location`

Value type is path
There is no default value for this setting.

The JKS truststore path to validate the Kafka broker’s certificate.

`ssl_truststore_password`

Value type is password
There is no default value for this setting.

The truststore password.

`ssl_truststore_type`

Value type is string
There is no default value for this setting.

The truststore type.
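
Taken together, the SSL settings above might be combined as in the following sketch. Every path and password is a placeholder, the keystore lines are only relevant on the assumption that the broker requires client authentication, and "JKS" is assumed as the store type.

```
input {
  kafka {
    # Hypothetical topic name
    topics => ["example-topic"]
    security_protocol => "SSL"
    # Truststore used to validate the broker certificate (placeholder values)
    ssl_truststore_location => "/etc/logstash/kafka.truststore.jks"
    ssl_truststore_password => "changeit"
    ssl_truststore_type => "JKS"
    # Keystore, needed only when the broker requires client authentication
    ssl_keystore_location => "/etc/logstash/kafka.keystore.jks"
    ssl_keystore_password => "changeit"
    ssl_key_password => "changeit"
  }
}
```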

`topics`

Value type is array
Default value is ["logstash"]

A list of topics to subscribe to, defaults to ["logstash"].

`topics_pattern`

Value type is string
There is no default value for this setting.

A topic regex pattern to subscribe to. The topics configuration will be ignored when using this configuration.
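
For example, the sketch below subscribes to every topic whose name starts with a hypothetical "logs-" prefix. No topics list is given, since it would be ignored.

```
input {
  kafka {
    # Hypothetical naming scheme: matches logs-web, logs-db, and so on
    topics_pattern => "logs-.*"
  }
}
```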

`value_deserializer_class`

Value type is string
Default value is "org.apache.kafka.common.serialization.StringDeserializer"

Java class used to deserialize the record’s value.
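
If record values are binary rather than text, one option is to swap in the byte array deserializer from the Kafka client, as sketched below. The topic name is hypothetical, and how the resulting bytes should be decoded downstream is outside the scope of this sketch.

```
input {
  kafka {
    # Hypothetical topic carrying binary payloads
    topics => ["example-binary-topic"]
    value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
  }
}
```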

Common Options

The following configuration options are supported by all input plugins:

Setting	Input type	Required
`add_field`	hash	No
`codec`	codec	No
`enable_metric`	boolean	No
`id`	string	No
`tags`	array	No
`type`	string	No

Details

`add_field`

Value type is hash
Default value is {}

Add a field to an event.

`codec`

Value type is codec
Default value is "plain"

The codec used for input data. Input codecs are a convenient method for decoding your data before it enters the input, without needing a separate filter in your Logstash pipeline.
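
As a small example, assuming the Kafka records contain JSON documents, the json codec can parse them as they are read. The topic name is hypothetical.

```
input {
  kafka {
    # Hypothetical topic whose records are assumed to be JSON
    topics => ["example-json-topic"]
    codec => "json"
  }
}
```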

`enable_metric`

Value type is boolean
Default value is true

Disable or enable metric logging for this specific plugin instance. By default, we record all the metrics we can, but you can disable metrics collection for a specific plugin.

`id`

Value type is string
There is no default value for this setting.

Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one. It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type, for example, if you have two Kafka inputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.

input {
  kafka {
    id => "my_plugin_id"
  }
}

`tags`

Value type is array
There is no default value for this setting.

Add any number of arbitrary tags to your event.

This can help with processing later.

`type`

Value type is string
There is no default value for this setting.

Add a type field to all events handled by this input.

Types are used mainly for filter activation.

The type is stored as part of the event itself, so you can also use the type to search for it in Kibana.

If you try to set a type on an event that already has one (for example when you send an event from a shipper to an indexer) then a new input will not override the existing type. A type set at the shipper stays with that event for its life even when sent to another Logstash server.
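
Filter activation by type can be sketched as follows. The type value, tag, and topic name are made up for the example.

```
input {
  kafka {
    # Hypothetical topic name and type value
    topics => ["example-topic"]
    type => "kafka_logs"
  }
}

filter {
  # Runs only for events whose type was set by the input above
  if [type] == "kafka_logs" {
    mutate {
      add_tag => ["from_kafka"]
    }
  }
}
```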
