IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Webhdfs output plugin

Plugin version: v3.0.6
Released on: 2018-04-06
Changelog

For other versions, see the Versioned plugin docs.

Getting Help

For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic supported plugins, please consult the Elastic Support Matrix.

Description

This plugin sends Logstash events into files in HDFS via the webhdfs REST API.

Dependencies

This plugin has no dependency on jars from Hadoop, which reduces configuration and compatibility problems. It uses the webhdfs gem from Kazuki Ohta and TAGOMORI Satoshi (see https://github.com/kzk/webhdfs). The zlib and snappy gems are optional dependencies, needed only if you use the compression functionality.

Operational Notes

If you get an error like:

Max write retries reached. Exception: initialize: name or service not known {:level=>:error}

make sure that the hostname of your namenode is resolvable on the host running Logstash. When creating or appending to a file, webhdfs sometimes sends a 307 TEMPORARY_REDIRECT with the hostname of the machine it is running on, so that hostname must also be resolvable from the Logstash host.

Usage

This is an example of Logstash config:

input {
  ...
}
filter {
  ...
}
output {
  webhdfs {
    host => "127.0.0.1"                 # (required)
    port => 50070                       # (optional, default: 50070)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"  # (required)
    user => "hue"                       # (required)
  }
}

Webhdfs Output Configuration Options

This plugin supports the following configuration options plus the Common Options described later.

Setting	Input type	Required
`compression`	string, one of `["none", "snappy", "gzip"]`	No
`flush_size`	number	No
`host`	string	Yes
`idle_flush_time`	number	No
`kerberos_keytab`	string	No
`open_timeout`	number	No
`path`	string	Yes
`port`	number	No
`read_timeout`	number	No
`retry_interval`	number	No
`retry_known_errors`	boolean	No
`retry_times`	number	No
`single_file_per_thread`	boolean	No
`snappy_bufsize`	number	No
`snappy_format`	string, one of `["stream", "file"]`	No
`ssl_cert`	string	No
`ssl_key`	string	No
`standby_host`	string	No
`standby_port`	number	No
`use_httpfs`	boolean	No
`use_kerberos_auth`	boolean	No
`use_ssl_auth`	boolean	No
`user`	string	Yes

Also see Common Options for a list of options supported by all output plugins.

`compression`

Value can be any of: none, snappy, gzip
Default value is "none"

Compression to apply to the output. The snappy and gzip values rely on the optional snappy and zlib dependencies.

`flush_size`

Value type is number
Default value is 500

Send data to webhdfs as soon as the event count exceeds this value, even if store_interval_in_secs has not been reached.

`host`

This is a required setting.
Value type is string
There is no default value for this setting.

The server name for webhdfs/httpfs connections.

`idle_flush_time`

Value type is number
Default value is 1

Send data to webhdfs at intervals of this many seconds.
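As a rough sketch of how `flush_size` and `idle_flush_time` can be set together, the configuration below asks the plugin to send data once more than 1000 events have accumulated or at 5 second intervals. The hostname, user, and the two numeric values are placeholders chosen for illustration, not recommendations.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder hostname (assumption)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder user (assumption)
    flush_size => 1000               # send once the event count exceeds 1000
    idle_flush_time => 5             # send at 5 second intervals
  }
}
```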

`kerberos_keytab`

Value type is string
There is no default value for this setting.

Sets the Kerberos keytab file. Note that the gssapi library needs to be available to use this.

`open_timeout`

Value type is number
Default value is 30

The WebHdfs open timeout, in seconds.

`path`

This is a required setting.
Value type is string
There is no default value for this setting.

The path to the file to write to. Event fields can be used here, as well as date fields in the Joda-Time format, for example: /user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log

`port`

Value type is number
Default value is 50070

The server port for webhdfs/httpfs connections.

`read_timeout`

Value type is number
Default value is 30

The WebHdfs read timeout, in seconds.

`retry_interval`

Value type is number
Default value is 0.5

How long to wait between retries.

`retry_known_errors`

Value type is boolean
Default value is true

Retry some known webhdfs errors. These may be caused by race conditions when appending to the same file, and similar situations.

`retry_times`

Value type is number
Default value is 5

How many times to retry. If `retry_times` is exceeded, an error is logged and the event is discarded.
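The three retry settings are typically tuned together. The following is only a sketch with illustrative values: the hostname, user, and the chosen numbers are assumptions, and the right values depend on how your cluster behaves under load.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder hostname (assumption)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder user (assumption)
    retry_known_errors => true       # retry known webhdfs errors (the default)
    retry_interval => 1              # wait longer between retries than the default 0.5
    retry_times => 10                # retry more often than the default 5 before discarding
  }
}
```

Keep in mind that once `retry_times` is exceeded the event is discarded, so raising it trades throughput for a lower chance of losing events.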

`single_file_per_thread`

Value type is boolean
Default value is false

Avoid appending to the same file from multiple threads. This solves some problems with multiple Logstash output threads and locked file leases in webhdfs. If this option is set to true, %{[@metadata][thread_id]} needs to be used in the `path` setting.
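A minimal sketch of this setting, assuming a placeholder hostname and user: the only requirement taken from the description above is that `%{[@metadata][thread_id]}` appears somewhere in `path`. Where it appears in the file name is an arbitrary choice made here for illustration.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder hostname (assumption)
    user => "logstash"               # placeholder user (assumption)
    single_file_per_thread => true
    # The thread id in the file name gives each output thread its own file.
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}-%{[@metadata][thread_id]}.log"
  }
}
```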

`snappy_bufsize`

Value type is number
Default value is 32768

Sets the snappy chunk size. Only necessary for the stream format. Defaults to 32k; the maximum is 65536. See http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt

`snappy_format`

Value can be any of: stream, file
Default value is "stream"

Sets the snappy format. Set to stream to be Hive compatible.
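To show how the compression-related settings fit together, here is a hedged sketch that enables snappy compression in stream format. The hostname and user are placeholders, and the buffer size simply restates the default. The snappy gem is an optional dependency and must be available for this to work.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder hostname (assumption)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder user (assumption)
    compression => "snappy"
    snappy_format => "stream"        # stream format, for Hive compatibility
    snappy_bufsize => 32768          # chunk size; must not exceed 65536
  }
}
```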

`ssl_cert`

Value type is string
There is no default value for this setting.

Sets the SSL certificate file.

`ssl_key`

Value type is string
There is no default value for this setting.

Sets the SSL key file.

`standby_host`

Value type is string
Default value is false

Standby namenode for HA HDFS.

`standby_port`

Value type is number
Default value is 50070

Standby namenode port for HA HDFS.
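For an HA HDFS setup, the standby settings sit alongside the primary `host` and `port`. The sketch below assumes two hypothetical namenode hostnames and uses the default port for both.

```
output {
  webhdfs {
    host => "namenode1.example.com"          # placeholder primary namenode (assumption)
    port => 50070
    standby_host => "namenode2.example.com"  # placeholder standby namenode (assumption)
    standby_port => 50070
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"                       # placeholder user (assumption)
  }
}
```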

`use_httpfs`

Value type is boolean
Default value is false

Use httpfs mode if set to true, else webhdfs.
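A sketch of httpfs mode, under stated assumptions: the hostname and user are placeholders, and port 14000 is used because it is a common HttpFS port in Hadoop distributions. Check the port your HttpFS server actually listens on.

```
output {
  webhdfs {
    host => "httpfs.example.com"     # placeholder HttpFS server (assumption)
    port => 14000                    # assumed HttpFS port; verify for your cluster
    use_httpfs => true
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder user (assumption)
  }
}
```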

`use_kerberos_auth`

Value type is boolean
Default value is false

Set to true to use Kerberos authentication.
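Kerberos authentication combines this flag with `kerberos_keytab`. The following is a sketch only: the hostname, user, and keytab path are hypothetical, and the gssapi library needs to be available on the Logstash host.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder hostname (assumption)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder user (assumption)
    use_kerberos_auth => true
    kerberos_keytab => "/etc/security/keytabs/logstash.keytab"  # hypothetical keytab path
  }
}
```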

`use_ssl_auth`

Value type is boolean
Default value is false

Set to true to use SSL authentication. Note that the openssl library needs to be available to use this.
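SSL authentication is configured together with `ssl_cert` and `ssl_key`. In this sketch the hostname, user, and both file paths are hypothetical placeholders.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder hostname (assumption)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder user (assumption)
    use_ssl_auth => true
    ssl_cert => "/etc/logstash/ssl/client.crt"   # hypothetical certificate path
    ssl_key => "/etc/logstash/ssl/client.key"    # hypothetical key path
  }
}
```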

`user`

This is a required setting.
Value type is string
There is no default value for this setting.

The username for webhdfs.

Common Options

The following configuration options are supported by all output plugins:

Setting	Input type	Required
`codec`	codec	No
`enable_metric`	boolean	No
`id`	string	No

`codec`

Value type is codec
Default value is "line"

The codec used for output data. Output codecs are a convenient method for encoding your data before it leaves the output without needing a separate filter in your Logstash pipeline.
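As one possible illustration, the sketch below keeps the default line codec but sets its `format` option so that only the `message` field of each event is written. The hostname and user are placeholders, and writing only `message` is a choice made for this example.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder hostname (assumption)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder user (assumption)
    # Write one line per event containing only the message field.
    codec => line { format => "%{message}" }
  }
}
```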

`enable_metric`

Value type is boolean
Default value is true

Disable or enable metric logging for this specific plugin instance. By default we record all the metrics we can, but you can disable metrics collection for a specific plugin.

`id`

Value type is string
There is no default value for this setting.

Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one. It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type, for example, if you have 2 webhdfs outputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.

output {
  webhdfs {
    id => "my_plugin_id"
  }
}
