This is a community-maintained plugin! It does not ship with Logstash by default, but it is easy to install by running bin/logstash-plugin install logstash-output-webhdfs
This plugin sends Logstash events into files in HDFS via the webhdfs REST API.
This plugin has no dependency on jars from hadoop, which reduces configuration and compatibility problems. It uses the webhdfs gem from Kazuki Ohta and TAGOMORI Satoshi (see https://github.com/kzk/webhdfs). zlib and the snappy gem are optional dependencies, needed only if you use the compression functionality.
Operational Notes
If you get an error like:
Max write retries reached. Exception: initialize: name or service not known {:level=>:error}
make sure that the hostname of your namenode is resolvable on the host running Logstash. When creating or appending to a file, webhdfs sometimes sends a 307 TEMPORARY_REDIRECT with the HOSTNAME of the machine it is running on, so that hostname must be resolvable from the Logstash host as well.
This is an example of Logstash config:
input {
  ...
}
filter {
  ...
}
output {
  webhdfs {
    host => ""  # (required)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"  # (required)
    user => "hue"  # (required)
  }
}
This plugin supports the following configuration options:
Required configuration options:
webhdfs {
  host => ...
  path => ...
  user => ...
}
Available configuration options:
- Value type is codec
Default value is
The codec used for output data. Output codecs are a convenient method for encoding your data before it leaves the output, without needing a separate filter in your Logstash pipeline.
Value can be any of: none, snappy, gzip
Default value is
Compress the output with one of none, snappy, or gzip.
- DEPRECATED WARNING: This configuration item is deprecated and may not be available in future versions.
- Value type is array
Default value is
Only handle events without any of these tags. Optional.
- Value type is number
Default value is
Send data to webhdfs once the number of buffered events exceeds this value, even if store_interval_in_secs has not been reached.
- This is a required setting.
- Value type is string
- There is no default value for this setting.
The server name for webhdfs/httpfs connections.
- Value type is number
Default value is
Send data to webhdfs at intervals of this many seconds.
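As a rough sketch of how the event-count and interval settings work together: the option names `flush_size` and `idle_flush_time` are assumed here, since the setting headings are missing from this page, and the host, user, and numeric values are illustrative placeholders rather than recommendations.

```conf
output {
  webhdfs {
    host => "namenode.example.org"   # hypothetical namenode hostname
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # hypothetical HDFS user
    flush_size => 500                # assumed option name: flush once more than 500 events are buffered
    idle_flush_time => 5             # assumed option name: otherwise flush every 5 seconds
  }
}
```

With settings like these, a busy pipeline would flush on the event count, while a quiet one would still write whatever it has buffered at each interval.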
- Value type is string
- There is no default value for this setting.
The format to use when writing events to the file. This value supports any string and can include %{name} and other dynamic strings.
If this setting is omitted, the full JSON representation of the event is written as a single line.
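A minimal sketch of a custom line format, assuming the setting is named `message_format` (the heading is missing from this page) and that events carry the usual `host` and `message` fields. The host and user values are placeholders.

```conf
output {
  webhdfs {
    host => "namenode.example.org"   # hypothetical namenode hostname
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # hypothetical HDFS user
    # assumed option name; %{...} references are replaced with event field values
    message_format => "%{@timestamp} %{host} %{message}"
  }
}
```

Leaving the format setting out would instead write each event as one line of JSON, as described above.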
- This is a required setting.
- Value type is string
- There is no default value for this setting.
The path to the file to write to. Event fields can be used here, as well as date fields in the Joda time format, e.g. /user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log as in the example configuration above.
- Value type is number
Default value is
The server port for webhdfs/httpfs connections.
- Value type is number
Default value is
How long to wait between retries.
- Value type is boolean
Default value is
Retry some known webhdfs errors. These may be caused by race conditions when appending to the same file, etc.
- Value type is number
Default value is
How many times to retry. If retry_times is exceeded, an error is logged and the event is discarded.
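The three retry settings above might be combined as in the sketch below. Only `retry_times` is named in the text; `retry_known_errors` and `retry_interval` are assumed names for the other two settings, and all values are illustrative.

```conf
output {
  webhdfs {
    host => "namenode.example.org"   # hypothetical namenode hostname
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # hypothetical HDFS user
    retry_known_errors => true       # assumed option name: retry known webhdfs errors
    retry_interval => 1              # assumed option name: wait between attempts
    retry_times => 5                 # after this many retries the event is discarded
  }
}
```

Because events are dropped once the retry limit is exceeded, a higher limit trades slower failure for a lower chance of losing data during short namenode hiccups.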
- Value type is number
Default value is
Set the snappy chunk size. Only necessary for the stream format. Defaults to 32k. Max is 65536 (see http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt).
Value can be any of: stream, file
Default value is
Set the snappy format. One of "stream", "file". Set to stream to be Hive compatible.
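For illustration, a snappy-compressed output aimed at Hive compatibility could look roughly like this. The option names `compression`, `snappy_format`, and `snappy_bufsize` are assumptions (the headings are missing from this page), the host, user, and `.snappy` path suffix are placeholders, and 32768 is used as one reading of the "32k" default mentioned above.

```conf
output {
  webhdfs {
    host => "namenode.example.org"   # hypothetical namenode hostname
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log.snappy"
    user => "logstash"               # hypothetical HDFS user
    compression => "snappy"          # assumed option name: one of none, snappy, gzip
    snappy_format => "stream"        # assumed option name: stream for Hive compatibility
    snappy_bufsize => 32768          # assumed option name: chunk size, at most 65536
  }
}
```

The snappy gem would need to be installed for this to work, since it is an optional dependency of the plugin.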
- DEPRECATED WARNING: This configuration item is deprecated and may not be available in future versions.
- Value type is array
Default value is
Only handle events with all of these tags. Optional.
- DEPRECATED WARNING: This configuration item is deprecated and may not be available in future versions.
- Value type is string
Default value is
The type to act on. If a type is given, then this output will only act on messages with the same type. See any input plugin’s type attribute for more.
- Value type is boolean
Default value is
Use httpfs mode if set to true, else webhdfs.
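A hedged sketch of switching to httpfs mode: the option names `use_httpfs` and `port` are assumed, since the headings are missing from this page, and port 14000 is only the port HttpFS gateways commonly listen on, so check your own cluster before relying on it.

```conf
output {
  webhdfs {
    host => "httpfs.example.org"     # hypothetical HttpFS gateway hostname
    port => 14000                    # assumed option name and port; verify against your cluster
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # hypothetical HDFS user
    use_httpfs => true               # assumed option name: use httpfs instead of webhdfs
  }
}
```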