Webhdfs output plugin
- Plugin version: v3.0.6
- Released on: 2018-04-06
- Changelog
For other versions, see the Versioned plugin docs.
Getting Help
For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic-supported plugins, please consult the Elastic Support Matrix.
Dependencies
This plugin has no dependency on jars from Hadoop, which reduces configuration and compatibility problems. It uses the webhdfs gem from Kazuki Ohta and TAGOMORI Satoshi (see: https://github.com/kzk/webhdfs). The zlib and snappy gems are optional dependencies, needed only if you use the compression functionality.
Operational Notes
If you get an error like:
Max write retries reached. Exception: initialize: name or service not known {:level=>:error}
make sure that the hostname of your namenode is resolvable on the host running Logstash. When creating or appending to a file, webhdfs sometimes sends a 307 TEMPORARY_REDIRECT containing the HOSTNAME of the machine it is running on, so that hostname must be resolvable from the Logstash host as well.
Usage
This is an example of a Logstash configuration:
input { ... }
filter { ... }
output {
  webhdfs {
    host => "127.0.0.1"    # (required)
    port => 50070          # (optional, default: 50070)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"    # (required)
    user => "hue"          # (required)
  }
}
Webhdfs Output Configuration Options
This plugin supports the following configuration options plus the Common Options described later.
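Setting | Input type | Required |
---|---|---|
compression | string, one of ["none", "snappy", "gzip"] | No |
flush_size | number | No |
host | string | Yes |
idle_flush_time | number | No |
kerberos_keytab | string | No |
open_timeout | number | No |
path | string | Yes |
port | number | No |
read_timeout | number | No |
retry_interval | number | No |
retry_known_errors | boolean | No |
retry_times | number | No |
single_file_per_thread | boolean | No |
snappy_bufsize | number | No |
snappy_format | string, one of ["stream", "file"] | No |
ssl_cert | string | No |
ssl_key | string | No |
standby_host | string | No |
standby_port | number | No |
use_httpfs | boolean | No |
use_kerberos_auth | boolean | No |
use_ssl_auth | boolean | No |
user | string | Yes |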
Also see Common Options for a list of options supported by all output plugins.
compression
- Value can be any of: none, snappy, gzip
- Default value is "none"
Compress output. One of [none, snappy, gzip].
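As a minimal sketch, the following output enables gzip compression, which relies on the optional zlib dependency mentioned under Dependencies. The host and user values are placeholders, not defaults.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder NameNode host
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder HDFS user
    compression => "gzip"            # one of "none", "snappy", "gzip"
  }
}
```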
flush_size
- Value type is number
- Default value is 500
Send the buffered data to webhdfs once the number of buffered events exceeds this value, even if the idle_flush_time interval has not elapsed yet.
host
- This is a required setting.
- Value type is string
- There is no default value for this setting.
The server name for webhdfs/httpfs connections.
idle_flush_time
- Value type is number
- Default value is 1
Send the buffered data to webhdfs at intervals of this many seconds.
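The two buffering settings work together. In the sketch below, the values are chosen only for illustration: data is sent once more than 1000 events are buffered, or after 5 seconds, whichever comes first. Larger batches mean fewer write requests to webhdfs but more events held in memory between flushes. Host and user are placeholders.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder
    flush_size => 1000               # flush once more than 1000 events are buffered...
    idle_flush_time => 5             # ...or every 5 seconds
  }
}
```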
kerberos_keytab
- Value type is string
- There is no default value for this setting.
Set the Kerberos keytab file. Note that the gssapi library must be available to use this.
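The following is a sketch of Kerberos authentication. It assumes that use_kerberos_auth, which is listed in the options table but not described on this page, is the switch that enables Kerberos, and that the gssapi library is installed. The keytab path and host are hypothetical.

```
output {
  webhdfs {
    host => "namenode.example.com"                              # placeholder
    path => "/user/logstash/logstash-%{+YYYY-MM-dd}.log"
    user => "logstash"                                          # placeholder
    use_kerberos_auth => true                                   # assumed to enable Kerberos
    kerberos_keytab => "/etc/security/keytabs/logstash.keytab"  # hypothetical keytab path
  }
}
```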
path
- This is a required setting.
- Value type is string
- There is no default value for this setting.
The path to the file to write to. Event fields can be used here, as well as date fields in the Joda-Time format, for example:
/user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log
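As a further sketch, the path below partitions output by a hypothetical event field named type and by day, with one file per hour. The %{+...} date references are formatted from the event's @timestamp. Any field referenced this way should exist on every event; otherwise the %{...} reference is typically left unexpanded in the resulting path.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder
    user => "logstash"               # placeholder
    # "type" is a hypothetical event field; %{+...} is formatted from @timestamp
    path => "/user/logstash/%{type}/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
  }
}
```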
port
- Value type is number
- Default value is 50070
The server port for webhdfs/httpfs connections.
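If your HDFS cluster runs a standby NameNode, the standby_host and standby_port options from the table may also be set. They are not described on this page, so this sketch assumes they name the standby NameNode's webhdfs endpoint. Both hostnames are placeholders.

```
output {
  webhdfs {
    host => "namenode1.example.com"          # active NameNode (placeholder)
    port => 50070                            # default webhdfs port
    standby_host => "namenode2.example.com"  # assumed: standby NameNode (placeholder)
    standby_port => 50070
    path => "/user/logstash/logstash-%{+YYYY-MM-dd}.log"
    user => "logstash"                       # placeholder
  }
}
```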
retry_interval
- Value type is number
- Default value is 0.5
How long to wait between retries, in seconds.
retry_known_errors
- Value type is boolean
- Default value is true
Retry some known webhdfs errors. These may be caused by race conditions when appending to the same file, for example.
retry_times
- Value type is number
- Default value is 5
How many times to retry. If retry_times is exceeded, an error is logged and the event is discarded.
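A sketch of more patient retry settings, which might suit setups where several writers append to the same files. With these illustrative values the plugin waits 2 seconds between attempts and gives up after 10 retries. At that point, as described above, the error is logged and the event is discarded. Host and user are placeholders.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder
    path => "/user/logstash/logstash-%{+YYYY-MM-dd}.log"
    user => "logstash"               # placeholder
    retry_known_errors => true       # default: retry known webhdfs errors
    retry_interval => 2              # seconds between retries (default 0.5)
    retry_times => 10                # retries before the event is discarded (default 5)
  }
}
```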
single_file_per_thread
- Value type is boolean
- Default value is false
Avoid appending to the same file from multiple threads. This solves some problems with multiple Logstash output threads and locked file leases in webhdfs. If this option is set to true, %{[@metadata][thread_id]} must be used in the path config setting.
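Following the requirement in the description above, this sketch enables single_file_per_thread and includes %{[@metadata][thread_id]} in the path, so each output thread writes to its own file. Host and user are placeholders.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder
    user => "logstash"               # placeholder
    single_file_per_thread => true
    # thread_id keeps the files of different output threads apart
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{[@metadata][thread_id]}-%{+HH}.log"
  }
}
```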
snappy_bufsize
- Value type is number
- Default value is 32768
Set the snappy chunk size. Only necessary for the stream format. Defaults to 32k; the maximum is 65536. See http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
snappy_format
- Value can be any of: stream, file
- Default value is "stream"
Set the snappy format. One of "stream", "file". Set to stream for Hive compatibility.
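A sketch of Snappy compression in the Hive-compatible stream format, assuming the optional snappy gem is installed. The snappy_bufsize value is simply the documented default written out explicitly. Host and user are placeholders.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"               # placeholder
    compression => "snappy"
    snappy_format => "stream"        # "stream" (Hive compatible) or "file"
    snappy_bufsize => 32768          # chunk size for the stream format; max 65536
  }
}
```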
use_httpfs
- Value type is boolean
- Default value is false
Use httpfs mode if set to true, else webhdfs.
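A sketch of writing through an HttpFS gateway instead of connecting to webhdfs directly. HttpFS commonly listens on port 14000 rather than the webhdfs default of 50070, so the port is set explicitly here. Verify the port and the placeholder hostname against your own cluster.

```
output {
  webhdfs {
    host => "httpfs.example.com"     # placeholder HttpFS gateway
    port => 14000                    # common HttpFS default; adjust for your cluster
    use_httpfs => true
    path => "/user/logstash/logstash-%{+YYYY-MM-dd}.log"
    user => "logstash"               # placeholder
  }
}
```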
Common Options
The following configuration options are supported by all output plugins:
codec
- Value type is codec
- Default value is "line"
The codec used for output data. Output codecs are a convenient method for encoding your data before it leaves the output without needing a separate filter in your Logstash pipeline.
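For example, to write one JSON document per line instead of using the default line codec, you could set the json_lines codec. This is a sketch. json_lines is a standard Logstash codec, but confirm that it is available in your installation. Host and user are placeholders.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder
    path => "/user/logstash/logstash-%{+YYYY-MM-dd}.json"
    user => "logstash"               # placeholder
    codec => "json_lines"            # one JSON object per line
  }
}
```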
enable_metric
- Value type is boolean
- Default value is true
Disable or enable metric logging for this specific plugin instance. By default we record all the metrics we can, but you can disable metrics collection for a specific plugin.
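As a minimal sketch, this disables metrics collection for a single webhdfs output while other plugins keep the default. Host and user are placeholders.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder
    path => "/user/logstash/logstash-%{+YYYY-MM-dd}.log"
    user => "logstash"               # placeholder
    enable_metric => false           # no metrics recorded for this plugin instance
  }
}
```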
id
- Value type is string
- There is no default value for this setting.
Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one.
It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type, for example, two webhdfs outputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.
output {
  webhdfs {
    id => "my_plugin_id"
  }
}