elasticsearch_java
editelasticsearch_java
editThis plugin does not ship with Logstash by default, but it is easy to install by running bin/logstash-plugin install logstash-output-elasticsearch_java
.
Also note that node
protocol usage is discouraged. Please use http
or transport
protocol.
This output lets you store logs in Elasticsearch using the native node and transport protocols. It is highly recommended to use the regular logstash-output-elasticsearch output which uses HTTP instead. This output is, in-fact, sometimes slower, and never faster than that one. Additionally, upgrading your Elasticsearch cluster may require you to simultaneously update this plugin for any protocol level changes. The HTTP client may be easier to work with due to wider familiarity with HTTP.
VERSION NOTE: Your Elasticsearch cluster must be running Elasticsearch 1.0.0 or later.
If you want to set other Elasticsearch options that are not exposed directly as configuration options, there are two methods:
-
Create an
elasticsearch.yml
file in the $PWD of the Logstash process -
Pass in es.* java properties (
java -Des.node.foo=
orruby -J-Des.node.foo=
)
With the default protocol
setting ("node"), this plugin will join your
Elasticsearch cluster as a client node, so it will show up in Elasticsearch’s
cluster status.
You can learn more about Elasticsearch at https://www.elastic.co/products/elasticsearch
Operational Notes
editIf using the default protocol
setting ("node"), your firewalls might need
to permit port 9300 in both directions (from Logstash to Elasticsearch, and
Elasticsearch to Logstash)
Retry Policy
editBy default all bulk requests to ES are synchronous. Not all events in the bulk requests always make it successfully. For example, there could be events which are not formatted correctly for the index they are targeting (type mismatch in mapping). So that we minimize loss of events, we have a specific retry policy in place. We retry all events which fail to be reached by Elasticsearch for network related issues. We retry specific events which exhibit errors under a separate policy described below. Events of this nature are ones which experience ES error codes described as retryable errors.
Retryable Errors:
- 429, Too Many Requests (RFC6585)
- 503, The server is currently unable to handle the request due to a temporary overloading or maintenance of the server.
Here are the rules of what is retried when:
- Block and retry all events in bulk response that experiences transient network exceptions until a successful submission is received by Elasticsearch.
- Retry subset of sent events which resulted in ES errors of a retryable nature which can be found in RETRYABLE_CODES
- For events which returned retryable error codes, they will be pushed onto a separate queue for retrying events. events in this queue will be retried a maximum of 5 times by default (configurable through :max_retries). The size of this queue is capped by the value set in :retry_max_items.
- Events from the retry queue are submitted again either when the queue reaches its max size or when the max interval time is reached, which is set in :retry_max_interval.
- Events which are not retryable or have reached their max retry count are logged to stderr.
Synopsis
editThis plugin supports the following configuration options:
Required configuration options:
elasticsearch_java { network_host => ... }
Available configuration options:
Setting | Input type | Required | Default value |
---|---|---|---|
No |
|
||
a valid filesystem path |
No |
||
No |
|||
No |
|
||
No |
|
||
No |
|||
No |
|||
No |
|
||
No |
|
||
No |
|
||
No |
|
||
a valid filesystem path |
No |
||
No |
|||
No |
|
||
No |
|
||
Yes |
|||
No |
|||
No |
|
||
string, one of |
No |
|
|
No |
|
||
No |
|
||
No |
|||
No |
|
||
No |
|
||
a valid filesystem path |
No |
||
No |
|
||
No |
|
||
No |
|||
a valid filesystem path |
No |
||
No |
|||
No |
|
||
No |
|
Details
edit
action
edit- Value type is string
-
Default value is
"index"
The Elasticsearch action to perform. Valid actions are: index
, delete
.
Use of this setting REQUIRES you also configure the document_id
setting
because delete
actions all require a document id.
What does each action do?
- index: indexes a document (an event from Logstash).
- delete: deletes a document by id
- create: indexes a document, fails if a document by that id already exists in the index.
- update: updates a document by id following action is not supported by HTTP protocol
- create_unless_exists: creates a document, fails if no id is provided
For more details on actions, check out the Elasticsearch bulk API documentation
cacert
edit- Value type is path
- There is no default value for this setting.
The .cer or .pem file to validate the server’s certificate
cluster
edit- Value type is string
- There is no default value for this setting.
The name of your cluster if you set it on the Elasticsearch side. Useful
for discovery when using node
or transport
protocols.
By default, it looks for a cluster named elasticsearch.
Equivalent to the Elasticsearch option cluster.name
codec
edit- Value type is codec
-
Default value is
"plain"
The codec used for output data. Output codecs are a convenient method for encoding your data before it leaves the output, without needing a separate filter in your Logstash pipeline.
doc_as_upsert
edit- Value type is boolean
-
Default value is
false
Enable doc_as_upsert for update mode create a new document with source if document_id doesn’t exists
document_id
edit- Value type is string
- There is no default value for this setting.
The document ID for the index. Useful for overwriting existing entries in Elasticsearch with the same ID.
document_type
edit- Value type is string
- There is no default value for this setting.
The document type to write events to. Generally you should try to write only
similar events to the same type. String expansion %{foo}
works here.
Unless you set document_type, the event type will be used if it exists
otherwise the document type will be assigned the value of logs
flush_size
edit- Value type is number
-
Default value is
500
This plugin uses the bulk index api for improved indexing performance. To make efficient bulk api calls, we will buffer a certain number of events before flushing that out to Elasticsearch. This setting controls how many events will be buffered before sending a batch of events.
hosts
edit- Value type is array
-
Default value is
["127.0.0.1"]
For the node
protocol, if you do not specify host
, it will attempt to use
multicast discovery to connect to Elasticsearch. If multicast is disabled in Elasticsearch,
you must include the hostname or IP address of the host(s) to use for Elasticsearch unicast discovery.
Remember the node
protocol uses the transport address (eg. 9300, not 9200).
"127.0.0.1"
["127.0.0.1:9300","127.0.0.2:9300"]
When setting hosts for node
protocol, it is important to confirm that at least one non-client
node is listed in the host
list. Also keep in mind that the host
parameter when used with
the node
protocol is for discovery purposes only (not for load balancing). When multiple hosts
are specified, it will contact the first host to see if it can use it to discover the cluster. If not,
then it will contact the second host in the list and so forth. With the node
protocol,
Logstash will join the Elasticsearch cluster as a node client (which has a copy of the cluster
state) and this node client is the one that will automatically handle the load balancing of requests
across data nodes in the cluster.
If you are looking for a high availability setup, our recommendation is to use the transport
protocol (below),
set up multiple client nodes and list the client nodes in the host
parameter.
For the transport
protocol, it will load balance requests across the hosts specified in the host
parameter.
Remember the transport
protocol uses the transport address (eg. 9300, not 9200).
"127.0.0.1"
["127.0.0.1:9300","127.0.0.2:9300"]
There is also a sniffing
option (see below) that can be used with the transport protocol to instruct it to use the host to sniff for
"alive" nodes in the cluster and automatically use it as the hosts list (but will skip the dedicated master nodes).
If you do not use the sniffing option, it is important to exclude dedicated master nodes from the host
list
to prevent Logstash from sending bulk requests to the master nodes. So this parameter should only reference either data or client nodes.
For the http
protocol, it will load balance requests across the hosts specified in the host
parameter.
Remember the http
protocol uses the http address (eg. 9200, not 9300).
"127.0.0.1"
["127.0.0.1:9200","127.0.0.2:9200"]
It is important to exclude dedicated master nodes from the host
list
to prevent LS from sending bulk requests to the master nodes. So this parameter should only reference either data or client nodes.
idle_flush_time
edit- Value type is number
-
Default value is
1
The amount of time since last flush before a flush is forced.
This setting helps ensure slow event rates don’t get stuck in Logstash.
For example, if your flush_size
is 100, and you have received 10 events,
and it has been more than idle_flush_time
seconds since the last flush,
Logstash will flush those 10 events automatically.
This helps keep both fast and slow log streams moving along in near-real-time.
index
edit- Value type is string
-
Default value is
"logstash-%{+YYYY.MM.dd}"
The index to write events to. This can be dynamic using the %{foo}
syntax.
The default value will partition your indices by day so you can more easily
delete old data or only search specific date ranges.
Indexes may not contain uppercase characters.
For weekly indexes ISO 8601 format is recommended, eg. logstash-%{+xxxx.ww}
index_type
(DEPRECATED)
edit- DEPRECATED WARNING: This configuration item is deprecated and may not be available in future versions.
- Value type is string
- There is no default value for this setting.
The index type to write events to. Generally you should try to write only
similar events to the same type. String expansion %{foo}
works here.
Deprecated in favor of document_type
field.
keystore
edit- Value type is path
- There is no default value for this setting.
The keystore used to present a certificate to the server It can be either .jks or .p12
keystore_password
edit- Value type is password
- There is no default value for this setting.
Set the truststore password
manage_template
edit- Value type is boolean
-
Default value is
true
Starting in Logstash 1.3 (unless you set option manage_template
to false)
a default mapping template for Elasticsearch will be applied, if you do not
already have one set to match the index pattern defined (default of
logstash-%{+YYYY.MM.dd}
), minus any variables. For example, in this case
the template will be applied to all indices starting with logstash-*
If you have dynamic templating (e.g. creating indices based on field names)
then you should set manage_template
to false and use the REST API to upload
your templates manually.
max_inflight_requests
(DEPRECATED)
edit- DEPRECATED WARNING: This configuration item is deprecated and may not be available in future versions.
- Value type is number
-
Default value is
50
This setting no longer does anything. It exists to keep config validation from failing. It will be removed in future versions.
network_host
edit- This is a required setting.
- Value type is string
- There is no default value for this setting.
The name/address of the host to bind to for Elasticsearch clustering. Equivalent to the Elasticsearch option network.host option. This MUST be set for either protocol to work (node or transport)! The internal Elasticsearch node will bind to this ip. This ip MUST be reachable by all nodes in the Elasticsearch cluster
node_name
edit- Value type is string
- There is no default value for this setting.
The node name Elasticsearch will use when joining a cluster.
By default, this is generated internally by the ES client.
port
edit- Value type is string
-
Default value is
"9300-9305"
The port for Elasticsearch transport to use.
If you do not set this, the following defaults are used:
* protocol => transport
- port 9300-9305
* protocol => node
- port 9300-9305
protocol
edit-
Value can be any of:
node
,transport
-
Default value is
"transport"
Choose the protocol used to talk to Elasticsearch.
The node protocol (default) will connect to the cluster as a normal Elasticsearch
node (but will not store data). If you use the node
protocol, you must permit
bidirectional communication on the port 9300 (or whichever port you have
configured).
If you do not specify the host
parameter, it will use multicast for Elasticsearch discovery. While this may work in a test/dev environment where multicast is enabled in
Elasticsearch, we strongly recommend using unicast
in Elasticsearch. To connect to an Elasticsearch cluster with multicast disabled,
you must include the host
parameter (see relevant section above).
The transport protocol will connect to the host you specify and will not show up as a node in the Elasticsearch cluster. This is useful in situations where you cannot permit connections outbound from the Elasticsearch cluster to this Logstash server.
All protocols will use bulk requests when talking to Elasticsearch.
retry_max_interval
edit- Value type is number
-
Default value is
5
Set max interval between bulk retries
retry_max_items
edit- Value type is number
-
Default value is
5000
Set retry policy for events that failed to send
routing
edit- Value type is string
- There is no default value for this setting.
A routing override to be applied to all processed events.
This can be dynamic using the %{foo}
syntax.
sniffing
edit- Value type is boolean
-
Default value is
false
Enable cluster sniffing (transport only). Asks host for the list of all cluster nodes and adds them to the hosts list Equivalent to the Elasticsearch option client.transport.sniff
ssl_certificate_verification
edit- Value type is boolean
-
Default value is
true
Validate the server’s certificate Disabling this severely compromises security For more information read https://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf
template
edit- Value type is path
- There is no default value for this setting.
You can set the path to your own template here, if you so desire. If not set, the included template will be used.
template_name
edit- Value type is string
-
Default value is
"logstash"
This configuration option defines how the template is named inside Elasticsearch. Note that if you have used the template management features and subsequently change this, you will need to prune the old template manually, e.g.
curl -XDELETE <http://localhost:9200/_template/OldTemplateName?pretty>
where OldTemplateName
is whatever the former setting was.
template_overwrite
edit- Value type is boolean
-
Default value is
false
Overwrite the current template with whatever is configured
in the template
and template_name
directives.
transport_tcp_port
edit- Value type is number
- There is no default value for this setting.
This sets the local port to bind to. Equivalent to the Elasticsrearch option transport.tcp.port
truststore
edit- Value type is path
- There is no default value for this setting.
The JKS truststore to validate the server’s certificate
Use either :truststore
or :cacert
truststore_password
edit- Value type is password
- There is no default value for this setting.
Set the truststore password