Networking
Each Elasticsearch node has two different network interfaces. Clients send requests to Elasticsearch’s REST APIs using its HTTP interface, but nodes communicate with other nodes using the transport interface. The transport interface is also used for communication with remote clusters.
You can configure both of these interfaces at the same time using the network.* settings. If you have a more complicated network, you might need to configure the interfaces independently using the http.* and transport.* settings. Where possible, use the network.* settings that apply to both interfaces to simplify your configuration and reduce duplication.
By default Elasticsearch binds only to localhost, which means it cannot be accessed remotely. This configuration is sufficient for a local development cluster made of one or more nodes all running on the same host. To form a cluster across multiple hosts, or one that is accessible to remote clients, you must adjust some network settings such as network.host.
Be careful with the network configuration!
Never expose an unprotected node to the public internet. If you do, you are permitting anyone in the world to download, modify, or delete any of the data in your cluster.
Configuring Elasticsearch to bind to a non-local address will convert some warnings into fatal exceptions. If a node refuses to start after configuring its network settings then you must address the logged exceptions before proceeding.
Commonly used network settings
Most users will need to configure only the following network settings.
- network.host
  (Static, string) Sets the address of this node for both HTTP and transport traffic. The node will bind to this address and will also use it as its publish address. Accepts an IP address, a hostname, or a special value. Defaults to _local_.
- http.port
  (Static, integer) The port to bind for HTTP client communication. Accepts a single value or a range. If a range is specified, the node will bind to the first available port in the range. Defaults to 9200-9300.
- transport.port
  (Static, integer) The port to bind for communication between nodes. Accepts a single value or a range. If a range is specified, the node will bind to the first available port in the range. Set this setting to a single port, not a range, on every master-eligible node. Defaults to 9300-9400.
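As an illustration, a minimal elasticsearch.yml sketch for a node that must be reachable from other hosts might look like the following. The address 192.168.1.10 is a hypothetical site-local address chosen for this example; substitute an address that exists on your host.

```yaml
# Hypothetical address: bind and publish on one site-local IP.
network.host: 192.168.1.10
# Optional: pin the HTTP port instead of letting the node pick one from 9200-9300.
http.port: 9200
# A single transport port rather than a range, as advised for master-eligible nodes.
transport.port: 9300
```

Because this binds to a non-local address, some startup warnings become fatal exceptions, so check the logs if the node refuses to start.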
Special values for network addresses
You can configure Elasticsearch to automatically determine its addresses by using the following special values. Use these values when configuring network.host, network.bind_host, network.publish_host, and the corresponding settings for the HTTP and transport interfaces.
- _local_
  Any loopback addresses on the system, for example 127.0.0.1.
- _site_
  Any site-local addresses on the system, for example 192.168.0.1.
- _global_
  Any globally-scoped addresses on the system, for example 8.8.8.8.
- _[networkInterface]_
  Use the addresses of the network interface called [networkInterface]. For example, if you wish to use the addresses of an interface called en0 then set network.host: _en0_.
- 0.0.0.0
  The addresses of all available network interfaces.
In some systems these special values resolve to multiple addresses. If so, Elasticsearch will select one of them as its publish address and may change its selection on each node restart. Ensure your node is accessible at every possible address.
Any values containing a : (e.g. an IPv6 address or some of the special values) must be quoted because : is a special character in YAML.
IPv4 vs IPv6
These special values yield both IPv4 and IPv6 addresses by default, but you can also add an :ipv4 or :ipv6 suffix to limit them to just IPv4 or IPv6 addresses respectively. For example, network.host: "_en0:ipv4_" would set this node’s addresses to the IPv4 addresses of interface en0.
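A short sketch of how a suffixed special value could appear in elasticsearch.yml follows. The interface name en0 is taken from the example in the text and may not exist on your system.

```yaml
# Quoted because the value contains a colon, which is special in YAML.
network.host: "_en0:ipv4_"
# A literal IPv6 address needs the same quoting, for example:
# network.host: "::1"
```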
Discovery in the Cloud
More special settings are available when running in the Cloud with either the EC2 discovery plugin or the Google Compute Engine discovery plugin installed.
Binding and publishing
Elasticsearch uses network addresses for two distinct purposes known as binding and publishing. Most nodes will use the same address for everything, but more complicated setups may need to configure different addresses for different purposes.
When an application such as Elasticsearch wishes to receive network communications, it must indicate to the operating system the address or addresses whose traffic it should receive. This is known as binding to those addresses. Elasticsearch can bind to more than one address if needed, but most nodes only bind to a single address. Elasticsearch can only bind to an address if it is running on a host that has a network interface with that address. If necessary, you can configure the transport and HTTP interfaces to bind to different addresses.
Each Elasticsearch node has an address at which clients and other nodes can contact it, known as its publish address. Each node has one publish address for its HTTP interface and one for its transport interface. These two addresses can be anything, and don’t need to be addresses of the network interfaces on the host. The only requirements are that each node must be:
- Accessible at its transport publish address by all other nodes in its cluster, and by any remote clusters that will discover it using Sniff mode.
- Accessible at its HTTP publish address by all clients that will discover it using sniffing.
Using a single address
The most common configuration is for Elasticsearch to bind to a single address at which it is accessible to clients and other nodes. In this configuration you should just set network.host to that address. You should not separately set any bind or publish addresses, nor should you separately configure the addresses for the HTTP or transport interfaces.
Using multiple addresses
Use the advanced network settings if you wish to bind Elasticsearch to multiple addresses, or to publish a different address from the addresses to which you are binding. Set network.bind_host to the bind addresses, and network.publish_host to the address at which this node is exposed. In complex configurations, you can configure these addresses differently for the HTTP and transport interfaces.
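As a rough sketch, a node that listens on two addresses but advertises only one might be configured as follows. Both addresses are hypothetical and chosen purely for illustration.

```yaml
# Hypothetical addresses for illustration only.
# Listen on the loopback address and on one site-local address.
network.bind_host: ["127.0.0.1", "192.168.1.10"]
# Advertise a single address that other nodes and clients can reach.
network.publish_host: 192.168.1.10
```

Publishing one explicit address avoids relying on the node's own choice among several candidate addresses, which may change across restarts.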
Advanced network settings
These advanced settings let you bind to multiple addresses, or to use different addresses for binding and publishing. They are not required in most cases and you should not use them if you can use the commonly used settings instead.
- network.bind_host
  (Static, string) The network address(es) to which the node should bind in order to listen for incoming connections. Accepts a list of IP addresses, hostnames, and special values. Defaults to the address given by network.host. Use this setting only if binding to multiple addresses or using different addresses for publishing and binding.
- network.publish_host
  (Static, string) The network address that clients and other nodes can use to contact this node. Accepts an IP address, a hostname, or a special value. Defaults to the address given by network.host. Use this setting only if binding to multiple addresses or using different addresses for publishing and binding.
You can specify a list of addresses for network.host and network.publish_host. You can also specify one or more hostnames or special values that resolve to multiple addresses. If you do this then Elasticsearch chooses one of the addresses for its publish address. This choice uses heuristics based on IPv4/IPv6 stack preference and reachability and may change when the node restarts. Ensure each node is accessible at all possible publish addresses.
Advanced TCP settings
Use the following settings to control the low-level parameters of the TCP connections used by the HTTP and transport interfaces.
- network.tcp.keep_alive
  (Static, boolean) Configures the SO_KEEPALIVE option for network sockets, which determines whether each connection sends TCP keepalive probes. Defaults to true.
- network.tcp.keep_idle
  (Static, integer) Configures the TCP_KEEPIDLE option for network sockets, which determines the time in seconds that a connection must be idle before starting to send TCP keepalive probes. Defaults to -1, which means to use the system default. This value cannot exceed 300 seconds. Only applicable on Linux and macOS.
- network.tcp.keep_interval
  (Static, integer) Configures the TCP_KEEPINTVL option for network sockets, which determines the time in seconds between sending TCP keepalive probes. Defaults to -1, which means to use the system default. This value cannot exceed 300 seconds. Only applicable on Linux and macOS.
- network.tcp.keep_count
  (Static, integer) Configures the TCP_KEEPCNT option for network sockets, which determines the number of unacknowledged TCP keepalive probes that may be sent on a connection before it is dropped. Defaults to -1, which means to use the system default. Only applicable on Linux and macOS.
- network.tcp.no_delay
  (Static, boolean) Configures the TCP_NODELAY option on network sockets, which determines whether TCP no delay is enabled. Defaults to true.
- network.tcp.reuse_address
  (Static, boolean) Configures the SO_REUSEADDR option for network sockets, which determines whether the address can be reused or not. Defaults to false on Windows and true otherwise.
- network.tcp.send_buffer_size
  (Static, byte value) Configures the size of the TCP send buffer for network sockets. Defaults to -1, which means to use the system default.
- network.tcp.receive_buffer_size
  (Static, byte value) Configures the size of the TCP receive buffer. Defaults to -1, which means to use the system default.
Advanced HTTP settings
Use the following advanced settings to configure the HTTP interface independently of the transport interface. You can also configure both interfaces together using the network settings.
- http.host
  (Static, string) Sets the address of this node for HTTP traffic. The node will bind to this address and will also use it as its HTTP publish address. Accepts an IP address, a hostname, or a special value. Use this setting only if you require different configurations for the transport and HTTP interfaces. Defaults to the address given by network.host.
- http.bind_host
  (Static, string) The network address(es) to which the node should bind in order to listen for incoming HTTP connections. Accepts a list of IP addresses, hostnames, and special values. Defaults to the address given by http.host or network.bind_host. Use this setting only if you need to bind to multiple addresses or to use different addresses for publishing and binding, and you also require different binding configurations for the transport and HTTP interfaces.
- http.publish_host
  (Static, string) The network address for HTTP clients to contact the node using sniffing. Accepts an IP address, a hostname, or a special value. Defaults to the address given by http.host or network.publish_host. Use this setting only if you need to bind to multiple addresses or to use different addresses for publishing and binding, and you also require different binding configurations for the transport and HTTP interfaces.
- http.publish_port
  (Static, integer) The port of the HTTP publish address. Configure this setting only if you need the publish port to be different from http.port. Defaults to the port assigned via http.port.
- http.max_content_length
  (Static, byte value) Maximum size of an HTTP request body. Defaults to 100mb.
- http.max_initial_line_length
  (Static, byte value) Maximum size of an HTTP URL. Defaults to 4kb.
- http.max_header_size
  (Static, byte value) Maximum size of allowed headers. Defaults to 16kb.
- http.compression
  (Static, boolean) Support for compression when possible (with Accept-Encoding). If HTTPS is enabled, defaults to false. Otherwise, defaults to true.
  Disabling compression for HTTPS mitigates potential security risks, such as a BREACH attack. To compress HTTPS traffic, you must explicitly set http.compression to true.
- http.compression_level
  (Static, integer) Defines the compression level to use for HTTP responses. Valid values range from 1 (minimum compression) to 9 (maximum compression). Defaults to 3.
- http.cors.enabled
  (Static, boolean) Enable or disable cross-origin resource sharing, which determines whether a browser on another origin can execute requests against Elasticsearch. Set to true to enable Elasticsearch to process pre-flight CORS requests. Elasticsearch will respond to those requests with the Access-Control-Allow-Origin header if the Origin sent in the request is permitted by the http.cors.allow-origin list. Set to false (the default) to make Elasticsearch ignore the Origin request header, effectively disabling CORS requests because Elasticsearch will never respond with the Access-Control-Allow-Origin response header.
  If the client does not send a pre-flight request with an Origin header or it does not check the response headers from the server to validate the Access-Control-Allow-Origin response header, then cross-origin security is compromised. If CORS is not enabled on Elasticsearch, the only way for the client to know is to send a pre-flight request and realize the required response headers are missing.
- http.cors.allow-origin
  (Static, string) Which origins to allow. If you prepend and append a forward slash (/) to the value, this will be treated as a regular expression, allowing you to support HTTP and HTTPS. For example, using /https?:\/\/localhost(:[0-9]+)?/ would return the request header appropriately in both cases. Defaults to no origins allowed.
  A wildcard (*) is a valid value but is considered a security risk, as your Elasticsearch instance is open to cross-origin requests from anywhere.
- http.cors.max-age
  (Static, integer) Browsers send a "preflight" OPTIONS request to determine CORS settings. max-age defines for how long, in seconds, the result should be cached. Defaults to 1728000 (20 days).
- http.cors.allow-methods
  (Static, string) Which methods to allow. Defaults to OPTIONS, HEAD, GET, POST, PUT, DELETE.
- http.cors.allow-headers
  (Static, string) Which headers to allow. Defaults to X-Requested-With, Content-Type, Content-Length.
- http.cors.allow-credentials
  (Static, boolean) Whether the Access-Control-Allow-Credentials header should be returned. Defaults to false.
  This header is only returned when the setting is set to true.
- http.detailed_errors.enabled
  (Static, boolean) Configures whether detailed error reporting in HTTP responses is enabled. Defaults to true, which means that HTTP requests that include the ?error_trace parameter will return a detailed error message including a stack trace if they encounter an exception. If set to false, requests with the ?error_trace parameter are rejected.
- http.pipelining.max_events
  (Static, integer) The maximum number of events to be queued up in memory before an HTTP connection is closed. Defaults to 10000.
- http.max_warning_header_count
  (Static, integer) The maximum number of warning headers in client HTTP responses. Defaults to -1, which means the number of warning headers is unlimited.
- http.max_warning_header_size
  (Static, byte value) The maximum total size of warning headers in client HTTP responses. Defaults to -1, which means the size of the warning headers is unlimited.
- http.tcp.keep_alive
  (Static, boolean) Configures the SO_KEEPALIVE option for this socket, which determines whether it sends TCP keepalive probes. Defaults to network.tcp.keep_alive.
- http.tcp.keep_idle
  (Static, integer) Configures the TCP_KEEPIDLE option for HTTP sockets, which determines the time in seconds that a connection must be idle before starting to send TCP keepalive probes. Defaults to network.tcp.keep_idle, which uses the system default. This value cannot exceed 300 seconds. Only applicable on Linux and macOS.
- http.tcp.keep_interval
  (Static, integer) Configures the TCP_KEEPINTVL option for HTTP sockets, which determines the time in seconds between sending TCP keepalive probes. Defaults to network.tcp.keep_interval, which uses the system default. This value cannot exceed 300 seconds. Only applicable on Linux and macOS.
- http.tcp.keep_count
  (Static, integer) Configures the TCP_KEEPCNT option for HTTP sockets, which determines the number of unacknowledged TCP keepalive probes that may be sent on a connection before it is dropped. Defaults to network.tcp.keep_count, which uses the system default. Only applicable on Linux and macOS.
- http.tcp.no_delay
  (Static, boolean) Configures the TCP_NODELAY option on HTTP sockets, which determines whether TCP no delay is enabled. Defaults to true.
- http.tcp.reuse_address
  (Static, boolean) Configures the SO_REUSEADDR option for HTTP sockets, which determines whether the address can be reused or not. Defaults to false on Windows and true otherwise.
- http.tcp.send_buffer_size
  (Static, byte value) The size of the TCP send buffer for HTTP traffic. Defaults to network.tcp.send_buffer_size.
- http.tcp.receive_buffer_size
  (Static, byte value) The size of the TCP receive buffer for HTTP traffic. Defaults to network.tcp.receive_buffer_size.
- http.client_stats.enabled
  (Dynamic, boolean) Enable or disable collection of HTTP client stats. Defaults to true.
- http.client_stats.closed_channels.max_count
  (Static, integer) When http.client_stats.enabled is true, sets the maximum number of closed HTTP channels for which Elasticsearch reports statistics. Defaults to 10000.
- http.client_stats.closed_channels.max_age
  (Static, time value) When http.client_stats.enabled is true, sets the maximum length of time after closing an HTTP channel that Elasticsearch will report that channel’s statistics. Defaults to 5m.
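To make the CORS settings in the list above more concrete, here is a hedged sketch of a restrictive configuration. The origin https://app.example.com is a hypothetical browser application, and the reduced method list is an assumption made for the example rather than a recommendation.

```yaml
# Enable processing of pre-flight CORS requests (disabled by default).
http.cors.enabled: true
# Hypothetical origin. A value wrapped in forward slashes is treated as a regular expression.
http.cors.allow-origin: "https://app.example.com"
# Narrower than the default list, which also includes PUT and DELETE.
http.cors.allow-methods: OPTIONS, HEAD, GET, POST
# Same as the default header list, stated explicitly.
http.cors.allow-headers: X-Requested-With, Content-Type, Content-Length
# Do not return the Access-Control-Allow-Credentials header (the default).
http.cors.allow-credentials: false
```

Listing explicit origins rather than the * wildcard keeps the instance closed to cross-origin requests from arbitrary sites.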
Advanced transport settings
Use the following advanced settings to configure the transport interface independently of the HTTP interface. Use the network settings to configure both interfaces together.
- transport.host
  (Static, string) Sets the address of this node for transport traffic. The node will bind to this address and will also use it as its transport publish address. Accepts an IP address, a hostname, or a special value. Use this setting only if you require different configurations for the transport and HTTP interfaces. Defaults to the address given by network.host.
- transport.bind_host
  (Static, string) The network address(es) to which the node should bind in order to listen for incoming transport connections. Accepts a list of IP addresses, hostnames, and special values. Defaults to the address given by transport.host or network.bind_host. Use this setting only if you need to bind to multiple addresses or to use different addresses for publishing and binding, and you also require different binding configurations for the transport and HTTP interfaces.
- transport.publish_host
  (Static, string) The network address at which the node can be contacted by other nodes. Accepts an IP address, a hostname, or a special value. Defaults to the address given by transport.host or network.publish_host. Use this setting only if you need to bind to multiple addresses or to use different addresses for publishing and binding, and you also require different binding configurations for the transport and HTTP interfaces.
- transport.publish_port
  (Static, integer) The port of the transport publish address. Set this parameter only if you need the publish port to be different from transport.port. Defaults to the port assigned via transport.port.
- transport.connect_timeout
  (Static, time value) The connect timeout for initiating a new connection (in time setting format). Defaults to 30s.
- transport.compress
  (Static, string) Set to true, indexing_data, or false to configure transport compression between nodes. The option true will compress all data. The option indexing_data will compress only the raw index data sent between nodes during ingest, CCR following (excluding bootstrap), and operations-based shard recovery (excluding transferring Lucene files). Defaults to indexing_data.
- transport.compression_scheme
  (Static, string) Configures the compression scheme for transport.compress. The options are deflate or lz4. If lz4 is configured and the remote node has not been upgraded to a version supporting lz4, the traffic will be sent uncompressed. Defaults to lz4.
- transport.tcp.keep_alive
  (Static, boolean) Configures the SO_KEEPALIVE option for transport sockets, which determines whether they send TCP keepalive probes. Defaults to network.tcp.keep_alive.
- transport.tcp.keep_idle
  (Static, integer) Configures the TCP_KEEPIDLE option for transport sockets, which determines the time in seconds that a connection must be idle before starting to send TCP keepalive probes. Defaults to network.tcp.keep_idle if set, or the system default otherwise. This value cannot exceed 300 seconds. In cases where the system default is higher than 300, the value is automatically lowered to 300. Only applicable on Linux and macOS.
- transport.tcp.keep_interval
  (Static, integer) Configures the TCP_KEEPINTVL option for transport sockets, which determines the time in seconds between sending TCP keepalive probes. Defaults to network.tcp.keep_interval if set, or the system default otherwise. This value cannot exceed 300 seconds. In cases where the system default is higher than 300, the value is automatically lowered to 300. Only applicable on Linux and macOS.
- transport.tcp.keep_count
  (Static, integer) Configures the TCP_KEEPCNT option for transport sockets, which determines the number of unacknowledged TCP keepalive probes that may be sent on a connection before it is dropped. Defaults to network.tcp.keep_count if set, or the system default otherwise. Only applicable on Linux and macOS.
- transport.tcp.no_delay
  (Static, boolean) Configures the TCP_NODELAY option on transport sockets, which determines whether TCP no delay is enabled. Defaults to true.
- transport.tcp.reuse_address
  (Static, boolean) Configures the SO_REUSEADDR option for network sockets, which determines whether the address can be reused or not. Defaults to network.tcp.reuse_address.
- transport.tcp.send_buffer_size
  (Static, byte value) The size of the TCP send buffer for transport traffic. Defaults to network.tcp.send_buffer_size.
- transport.tcp.receive_buffer_size
  (Static, byte value) The size of the TCP receive buffer for transport traffic. Defaults to network.tcp.receive_buffer_size.
- transport.ping_schedule
  (Static, time value) Configures the time between sending application-level pings on all transport connections to promptly detect when a transport connection has failed. Defaults to -1, meaning that application-level pings are not sent. You should use TCP keepalives (see transport.tcp.keep_alive) instead of application-level pings wherever possible.
Transport profiles
Elasticsearch allows you to bind to multiple ports on different interfaces by the use of transport profiles. See this example configuration:
transport.profiles.default.port: 9300-9400
transport.profiles.default.bind_host: 10.0.0.1
transport.profiles.client.port: 9500-9600
transport.profiles.client.bind_host: 192.168.0.1
transport.profiles.dmz.port: 9700-9800
transport.profiles.dmz.bind_host: 172.16.1.2
The default profile is special. It is used as a fallback for any other profiles, if those do not have a specific configuration setting set, and is how this node connects to other nodes in the cluster.
Other profiles can have any name and can be used to set up specific endpoints for incoming connections.
The following parameters can be configured on each transport profile, as in the example above:
- port: The port to which to bind.
- bind_host: The host to which to bind.
- publish_host: The host which is published in informational APIs.
Profiles also support all the other transport settings specified in the transport settings section, and use these as defaults. For example, transport.profiles.client.tcp.reuse_address can be explicitly configured, and defaults otherwise to transport.tcp.reuse_address.
Long-lived idle connections
A transport connection between two nodes is made up of a number of long-lived TCP connections, some of which may be idle for an extended period of time. Nonetheless, Elasticsearch requires these connections to remain open, and it can disrupt the operation of your cluster if any inter-node connections are closed by an external influence such as a firewall. It is important to configure your network to preserve long-lived idle connections between Elasticsearch nodes, for instance by leaving *.tcp.keep_alive enabled and ensuring that the keepalive interval is shorter than any timeout that might cause idle connections to be closed, or by setting transport.ping_schedule if keepalives cannot be configured. Devices which drop connections when they reach a certain age are a common source of problems for Elasticsearch clusters, and must not be used.
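As a sketch of the keepalive approach, assume a hypothetical firewall that closes connections after 10 minutes of idleness. The values below are illustrative assumptions chosen to stay under that timeout, not tuned recommendations, and the keep_idle and keep_interval settings only apply on Linux and macOS.

```yaml
# Keep TCP keepalives enabled (this is already the default).
network.tcp.keep_alive: true
# Start probing after 5 minutes of idleness, well under the assumed
# 10-minute firewall timeout. 300 seconds is the maximum allowed value.
network.tcp.keep_idle: 300
# Once probing has started, send a probe every 60 seconds.
network.tcp.keep_interval: 60
# Only if keepalives cannot be configured, fall back to application-level pings:
# transport.ping_schedule: 30s
```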
Request compression
The default transport.compress configuration option indexing_data will only compress requests that relate to the transport of raw indexing source data between nodes. This option primarily compresses data sent during ingest, CCR, and shard recovery. This default normally makes sense for local cluster communication as compressing raw documents tends to significantly reduce inter-node network usage with minimal CPU impact.
The transport.compress setting always configures local cluster request compression and is the fallback setting for remote cluster request compression. If you want to configure remote request compression differently than local request compression, you can set it on a per-remote cluster basis using the cluster.remote.${cluster_alias}.transport.compress setting.
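For illustration, the following sketch keeps the default behaviour for the local cluster while compressing all requests sent to one remote cluster. The alias remote_eu is a hypothetical remote cluster name, not something defined in this document.

```yaml
# Local cluster: compress only raw indexing data (the default).
transport.compress: indexing_data
# Compression scheme used for transport.compress (lz4 is the default).
transport.compression_scheme: lz4
# "remote_eu" is a hypothetical remote cluster alias; compress all requests sent to it.
cluster.remote.remote_eu.transport.compress: true
```

Remote clusters without their own setting fall back to the transport.compress value.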
Response compression
The compression settings do not configure compression for responses. Elasticsearch will compress a response if the inbound request was compressed, even when compression is not enabled. Similarly, Elasticsearch will not compress a response if the inbound request was uncompressed, even when compression is enabled. The compression scheme used to compress a response will be the same scheme the remote node used to compress the request.
Request tracing
You can trace individual requests made on the HTTP and transport layers.
Tracing can generate extremely high log volumes that can destabilize your cluster. Do not enable request tracing on busy or important clusters.
REST request tracer
The HTTP layer has a dedicated tracer that logs incoming requests and the corresponding outgoing responses. Activate the tracer by setting the level of the org.elasticsearch.http.HttpTracer logger to TRACE:
PUT _cluster/settings
{
  "persistent" : {
    "logger.org.elasticsearch.http.HttpTracer" : "TRACE"
  }
}
You can also control which URIs will be traced, using a set of include and exclude wildcard patterns. By default every request will be traced.
PUT _cluster/settings
{
  "persistent" : {
    "http.tracer.include" : "*",
    "http.tracer.exclude" : ""
  }
}
Transport tracer
The transport layer has a dedicated tracer that logs incoming and outgoing requests and responses. Activate the tracer by setting the level of the org.elasticsearch.transport.TransportService.tracer logger to TRACE:
PUT _cluster/settings
{
  "persistent" : {
    "logger.org.elasticsearch.transport.TransportService.tracer" : "TRACE"
  }
}
You can also control which actions will be traced, using a set of include and exclude wildcard patterns. By default every request will be traced except for fault detection pings:
PUT _cluster/settings
{
  "persistent" : {
    "transport.tracer.include" : "*",
    "transport.tracer.exclude" : "internal:coordination/fault_detection/*"
  }
}
Networking threading model
This section describes the threading model used by the networking subsystem in Elasticsearch. This information isn’t required to use Elasticsearch, but it may be useful to advanced users who are diagnosing network problems in a cluster.
Elasticsearch nodes communicate over a collection of TCP channels that together form a transport connection. Elasticsearch clients communicate with the cluster over HTTP, which also uses one or more TCP channels. Each of these TCP channels is owned by exactly one of the transport_worker threads in the node. This owning thread is chosen when the channel is opened and remains the same for the lifetime of the channel.
Each transport_worker thread has sole responsibility for sending and receiving data over the channels it owns. One of the transport_worker threads is also responsible for accepting new incoming transport connections, and one is responsible for accepting new HTTP connections.
If a thread in Elasticsearch wants to send data over a particular channel, it passes the data to the owning transport_worker thread for the actual transmission.
Normally the transport_worker threads will not completely handle the messages they receive. Instead, they will do a small amount of preliminary processing and then dispatch (hand off) each message to a different threadpool for the rest of its handling. For instance, bulk messages are dispatched to the write threadpool, searches are dispatched to one of the search threadpools, and requests for statistics and other management tasks are mostly dispatched to the management threadpool. However, in some cases the processing of a message is expected to be so quick that Elasticsearch will do all of the processing on the transport_worker thread rather than incur the overhead of dispatching it elsewhere.
By default, there is one transport_worker thread per CPU. In contrast, there may sometimes be tens of thousands of TCP channels. If data arrives on a TCP channel and its owning transport_worker thread is busy, the data isn’t processed until the thread finishes whatever it is doing. Similarly, outgoing data are not sent over a channel until the owning transport_worker thread is free. This means that we require every transport_worker thread to be idle frequently. An idle transport_worker looks something like this in a stack dump:
"elasticsearch[instance-0000000004][transport_worker][T#1]" #32 daemon prio=5 os_prio=0 cpu=9645.94ms elapsed=501.63s tid=0x00007fb83b6307f0 nid=0x1c4 runnable [0x00007fb7b8ffe000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPoll.wait(java.base@17.0.2/Native Method)
    at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@17.0.2/EPollSelectorImpl.java:118)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@17.0.2/SelectorImpl.java:129)
    - locked <0x00000000c443c518> (a sun.nio.ch.Util$2)
    - locked <0x00000000c38f7700> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(java.base@17.0.2/SelectorImpl.java:146)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(java.base@17.0.2/Thread.java:833)
In the Nodes hot threads API an idle transport_worker thread is reported like this:
   100.0% [cpu=0.0%, other=100.0%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000004][transport_worker][T#1]'
     10/10 snapshots sharing following 9 elements
       java.base@17.0.2/sun.nio.ch.EPoll.wait(Native Method)
       java.base@17.0.2/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:118)
       java.base@17.0.2/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
       java.base@17.0.2/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
       io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)
       io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
       io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
       io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@17.0.2/java.lang.Thread.run(Thread.java:833)
Note that transport_worker threads should always be in state RUNNABLE, even when waiting for input, because they block in the native EPoll#wait method. This means the hot threads API will report these threads at 100% overall utilisation. This is normal, and the breakdown of time into cpu= and other= fractions shows how much time the thread spent running and waiting for input respectively.
If a transport_worker thread is not frequently idle, it may build up a backlog of work. This can cause delays in processing messages on the channels that it owns. It’s hard to predict exactly which work will be delayed:
- There are many more channels than threads. If work related to one channel is causing delays to its worker thread, all other channels owned by that thread will also suffer delays.
- The mapping from TCP channels to worker threads is fixed but arbitrary. Each channel is assigned an owning thread in a round-robin fashion when the channel is opened. Each worker thread is responsible for many different kinds of channel.
- There are many channels open between each pair of nodes. For each request, Elasticsearch will choose from the appropriate channels in a round-robin fashion. Some requests may end up on a channel owned by a delayed worker while other identical requests will be sent on a channel that’s working smoothly.
If the backlog builds up too far, some messages may be delayed by many seconds. The node might even fail its health checks and be removed from the cluster. Sometimes, you can find evidence of busy transport_worker threads using the Nodes hot threads API. However, this API itself sends network messages so may not work correctly if the transport_worker threads are too busy. It is more reliable to use jstack to obtain stack dumps or use Java Flight Recorder to obtain a profiling trace. These tools are independent of any work the JVM is performing.