Capturing diagnostics
The Elasticsearch Support Diagnostic tool captures a point-in-time snapshot of cluster statistics and most settings. It works against all Elasticsearch versions.
This information can be used to troubleshoot problems with your cluster. For examples of issues that you can troubleshoot using Support Diagnostic tool output, refer to the Elastic blog.
You can generate diagnostic information using this tool before you contact Elastic Support or Elastic Discuss to minimize turnaround time.
See this video for a walkthrough of capturing an Elasticsearch diagnostic.
Requirements
- Java Runtime Environment or Java Development Kit v1.8 or higher
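The requirement above can be checked before running the tool. The following is a minimal sketch, not part of the Support Diagnostic tool: the function names are hypothetical, and it assumes that `java -version` prints a line containing `version "<number>"`, as common JDK distributions do.

```shell
# Print the major Java version for a version string such as
# "1.8.0_292" (legacy scheme, major version 8) or "17.0.2".
java_major_version() {
  version="$1"
  case "$version" in
    1.*) version="${version#1.}" ;;  # legacy scheme: 1.8.0_292 -> 8.0_292
  esac
  # Keep only the leading digits.
  printf '%s\n' "${version%%[!0-9]*}"
}

# Succeed if the given version string satisfies the v1.8-or-higher requirement.
java_meets_requirement() {
  [ "$(java_major_version "$1")" -ge 8 ]
}

# Print the version string of the java binary on the PATH.
# Assumption: the version appears in double quotes after the word "version".
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, `java_meets_requirement "$(installed_java_version)" && echo "Java OK"` would print a confirmation only when a suitable runtime is found.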
Access the tool
The Support Diagnostic tool is included as a sub-library in some Elastic deployments:
- Elastic Cloud Enterprise: Located under Elastic Cloud Enterprise > Deployment > Operations > Prepare Bundle > Elasticsearch.
- Elastic Cloud on Kubernetes: Run as eck-diagnostics.
You can also directly download the diagnostics-X.X.X-dist.zip file for the latest Support Diagnostic release from the support-diagnostic repo.
Capture diagnostic information
To capture an Elasticsearch diagnostic:
- In a terminal, verify that your network and user permissions are sufficient to connect to your Elasticsearch cluster by polling the cluster’s health.
  For example, with the parameters host:localhost, port:9200, and username:elastic, you’d use the following curl request:
  curl -X GET -k -u elastic -p https://localhost:9200/_cluster/health
  If you receive an HTTP 200 OK response, then you can proceed to the next step. If you receive a different response code, then diagnose the issue before proceeding.
- Using the same environment parameters, run the diagnostic tool script.
  For information about the parameters that you can pass to the tool, refer to the diagnostic parameter reference.
  The following command options are recommended:
  Unix-based systems
  sudo ./diagnostics.sh --type local --host localhost --port 9200 -u elastic -p --bypassDiagVerify --ssl --noVerify
  Windows
  .\diagnostics.bat --type local --host localhost --port 9200 -u elastic -p --bypassDiagVerify --ssl --noVerify
  Script execution modes
  You can execute the script in three modes:
  - local (default, recommended): Polls the Elasticsearch API, gathers operating system info, and captures cluster and GC logs.
  - remote: Establishes an SSH session to the applicable target server to pull the same information as local.
  - api: Polls the Elasticsearch API. All other data must be collected manually.
- When the script has completed, verify that no errors were logged to diagnostic.log. If the log file contains errors, then refer to Diagnose errors in diagnostic.log.
- If the script completed without errors, then an archive with the format <diagnostic type>-diagnostics-<DateTimeStamp>.zip is created in the working directory, or an output directory you have specified. You can review or share the diagnostic archive as needed.
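After a run, it can help to locate the newest archive programmatically. The following is a minimal sketch for Unix-based systems: the function name is hypothetical, and the glob relies only on the documented name format (<diagnostic type>-diagnostics-<DateTimeStamp>.zip), not on any particular timestamp layout.

```shell
# Print the most recently modified diagnostic archive in a directory
# (default: the current directory). Prints nothing if none is found.
latest_diagnostic_archive() {
  dir="${1:-.}"
  # Match the documented name format:
  # <diagnostic type>-diagnostics-<DateTimeStamp>.zip
  # ls -t sorts by modification time, newest first.
  ls -t "$dir"/*-diagnostics-*.zip 2>/dev/null | head -n 1
}
```

If unzip is installed, `unzip -l "$(latest_diagnostic_archive)"` lists the archive contents for review before you share it.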
Diagnose a non-200 cluster health response
When you poll your cluster health, if you receive any response other than 200 OK, then the diagnostic tool might not work as intended. The following are possible error codes and their resolutions:
- HTTP 401 UNAUTHENTICATED: Additional information in the error will usually indicate either that your username:password pair is invalid, or that your .security index is unavailable and you need to set up a temporary file-based realm user with role:superuser to authenticate.
- HTTP 403 UNAUTHORIZED: Your username is recognized but has insufficient permissions to run the diagnostic. Either use a different username or elevate the user’s privileges.
- HTTP 429 TOO_MANY_REQUESTS (for example, circuit_breaking_exception): Your username was authenticated and authorized, but the cluster is under sufficiently high strain that it’s not responding to API calls. These responses are usually intermittent. You can proceed with running the diagnostic, but the diagnostic results might be incomplete.
- HTTP 504 GATEWAY_TIMEOUT: Your network is experiencing issues reaching the cluster. You might be using a proxy or firewall. Consider running the diagnostic tool from a different location, confirming your port, or using an IP instead of a URL domain.
- HTTP 503 SERVICE_UNAVAILABLE (for example, master_not_discovered_exception): Your cluster does not currently have an elected master node, which is required for it to be API-responsive. This might be temporary while the master node rotates. If the issue persists, then investigate the cause before proceeding.
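The status codes above can be turned into a quick pre-flight check. The following is a minimal sketch for Unix-based systems; the function name and message wording are illustrative and not part of the diagnostic tool. The commented usage lines assume the same host, port, and username as the earlier example, and rely on curl's `-w '%{http_code}'` option to print only the status code.

```shell
# Map an HTTP status code from the cluster health request to a next step.
explain_health_status() {
  case "$1" in
    200) echo 'OK: proceed with the diagnostic' ;;
    401) echo 'check the username:password pair and the .security index' ;;
    403) echo 'insufficient permissions: switch users or elevate privileges' ;;
    429) echo 'cluster under strain: results might be incomplete' ;;
    503) echo 'no elected master node: retry, then investigate if it persists' ;;
    504) echo 'network problem: check proxy, firewall, port, or use an IP' ;;
    *)   echo "unexpected status $1: investigate before proceeding" ;;
  esac
}

# Usage (commented out: it needs a reachable cluster and prompts for a password):
# status="$(curl -s -o /dev/null -w '%{http_code}' -k -u elastic https://localhost:9200/_cluster/health)"
# explain_health_status "$status"
```

Keeping the mapping in a pure function means it can be reused in a wrapper script that only launches the diagnostic when the status is 200.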
Diagnose errors in diagnostic.log
The following are common errors that you might encounter when running the diagnostic tool:
- Error: Could not find or load main class com.elastic.support.diagnostics.DiagnosticApp
  This indicates that you accidentally downloaded the source code file instead of diagnostics-X.X.X-dist.zip from the releases page.
- Could not retrieve the Elasticsearch version due to a system or network error - unable to continue.
  This indicates that the diagnostic couldn’t run commands against the cluster. Poll the cluster’s health again, and ensure that you’re using the same parameters when you run the diagnostic batch or shell file.
- A security_exception that includes is unauthorized for user:
  The provided user has insufficient admin permissions to run the diagnostic tool. Use another user, or grant the user role:superuser privileges.
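The checks above can be scripted by searching the log for the quoted error strings. The following is a minimal sketch for Unix-based systems: the function name and hint wording are hypothetical, the log path is passed in because its location depends on how you ran the tool, and the search assumes the messages appear in the log verbatim as quoted above.

```shell
# Print a hint for each known error found in the given log file.
# Returns 0 if at least one known error was found, 1 otherwise.
scan_diagnostic_log() {
  log_file="$1"
  found=1
  if grep -qF 'Could not find or load main class com.elastic.support.diagnostics.DiagnosticApp' "$log_file"; then
    echo 'wrong download: use diagnostics-X.X.X-dist.zip, not the source code'
    found=0
  fi
  if grep -qF 'Could not retrieve the Elasticsearch version' "$log_file"; then
    echo 'connection problem: poll cluster health again with the same parameters'
    found=0
  fi
  # Both strings must be present, though not necessarily on the same line.
  if grep -qF 'security_exception' "$log_file" &&
     grep -qF 'is unauthorized for user' "$log_file"; then
    echo 'insufficient permissions: use another user or grant role:superuser'
    found=0
  fi
  return "$found"
}
```

A call such as `scan_diagnostic_log diagnostic.log || echo "no known errors found"` prints the matching hints, or the fallback message when none of the three patterns appear.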