Troubleshooting methods
editTroubleshooting methods
editMost common issues can be identified and resolved by following these instructions:
If you are still unable to find a solution to your problem, ask for help:
- ECK Discuss forums to ask any question
- Github issues for bugs and feature requests
View the list of resources
editTo deploy and manage the Elastic stack, ECK creates several resources in the namespace where the main resource is deployed.
For example, each Elasticsearch node and Kibana instance has a dedicated Pod. Check the status of the running Pods, and compare it with the expected instances:
kubectl get pods NAME READY STATUS RESTARTS AGE elasticsearch-sample-es-66sv6dvt7g 0/1 Pending 0 3s elasticsearch-sample-es-9xzzhmgd4h 1/1 Running 0 27m elasticsearch-sample-es-lgphkv9p67 0/1 Pending 0 3s kibana-sample-kb-5468b8685d-c7mdp 0/1 Running 0 4s
Check the services:
kubectl get services NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE elasticsearch-sample-es-http ClusterIP 10.19.248.93 <none> 9200/TCP 2d kibana-sample-kb-http ClusterIP 10.19.246.116 <none> 5601/TCP 3d
Describe failing resources
editIf an Elasticsearch node does not start up, it is probably because Kubernetes cannot schedule the associated Pod.
First, check the StatefulSets to verify if the current number of replicas match the desired number of replicas.
kubectl get statefulset NAME DESIRED CURRENT AGE elasticsearch-sample-es-default 1 1 4s
Then, check the Pod status. If a Pod fails to reach the Running
status after a few seconds, something is preventing it from being scheduled or starting up:
kubectl get pods -l elasticsearch.k8s.elastic.co/statefulset-name=elasticsearch-sample-es-default NAME READY STATUS RESTARTS AGE elasticsearch-sample-es-66sv6dvt7g 0/1 Pending 0 3s elasticsearch-sample-es-9xzzhmgd4h 1/1 Running 0 42s elasticsearch-sample-es-lgphkv9p67 0/1 Pending 0 3s kibana-sample-kb-5468b8685d-c7mdp 0/1 Running 0 4s
Pod elasticsearch-sample-es-lgphkv9p67
isn’t scheduled. Run this command to get more insights:
kubectl describe pod elasticsearch-sample-es-lgphkv9p67 (...) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 1m (x6 over 1m) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 2 times) Warning FailedScheduling 1m (x6 over 1m) default-scheduler pod has unbound immediate PersistentVolumeClaims Warning FailedScheduling 1m (x11 over 1m) default-scheduler 0/3 nodes are available: 1 node(s) had no available volume zone, 2 Insufficient memory. Normal NotTriggerScaleUp 4s (x11 over 1m) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added)
If you get an error with unbound persistent volume claims (PVCs), it means there is not currently a persistent volume that can satisfy the claim. If you are using automatically provisioned storage (for example Amazon EBS provisioner), sometimes the storage provider can take a few minutes to provision a volume, so this may resolve itself in a few minutes. You can also check the status by running kubectl describe persistentvolumeclaims
to monitor events of the PVCs.
Enable ECK debug logs
editTo enable DEBUG
level logs on the operator, edit the elastic-operator
StatefulSet and set the --log-verbosity
flag to 1
as shown in this example:
kubectl edit statefulset.apps -n elastic-system elastic-operator
change the args
array as follows:
spec: containers: - args: - manager - --log-verbosity=1
Once your change is saved, the operator is automatically restarted by the StatefulSet controller to apply the new settings.
View logs
editView Elasticsearch logs
editEach Elasticsearch node name is mapped to the corresponding Pod name. To get the logs of a particular Elasticsearch node, just fetch the Pod logs:
kubectl logs -f elasticsearch-sample-es-lgphkv9p67 (...) {"type": "server", "timestamp": "2019-07-22T08:48:10,859+0000", "level": "INFO", "component": "o.e.c.s.ClusterApplierService", "cluster.name": "elasticsearch-sample", "node.name": "elasticsearch-sample-es-lgphkv9p67", "cluster.uuid": "cX9uCx3uQrej9hMLGPhV0g", "node.id": "R_OcheBlRGeqme1IZzE4_Q", "message": "added {{elasticsearch-sample-es-kqz4jmvj9p}{UGy5IX0UQcaKlztAoh4sLA}{3o_EUuZvRKW7R1C8b1zzzg}{10.16.2.232}{10.16.2.232:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true},{elasticsearch-sample-es-stzz78k64p}{Sh_AzQcxRzeuIoOQWgru1w}{cwPoTFNnRAWtqsXWQtWbGA}{10.16.2.233}{10.16.2.233:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true},}, term: 1, version: 164, reason: ApplyCommitRequest{term=1, version=164, sourceNode={elasticsearch-sample-es-9xzzhmgd4h}{tAi_bCPcSaO1OkLap4wmhQ}{E6VcWWWtSB2oo-2zmj9DMQ}{10.16.1.150}{10.16.1.150:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true}}" } {"type": "server", "timestamp": "2019-07-22T08:48:22,224+0000", "level": "INFO", "component": "o.e.c.s.ClusterApplierService", "cluster.name": "elasticsearch-sample", "node.name": "elasticsearch-sample-es-lgphkv9p67", "cluster.uuid": "cX9uCx3uQrej9hMLGPhV0g", "node.id": "R_OcheBlRGeqme1IZzE4_Q", "message": "added {{elasticsearch-sample-es-fn9wvxw6sh}{_tbAciHTStaAlUO6GtD9LA}{1g7_qsXwR0qjjfom05VwMA}{10.16.1.154}{10.16.1.154:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true},}, term: 1, version: 169, reason: ApplyCommitRequest{term=1, version=169, sourceNode={elasticsearch-sample-es-9xzzhmgd4h}{tAi_bCPcSaO1OkLap4wmhQ}{E6VcWWWtSB2oo-2zmj9DMQ}{10.16.1.150}{10.16.1.150:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true}}" }
You can run the same command for Kibana and APM Server.
View init container logs
editAn Elasticsearch Pod runs a few init containers to prepare the file system of the main Elasticsearch container.
In some scenarios, the Pod may fail to run (Status: Error
or Status: CrashloopBackOff
) because one of the init containers is failing to run.
Look at the init container statuses and logs to get more details.
View ECK logs
editSince the ECK operator is just a standard Pod running in the Kubernetes cluster, you can fetch its logs as you would for any other Pod:
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
The operator constantly attempts to reconcile Kubernetes resources to match the desired state.
Logs with INFO
level provide some insights about what is going on.
Logs with ERROR
level indicate something is not going as expected.
Due to optimistic locking, you can get errors reporting a conflict while updating a resource. You can ignore them, as the update goes through at the next reconciliation attempt, which will happen almost immediately.
Configure Elasticsearch timeouts
editThe operator needs to communicate with each Elasticsearch cluster in order to perform orchestration tasks. The default timeout for such requests can be configured by setting the elasticsearch-client-timeout
value as described in Configure ECK. If you have a particularly overloaded Elasticsearch cluster that is taking longer to process API requests, you can temporarily change the timeout and frequency of API calls made by the operator to that single cluster by annotating the relevant Elasticsearch
resource. The supported list of annotations are:
-
eck.k8s.elastic.co/es-client-timeout
: Request timeout for the API requests made by the Elasticsearch client. Defaults to 3 minutes. -
eck.k8s.elastic.co/es-observer-interval
: How often Elasticsearch should be checked by the operator to obtain health information. Defaults to 10 seconds.
To set the Elasticsearch client timeout to 60 seconds for a cluster named quickstart
, you can run the following command:
kubectl annotate elasticsearch quickstart eck.k8s.elastic.co/es-client-timeout=60s
Exclude resources from reconciliation
editFor debugging purposes, you might want to temporarily prevent ECK from modifying Kubernetes resources belonging to a particular Elastic Stack resource. To do this, annotate the Elastic object with eck.k8s.elastic.co/managed=false
. This annotation can be added to any of the following types of objects:
- Elasticsearch
- Kibana
- ApmServer
metadata: annotations: eck.k8s.elastic.co/managed: "false"
Or in one line:
kubectl annotate elasticsearch quickstart --overwrite eck.k8s.elastic.co/managed=false
Get Kubernetes events
editECK will emit events when:
- important operations are performed (example: a new Elasticsearch Pod was created)
- something is wrong, and the user must be notified
Fetch Kubernetes events:
kubectl get events (...) 28s 25m 58 elasticsearch-sample-es-p45nrjch29.15b3ae4cc4f7c00d Pod Warning FailedScheduling default-scheduler 0/3 nodes are available: 1 node(s) had no available volume zone, 2 Insufficient memory. 28s 25m 52 elasticsearch-sample-es-wxpnzfhqbt.15b3ae4d86bc269f Pod Warning FailedScheduling default-scheduler 0/3 nodes are available: 1 node(s) had no available volume zone, 2 Insufficient memory.
You can filter the events to show only those that are relevant to a particular Elasticsearch cluster:
kubectl get event --namespace default --field-selector involvedObject.name=elasticsearch-sample LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE 30m 30m 1 elasticsearch-sample.15b3ae303baa93c0 Elasticsearch Normal Created elasticsearch-controller Created pod elasticsearch-sample-es-4q7q2k8cl7 30m 30m 1 elasticsearch-sample.15b3ae303bab4f40 Elasticsearch Normal Created elasticsearch-controller Created pod elasticsearch-sample-es-jg7dsfkcp8 30m 30m 1 elasticsearch-sample.15b3ae303babdfc8 Elasticsearch Normal Created elasticsearch-controller Created pod elasticsearch-sample-es-xrxsp54jd5
You can set filters for Kibana and APM Server too. Note that the default TTL for events in Kubernetes is 1h, so unless your cluster settings have been modified you will not get events older than 1h.
Resizing persistent volumes
editTo increase or decrease the size of a disk, you cannot change the size of the volumeClaimTemplate directly. This is because StatefulSets do not allow modifications to volumeClaimTemplates. To work around this, you can create a new nodeSet with the new size, and remove the old nodeSet from your Elasticsearch spec. For instance, just changing the name of nodeSet and the size in the existing Elasticsearch spec. ECK will automatically begin to migrate data to the new nodeSet and remove the old one when it is fully drained. Check the StatefulSets orchestration documentation for more detail.
For a concrete example, imagine you started with this:
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: quickstart spec: version: 8.17.0 nodeSets: - name: default count: 3 config: node.store.allow_mmap: false volumeClaimTemplates: - metadata: name: elasticsearch-data spec: accessModes: - ReadWriteOnce resources: requests: storage: 5Gi storageClassName: standard
and want to increase it to 10Gi of storage. You can change the nodeSet name and the volume size like so:
apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: quickstart spec: version: 8.17.0 nodeSets: - name: default-10gi count: 3 config: node.store.allow_mmap: false volumeClaimTemplates: - metadata: name: elasticsearch-data spec: accessModes: - ReadWriteOnce resources: requests: storage: 10Gi storageClassName: standard
and ECK will automatically create a new StatefulSet and begin migrating data into it.
Exec into containers
editTo troubleshoot a filesystem, configuration or a network issue, you can run Shell commands directly in the Elasticsearch container. You can do this with kubectl:
kubectl exec -ti elasticsearch-sample-es-p45nrjch29 bash
This can also be done for Kibana and APM Server.
Suspend Elasticsearch
editIn exceptional cases, you might need to suspend the Elasticsearch process while using kubectl exec
(as in the previous section) to troubleshoot.
One such example where Elasticsearch has to be stopped are the unsafe operations on Elasticsearch nodes that can be executed with the elasticsearch-node tool.
To suspend an Elasticearch node, while keeping the corresponding Pod running, you can annotate the Elasticsearch resource with the eck.k8s.elastic.co/suspend
annotation. The value should be a comma-separated list of the names of the Pods whose Elasticsearch process you want to suspend.
To suspend the second Pod in the default
node set of a cluster called quickstart
for example, you would use the following command:
kubectl annotate es quickstart eck.k8s.elastic.co/suspend=quickstart-es-default-1
You can then open a shell on the elastic-internal-suspend
init container to troubleshoot:
kubectl exec -ti quickstart-es-default-1 -c elastic-internal-suspend -- bash
Once you are done with troubleshooting the node, you can resume normal operations by removing the annotation:
kubectl annotate es quickstart eck.k8s.elastic.co/suspend-
Capture JVM heap dumps
editElasticsearch and Enterprise Search are applications that run on the JVM. It can be useful to capture a heap dump to troubleshoot garbage collection issues or suspected memory leaks or to share it with Elastic. Follow the application specific instructions for Elasticsearch and Enterprise Search.