Troubleshooting
editTroubleshooting
editWhen things don’t work as expected, you can investigate by taking the following actions:
Get the list of resources
editTo deploy and manage the Elastic stack, ECK creates several resources in the namespace where the main resource is deployed.
For example, each Elasticsearch node and Kibana instance has a dedicated Pod. Check the status of the running Pods, and compare it with the expected instances:
kubectl get pods NAME READY STATUS RESTARTS AGE elasticsearch-sample-es-66sv6dvt7g 0/1 Pending 0 3s elasticsearch-sample-es-9xzzhmgd4h 1/1 Running 0 27m elasticsearch-sample-es-lgphkv9p67 0/1 Pending 0 3s kibana-sample-kb-5468b8685d-c7mdp 0/1 Running 0 4s
Check the services:
kubectl get services NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE elasticsearch-sample-es-http ClusterIP 10.19.248.93 <none> 9200/TCP 2d kibana-sample-kb-http ClusterIP 10.19.246.116 <none> 5601/TCP 3d
Describe failing resources
editIf an Elasticsearch node does not come up, it’s probably because Kubernetes cannot schedule the associated Pod. Check the Pod status: if it’s not Running
after a few seconds, something is wrong:
kubectl get pods NAME READY STATUS RESTARTS AGE elasticsearch-sample-es-66sv6dvt7g 0/1 Pending 0 3s elasticsearch-sample-es-9xzzhmgd4h 1/1 Running 0 27m elasticsearch-sample-es-lgphkv9p67 0/1 Pending 0 3s kibana-sample-kb-5468b8685d-c7mdp 0/1 Running 0 4s
Pod elasticsearch-sample-es-lgphkv9p67
isn’t scheduled. Run this command to get more insights:
kubectl describe pod elasticsearch-sample-es-lgphkv9p67 (...) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 1m (x6 over 1m) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 2 times) Warning FailedScheduling 1m (x6 over 1m) default-scheduler pod has unbound immediate PersistentVolumeClaims Warning FailedScheduling 1m (x11 over 1m) default-scheduler 0/3 nodes are available: 1 node(s) had no available volume zone, 2 Insufficient memory. Normal NotTriggerScaleUp 4s (x11 over 1m) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added)
If you see an error with unbound persistent volume claims (PVCs), it means there is not currently a persistent volume that can satisfy the claim. If you are using automatically provisioned storage (e.g. Amazon EBS provisioner), sometimes the storage provider can take a few minutes to provision a volume, so this may resolve itself in a few minutes. You can also check the status by running kubectl describe persistentvolumeclaims` to see events of the PVCs.
Get Elasticsearch logs
editEach Elasticsearch node name is mapped to the corresponding Pod name. To get the logs of a particular node, just fetch the Pod logs:
kubectl logs -f elasticsearch-sample-es-lgphkv9p67 (...) {"type": "server", "timestamp": "2019-07-22T08:48:10,859+0000", "level": "INFO", "component": "o.e.c.s.ClusterApplierService", "cluster.name": "elasticsearch-sample", "node.name": "elasticsearch-sample-es-lgphkv9p67", "cluster.uuid": "cX9uCx3uQrej9hMLGPhV0g", "node.id": "R_OcheBlRGeqme1IZzE4_Q", "message": "added {{elasticsearch-sample-es-kqz4jmvj9p}{UGy5IX0UQcaKlztAoh4sLA}{3o_EUuZvRKW7R1C8b1zzzg}{10.16.2.232}{10.16.2.232:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true},{elasticsearch-sample-es-stzz78k64p}{Sh_AzQcxRzeuIoOQWgru1w}{cwPoTFNnRAWtqsXWQtWbGA}{10.16.2.233}{10.16.2.233:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true},}, term: 1, version: 164, reason: ApplyCommitRequest{term=1, version=164, sourceNode={elasticsearch-sample-es-9xzzhmgd4h}{tAi_bCPcSaO1OkLap4wmhQ}{E6VcWWWtSB2oo-2zmj9DMQ}{10.16.1.150}{10.16.1.150:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true}}" } {"type": "server", "timestamp": "2019-07-22T08:48:22,224+0000", "level": "INFO", "component": "o.e.c.s.ClusterApplierService", "cluster.name": "elasticsearch-sample", "node.name": "elasticsearch-sample-es-lgphkv9p67", "cluster.uuid": "cX9uCx3uQrej9hMLGPhV0g", "node.id": "R_OcheBlRGeqme1IZzE4_Q", "message": "added {{elasticsearch-sample-es-fn9wvxw6sh}{_tbAciHTStaAlUO6GtD9LA}{1g7_qsXwR0qjjfom05VwMA}{10.16.1.154}{10.16.1.154:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true},}, term: 1, version: 169, reason: ApplyCommitRequest{term=1, version=169, sourceNode={elasticsearch-sample-es-9xzzhmgd4h}{tAi_bCPcSaO1OkLap4wmhQ}{E6VcWWWtSB2oo-2zmj9DMQ}{10.16.1.150}{10.16.1.150:9300}{ml.machine_memory=27395555328, ml.max_open_jobs=20, xpack.installed=true}}" }
You can run the same command for Kibana and APM Server.
Get init container logs
editAn Elasticsearch Pod runs a few init containers to prepare the file system of the main Elasticsearch container.
In some scenarios, the Pod may fail to run (Status: Error
or Status: CrashloopBackOff
) because one of the init containers is failing to run.
Look at the init container statuses and logs to get more details.
Get ECK logs
editSince the ECK operator is just a standard Pod running in the Kubernetes cluster, you can fetch its logs as any other Pod:
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
The operator constantly attempts to reconcile Kubernetes resources to match the desired state.
Logs with INFO
level provide some insights about what is going on.
Logs with ERROR level indicate something is not going as expected.
Due to optimistic locking, you can get errors reporting a conflict while updating a resource. You can ignore them, as the update goes through at the next reconciliation attempt, almost immediately after.
Enable ECK debug logs
editTo enable DEBUG
level logs on the operator, restart it with the flag --enable-debug-logs=true
. For example:
kubectl edit statefulset.apps -n elastic-system elastic-operator
and change the following lines from:
spec: containers: - args: - manager - --operator-roles - all - --enable-debug-logs=false
to
spec: containers: - args: - manager - --operator-roles - all - --enable-debug-logs=true
Pause ECK controllers
editWhen debugging Elasticsearch, you night need to "pause" the operator reconciliations, so that no resource gets modified or created in the meantime.
To do this, set the annotation common.k8s.elastic.co/pause
to true
to any resource controlled by the operator:
- Elasticsearch
- Kibana
- ApmServer
metadata: annotations: common.k8s.elastic.co/pause: "true"
Or in one line:
kubectl annotate elasticsearch quickstart --overwrite common.k8s.elastic.co/pause=true
Get Kubernetes events
editECK will emit events when:
- important operations are performed (example: a new Elasticsearch Pod was created)
- something is wrong, and the user must be notified
Fetch Kubernetes events:
kubectl get events (...) 28s 25m 58 elasticsearch-sample-es-p45nrjch29.15b3ae4cc4f7c00d Pod Warning FailedScheduling default-scheduler 0/3 nodes are available: 1 node(s) had no available volume zone, 2 Insufficient memory. 28s 25m 52 elasticsearch-sample-es-wxpnzfhqbt.15b3ae4d86bc269f Pod Warning FailedScheduling default-scheduler 0/3 nodes are available: 1 node(s) had no available volume zone, 2 Insufficient memory.
You can filter the events to show only those that are relevant to a particular Elasticsearch cluster:
kubectl get event --namespace default --field-selector involvedObject.name=elasticsearch-sample LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE 30m 30m 1 elasticsearch-sample.15b3ae303baa93c0 Elasticsearch Normal Created elasticsearch-controller Created pod elasticsearch-sample-es-4q7q2k8cl7 30m 30m 1 elasticsearch-sample.15b3ae303bab4f40 Elasticsearch Normal Created elasticsearch-controller Created pod elasticsearch-sample-es-jg7dsfkcp8 30m 30m 1 elasticsearch-sample.15b3ae303babdfc8 Elasticsearch Normal Created elasticsearch-controller Created pod elasticsearch-sample-es-xrxsp54jd5
You can set filters for Kibana and APM Server too. Note that the default TTL for events in Kubernetes is 1h, so unless your cluster settings have been modified you will not see events older than 1h.
Exec into containers
editTo troubleshoot a filesystem, configuration or a network issue, you can run Shell commands directly in the Elasticsearch container. You can do this with kubectl:
kubectl exec -ti elasticsearch-sample-es-p45nrjch29 bash
This can also be done for Kibana and APM Server.
Webhook troubleshooting
editOn startup, the operator deploys an admission webhook that points to the operator’s service. If this is inaccessible, you may see errors in your Kubernetes API server logs indicating that it cannot reach the service. A common cause may be that the operator pods are failing to start for some reason, or that the control plane is isolated from the operator pod by some mechanism (for instance via network policies or running the control plane externally as in issue #869 and issue #1369).
Ask for help
edit- ECK Discuss forums to ask any question
- Github issues for bugs and feature requests