Common problems
Operator crashes on startup with OOMKilled
On very large Kubernetes clusters with many hundreds of resources (pods, secrets, config maps, and so on), the operator may fail to start, with its pod being killed with an OOMKilled message. This is an issue with the controller-runtime framework on top of which the operator is built. Even though the operator is only interested in the resources it created itself, the framework code needs to gather information about all relevant resources in the Kubernetes cluster in order to provide the filtered view of cluster state required by the operator. On very large clusters, this information gathering can use a lot of memory and exceed the default resource limit defined for the operator pod.
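To confirm that a restart was caused by the memory limit rather than by something else, the pod status can be inspected. The following is a minimal sketch, assuming the pod description has been obtained as JSON (for example with `kubectl get pod <operator-pod> -n elastic-system -o json`); the helper name `oom_killed_containers` and the sample data are hypothetical and only illustrate where the termination reason appears in the pod status.

```python
import json
from typing import List


def oom_killed_containers(pod: dict) -> List[str]:
    """Return the names of containers whose current or previous
    termination reason is OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # A container that keeps crashing usually reports the reason under
        # lastState; one that has just been killed reports it under state.
        for key in ("state", "lastState"):
            terminated = (status.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break
    return names


# Trimmed, made-up pod status used only for illustration.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "manager",
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      }
    ]
  }
}
""")

print(oom_killed_containers(sample_pod))  # ['manager']
```

If the list is empty, the restart is likely due to another cause and raising the memory limit may not help.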
The default memory limit for the operator pod is set to 512 MiB. You can increase (or decrease) this limit to a value suited to your cluster as follows:
kubectl patch sts elastic-operator -n elastic-system -p '{"spec":{"template":{"spec":{"containers":[{"name":"manager", "resources":{"limits":{"memory":"768Mi"}}}]}}}}'
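If the limit needs to be adjusted repeatedly, or set from a script, the patch document can be generated rather than typed by hand. This is a sketch only: the function names are hypothetical, and the StatefulSet name, namespace, and container name are simply taken from the command above.

```python
import json
import shlex


def build_memory_patch(limit: str, container: str = "manager") -> str:
    """Build the JSON patch body that sets the memory limit of one container."""
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "resources": {"limits": {"memory": limit}},
                        }
                    ]
                }
            }
        }
    }
    return json.dumps(patch)


def build_patch_command(limit: str,
                        name: str = "elastic-operator",
                        namespace: str = "elastic-system") -> str:
    """Return the kubectl command line as a shell-safe string."""
    args = ["kubectl", "patch", "sts", name, "-n", namespace,
            "-p", build_memory_patch(limit)]
    # shlex.quote protects the JSON body from shell interpretation.
    return " ".join(shlex.quote(arg) for arg in args)


# Print the command for a 1 GiB limit; nothing is executed here.
print(build_patch_command("1Gi"))
```

The script only prints the command, so the result can be reviewed before it is run against the cluster.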
Timeout when submitting a resource manifest
When submitting an ECK resource manifest, you may encounter an error message similar to the following:
Error from server (Timeout): error when creating "elasticsearch.yaml": Timeout: request did not complete within requested timeout 30s
This error is usually an indication of a problem communicating with the validating webhook. If you are running ECK on a private Google Kubernetes Engine (GKE) cluster, you may need to add a firewall rule allowing port 9443 from the API server. Another possible cause for failure is if a strict network policy is in effect. Refer to the webhook troubleshooting documentation for more details and workarounds.
Copying secrets with Owner References
Copying the Elasticsearch Secrets generated by ECK (for instance, the certificate authority or the elastic user) into another namespace wholesale can trigger a Kubernetes bug that can delete all of the Elasticsearch-related resources (for instance, the data volumes). To avoid this, remove the metadata.ownerReferences section when copying Secrets between namespaces. For example, a source secret might be:
$ kubectl get secret quickstart-es-elastic-user -o yaml
apiVersion: v1
data:
  elastic: NGw2Q2REMjgwajZrMVRRS0hxUDVUUTU0
kind: Secret
metadata:
  creationTimestamp: "2020-06-09T19:11:41Z"
  labels:
    common.k8s.elastic.co/type: elasticsearch
    eck.k8s.elastic.co/credentials: "true"
    elasticsearch.k8s.elastic.co/cluster-name: quickstart
  name: quickstart-es-elastic-user
  namespace: default
  ownerReferences:
  - apiVersion: elasticsearch.k8s.elastic.co/v1
    blockOwnerDeletion: true
    controller: true
    kind: Elasticsearch
    name: quickstart
    uid: c7a9b436-aa07-4341-a2cc-b33b3dfcbe29
  resourceVersion: "13048277"
  selfLink: /api/v1/namespaces/default/secrets/quickstart-es-elastic-user
  uid: 04cdf334-77d3-4de6-a2e8-7a2b23366a27
type: Opaque
To copy it to a different namespace, strip the metadata.ownerReferences field as well as the object-specific data (creationTimestamp, resourceVersion, selfLink, and uid):
apiVersion: v1
data:
  elastic: NGw2Q2REMjgwajZrMVRRS0hxUDVUUTU0
kind: Secret
metadata:
  labels:
    common.k8s.elastic.co/type: elasticsearch
    eck.k8s.elastic.co/credentials: "true"
    elasticsearch.k8s.elastic.co/cluster-name: quickstart
  name: quickstart-es-elastic-user
  namespace: default
type: Opaque
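The same clean-up can be scripted when several Secrets need to be copied. The sketch below assumes the Secret has already been loaded into a dictionary (for example from `kubectl get secret ... -o json`); the function name `strip_for_copy`, the field list, and the optional `target_namespace` parameter are illustrative assumptions based on the two manifests above, not part of ECK.

```python
import copy
from typing import Optional

# Metadata fields that differ between the two manifests shown above.
OBJECT_SPECIFIC_FIELDS = (
    "ownerReferences",
    "creationTimestamp",
    "resourceVersion",
    "selfLink",
    "uid",
)


def strip_for_copy(secret: dict, target_namespace: Optional[str] = None) -> dict:
    """Return a copy of the Secret without owner references and
    object-specific metadata. The input dictionary is not modified."""
    cleaned = copy.deepcopy(secret)
    metadata = cleaned.setdefault("metadata", {})
    for field in OBJECT_SPECIFIC_FIELDS:
        metadata.pop(field, None)
    # Optionally point the copy at another namespace (an assumption added
    # for illustration; the manifest above keeps the original value).
    if target_namespace is not None:
        metadata["namespace"] = target_namespace
    return cleaned


source_secret = {
    "apiVersion": "v1",
    "data": {"elastic": "NGw2Q2REMjgwajZrMVRRS0hxUDVUUTU0"},
    "kind": "Secret",
    "metadata": {
        "creationTimestamp": "2020-06-09T19:11:41Z",
        "labels": {
            "common.k8s.elastic.co/type": "elasticsearch",
            "eck.k8s.elastic.co/credentials": "true",
            "elasticsearch.k8s.elastic.co/cluster-name": "quickstart",
        },
        "name": "quickstart-es-elastic-user",
        "namespace": "default",
        "ownerReferences": [
            {
                "apiVersion": "elasticsearch.k8s.elastic.co/v1",
                "blockOwnerDeletion": True,
                "controller": True,
                "kind": "Elasticsearch",
                "name": "quickstart",
                "uid": "c7a9b436-aa07-4341-a2cc-b33b3dfcbe29",
            }
        ],
        "resourceVersion": "13048277",
        "selfLink": "/api/v1/namespaces/default/secrets/quickstart-es-elastic-user",
        "uid": "04cdf334-77d3-4de6-a2e8-7a2b23366a27",
    },
    "type": "Opaque",
}

cleaned = strip_for_copy(source_secret)
print(sorted(cleaned["metadata"]))  # ['labels', 'name', 'namespace']
```

Working on a deep copy keeps the original dictionary intact, so the stripped result can be compared against the source before anything is applied to the cluster.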
Failure to do so can cause data loss.