KubeDeploymentReplicasMismatch

Meaning

This alert fires when the desired number of replicas for a deployment has differed from the actual number of running instances for a certain period.

Impact

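The impact depends on the size of the discrepancy. A deployment running fewer replicas than desired has reduced capacity or redundancy, while a deployment with no running replicas is not serving at all.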

Diagnosis

The alert's labels identify where the discrepancy occurred, in particular the deployment and namespace labels:

 - alertname = KubeDeploymentReplicasMismatch
...
 - deployment = elasticsearch-cdm-u1gqqbu6-2
...
 - namespace = logging
...

Start by checking the status of the deployment:

$ kubectl get deploy -n $NAMESPACE $DEPLOYMENT
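
To put a number on the mismatch, the desired and available replica counts can be read directly from the deployment object. The following is a minimal sketch, not part of the original runbook: replica_gap is a hypothetical helper name, and it assumes the relevant comparison is between .spec.replicas and .status.availableReplicas.

```shell
# replica_gap (hypothetical helper): read "<desired> <available>" on stdin and
# print how many replicas are missing. An empty "available" value counts as 0,
# because the field may be omitted when no replicas are available.
replica_gap() {
  awk '{ desired = $1 + 0; available = $2 + 0; print desired - available }'
}
```

It could be fed from kubectl like this:

$ kubectl get deploy -n $NAMESPACE $DEPLOYMENT -o jsonpath='{.spec.replicas} {.status.availableReplicas}' | replica_gap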

Using the details available in the alert, review the following in the target namespace to find the cause of the mismatch. Start with the events:

$ kubectl get events -n $NAMESPACE
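
In a busy namespace the event list can be long, so it may help to keep only warnings. The sketch below uses a hypothetical helper, warning_events, and assumes the default column layout of kubectl get events (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE).

```shell
# warning_events (hypothetical helper): keep the header line and every row
# whose TYPE column (second field) is "Warning".
warning_events() {
  awk 'NR == 1 || $2 == "Warning"'
}
```

For example:

$ kubectl get events -n $NAMESPACE | warning_events

The same filtering can also be done by kubectl itself with --field-selector type=Warning.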

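Next, check the state of the Pods that the deployment manages. The command below assumes the Pods carry an app label equal to the deployment name; if they do not, use the deployment's own label selector instead: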

$ kubectl get pods -n $NAMESPACE --selector=app=$DEPLOYMENT
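
The app=$DEPLOYMENT selector only matches if the Pods are labelled that way. As a hedged alternative, the deployment's own selector can be taken from the SELECTOR column of the wide output. deployment_selector is a hypothetical helper, and the sketch assumes the selector uses only matchLabels, so that it contains no spaces.

```shell
# deployment_selector (hypothetical helper): print the SELECTOR column, which
# is the last field of `kubectl get deploy -o wide --no-headers` output.
# Only the first input line is used.
deployment_selector() {
  awk '{ print $NF; exit }'
}
```

Possible usage:

$ SELECTOR=$(kubectl get deploy -n $NAMESPACE $DEPLOYMENT -o wide --no-headers | deployment_selector)
$ kubectl get pods -n $NAMESPACE --selector="$SELECTOR"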

Possibilities include (but are not limited to) a pod stuck in ContainerCreating or CrashLoopBackOff. In that case, the events may contain information about failed actions for the pod. Application and startup failures should be visible with:

$ kubectl describe pod -n $NAMESPACE $POD
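
When the deployment manages many Pods, it can be useful to narrow the listing down to the ones worth describing. The following sketch uses a hypothetical helper, unhealthy_pods, and assumes the default kubectl get pods columns (NAME, READY, STATUS, ...).

```shell
# unhealthy_pods (hypothetical helper): print "NAME STATUS" for every pod that
# is not Running with all of its containers ready.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")            # READY has the form ready/total
    if ($3 == "Completed") next      # finished pods are not a problem
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $3
  }'
}
```

For example:

$ kubectl get pods -n $NAMESPACE --selector=app=$DEPLOYMENT | unhealthy_pods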

If Pods are stuck in Pending, the scheduler cannot place them, most often because of insufficient resources. Check the health of the nodes:

$ kubectl get nodes
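
On larger clusters, the node list can be reduced to the nodes that need attention. This is a sketch with a hypothetical helper, not_ready_nodes, assuming the default kubectl get nodes columns (NAME, STATUS, ...).

```shell
# not_ready_nodes (hypothetical helper): print "NAME STATUS" for nodes whose
# STATUS is not exactly "Ready". Cordoned nodes, which show as
# "Ready,SchedulingDisabled", are reported as well.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}
```

For example:

$ kubectl get nodes | not_ready_nodes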

It is also possible that the CPU and memory of the nodes are exhausted:

$ kubectl top nodes
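
To spot exhausted nodes quickly, the utilisation percentages can be compared against a threshold. The sketch below uses a hypothetical helper, busy_nodes; the default threshold of 80 percent is an arbitrary illustrative value, and the column layout (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%) is assumed. Note that kubectl top only works when a metrics API such as metrics-server is available in the cluster.

```shell
# busy_nodes (hypothetical helper): print "NAME CPU% MEMORY%" for nodes whose
# CPU% or MEMORY% is at or above the threshold given as the first argument.
# The default of 80 is an illustrative assumption, not a recommendation.
busy_nodes() {
  awk -v limit="${1:-80}" 'NR > 1 {
    cpu = $3; mem = $5
    sub(/%/, "", cpu); sub(/%/, "", mem)   # strip the percent signs
    if (cpu + 0 >= limit + 0 || mem + 0 >= limit + 0) print $1, $3, $5
  }'
}
```

For example, to list nodes at or above 90 percent:

$ kubectl top nodes | busy_nodes 90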

Mitigation

Resolve the problems discovered during the diagnosis according to the relevant documentation. It is safe to delete the affected pods, since they are managed by the deployment and will be recreated. If resources are insufficient, it may also be necessary to add more nodes.
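
As an illustration of deleting a pod and letting the deployment replace it, the following sketch deletes one pod and then waits for the rollout to complete. recreate_pod and the KUBECTL override are hypothetical conveniences, and the 120 second timeout is an arbitrary choice.

```shell
# KUBECTL can be overridden (for example with "echo") to preview the commands
# instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# recreate_pod NAMESPACE DEPLOYMENT POD (hypothetical helper): delete one pod,
# then wait until the deployment reports a complete rollout.
recreate_pod() {
  "$KUBECTL" delete pod -n "$1" "$3" &&
    "$KUBECTL" rollout status -n "$1" "deployment/$2" --timeout=120s
}
```

For example:

$ recreate_pod $NAMESPACE $DEPLOYMENT $POD

If the rollout does not complete within the timeout, the underlying cause (image, configuration, or capacity) most likely still needs to be fixed.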