Troubleshooting Kubernetes Pods in Pending State

A pod in Kubernetes enters the Pending state when it has been created and accepted by the Kubernetes control plane but one or more of its containers is not yet running. Most often this is because the pod has not been scheduled onto a node. Understanding why pods get stuck in this state is crucial for maintaining healthy Kubernetes clusters.

What is the Pending State?

The Pending state indicates one of the following:

The Kubernetes scheduler is still trying to find a suitable node for the pod
Container images are being downloaded (kubectl usually reports this as ContainerCreating, although the pod phase is still Pending)
Required resources (like PersistentVolumes) are being provisioned

You can identify pending pods using:

kubectl get pods

NAME               READY   STATUS    RESTARTS   AGE
my-app-pod         0/1     Pending   0          5m
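
On a busy cluster it can help to filter for Pending pods directly instead of scanning the full list. The sketch below wraps the filter in a function; the function name list_pending_pods is hypothetical, while the status.phase field selector is a standard kubectl feature for pods.

```shell
# List every Pending pod across all namespaces.
# status.phase is one of the field selectors kubectl supports for pods.
list_pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}
```

Calling list_pending_pods prints the same columns as kubectl get pods, plus a NAMESPACE column.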

Impact of Pending Pods

Applications won't start running until pods are scheduled
Dependent services and workloads will be delayed
Deployment rollouts may stall or fail
Autoscaling may not deliver the capacity it requested, because the new replicas never start
Production availability issues can occur

In short: A Pending pod indicates a scheduling or provisioning issue that must be resolved before your workload can function.

Common Causes and Solutions

1. Insufficient Node Resources

Symptom: No nodes have enough CPU or memory to satisfy the pod's resource requests.

Diagnosis:

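# Check actual node usage (requires metrics-server)
kubectl top nodes

# Check what the scheduler counts: requests already allocated on a node
kubectl describe node <node-name> | grep -A 8 "Allocated resources:"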

# Check pod resource requests
kubectl describe pod <pod-name> | grep -A 5 "Requests:"

Solutions:

Increase cluster capacity by adding nodes
Reduce pod resource requests
Delete unused pods or scale down idle workloads to free resources
Implement resource quotas so that a single namespace cannot claim all cluster capacity
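
The scheduler decides on requests, not on live usage: a pod fits a node only if the requests already placed there plus the new request stay within the node's allocatable amount. The following is a minimal sketch of that arithmetic for CPU only. The helper names to_millicores and cpu_fits are hypothetical, and the sketch ignores memory and other resources.

```shell
# Convert a Kubernetes CPU quantity ("250m", "2", "0.5") to millicores.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# Succeed when the new request fits next to what is already requested.
# Arguments: allocatable CPU, CPU already requested, CPU the new pod requests.
cpu_fits() {
  alloc=$(to_millicores "$1")
  used=$(to_millicores "$2")
  want=$(to_millicores "$3")
  [ $((used + want)) -le "$alloc" ]
}
```

For example, cpu_fits 4 3500m 500m succeeds, while cpu_fits 4 3500m 750m fails because 4250m exceeds the 4000m allocatable.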

2. NodeSelector or Affinity Mismatch

Symptom: The pod's nodeSelector or affinity rules don't match any available node.

Diagnosis:

kubectl describe pod <pod-name> | grep -A 10 "Node-Selectors:"
kubectl get nodes --show-labels

Solutions:

Remove restrictive node selectors if not needed
Add matching labels to nodes: kubectl label nodes <node-name> <key>=<value>
Adjust affinity rules to match your cluster topology
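
A quick way to confirm a mismatch is to read the pod's nodeSelector and then ask which nodes carry that label. This is a sketch under assumptions: the function names are hypothetical and the label disktype=ssd in the usage note is only an example.

```shell
# Print the nodeSelector map from a pod's spec (empty output if none is set).
pod_node_selector() {
  kubectl get pod "$1" -o jsonpath='{.spec.nodeSelector}'
}

# Print the names of nodes that carry the given label, passed as key=value.
nodes_with_label() {
  kubectl get nodes -l "$1" -o name
}
```

If nodes_with_label disktype=ssd prints nothing, no node can satisfy a selector for that label.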

3. Taints and Tolerations

Symptom: Nodes have taints that the pod doesn't tolerate.

Diagnosis:

kubectl describe node <node-name> | grep -A 5 "Taints:"
kubectl describe pod <pod-name> | grep -A 5 "Tolerations:"

Solutions:

Add matching tolerations to your pod spec
Remove unnecessary taints from nodes
Create a dedicated node pool without taints for workloads
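
To see the taints on every node at once and draft a matching toleration, something like the following may help. Both function names are hypothetical, and the printed fragment assumes a taint of the form key=value:effect; taints without a value need operator "Exists" instead.

```shell
# Show every node together with the keys of its taints.
list_node_taints() {
  kubectl get nodes -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Print a toleration fragment for a key=value:effect taint.
# Arguments: key, value, effect. Paste the output under the pod's spec.
print_toleration() {
  printf 'tolerations:\n- key: "%s"\n  operator: "Equal"\n  value: "%s"\n  effect: "%s"\n' "$1" "$2" "$3"
}
```

For a node tainted with dedicated=gpu:NoSchedule, print_toleration dedicated gpu NoSchedule produces the fragment to add to the pod spec.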

4. PersistentVolumeClaim Issues

Symptom: Pod references a PVC that isn't bound to a PersistentVolume.

Diagnosis:

kubectl get pvc
kubectl describe pvc <pvc-name>

Solutions:

Check storage class configuration
Verify storage provisioner is running
Ensure sufficient storage capacity exists
Check that the PVC's access mode (ReadWriteOnce, ReadOnlyMany, ReadWriteMany, ReadWriteOncePod) is one the storage backend supports
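
When many claims exist, filtering for those that are not Bound narrows the search. A minimal sketch, assuming the default column layout of kubectl get pvc with --all-namespaces (NAMESPACE, NAME, STATUS first); the function name unbound_pvcs is hypothetical.

```shell
# List PVCs whose STATUS is not "Bound", printed as namespace/name and status.
# With --all-namespaces the first three columns are NAMESPACE, NAME, STATUS.
unbound_pvcs() {
  kubectl get pvc --all-namespaces --no-headers |
    awk '$3 != "Bound" { print $1 "/" $2, $3 }'
}
```

Each line of output names a claim worth inspecting with kubectl describe pvc.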

5. Network Plugin Not Ready

Symptom: The CNI plugin hasn't initialized on the nodes, so they report NotReady and the scheduler skips them.

Diagnosis:

kubectl get nodes
kubectl describe node <node-name> | grep -i "network"

Solutions:

Wait for CNI plugin to initialize (usually automatic)
Restart CNI pods if stuck
Check CNI pod logs: kubectl logs -n kube-system <cni-pod-name>
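
The reason a node is NotReady is recorded in the message of its Ready condition, which can be read directly rather than grepped out of the describe output. This is a sketch; the function name node_ready_message is hypothetical, and the exact message wording depends on the kubelet and container runtime version.

```shell
# Print the message attached to a node's Ready condition.
# When the network plugin is the problem, the message usually mentions it.
node_ready_message() {
  kubectl get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}
```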

6. All Nodes Unschedulable

Symptom: All nodes are cordoned or marked unschedulable.

Diagnosis:

kubectl get nodes
kubectl describe node <node-name> | grep -i "unschedulable"

Solutions:

Uncordon nodes: kubectl uncordon <node-name>
Check why nodes were cordoned (maintenance, issues)
Ensure at least some nodes are schedulable
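
Cordoned nodes show SchedulingDisabled in the STATUS column of kubectl get nodes, so they can be listed with a small filter. A minimal sketch; the function name cordoned_nodes is hypothetical.

```shell
# Print the names of nodes whose STATUS includes SchedulingDisabled.
cordoned_nodes() {
  kubectl get nodes --no-headers | awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}
```

Review the list before acting on it: nodes are often cordoned deliberately for maintenance, and only the ones that should be back in service need kubectl uncordon.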

Step-by-Step Troubleshooting Process

Step 1: Check Pod Events

kubectl describe pod <pod-name>

Look for events that explain why the pod isn't scheduling, such as:

"Insufficient cpu"
"Insufficient memory"
"0/3 nodes are available"

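If the describe output is long, the scheduling events can be queried on their own. The sketch below assumes the scheduler records its failures with the reason FailedScheduling, which is the usual behavior; the function name scheduling_events is hypothetical, and the namespace defaults to "default" when omitted.

```shell
# Show FailedScheduling events for one pod, oldest first.
# Arguments: pod name, optional namespace (defaults to "default").
scheduling_events() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1,reason=FailedScheduling" \
    --sort-by=.lastTimestamp
}
```
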
Step 2: Check Node Capacity

# Allocatable is what the scheduler can hand out (capacity minus system reservations)
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
kubectl top nodes

Step 3: Verify Pod Requirements

kubectl get pod <pod-name> -o yaml | grep -A 10 "resources:"

Step 4: Check Scheduling Constraints

kubectl get pod <pod-name> -o yaml | grep -A 5 "nodeSelector:"
kubectl get pod <pod-name> -o yaml | grep -A 10 "affinity:"
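
The steps above usually end with a scheduler message that points to one of the causes listed earlier. The following sketch maps a message to a cause. The category labels and the function name are hypothetical, the matched phrases are typical scheduler wording that varies between Kubernetes versions, and a message listing several reasons is classified by the first pattern that matches.

```shell
# Map a FailedScheduling message to a rough cause category.
# Only the first matching pattern is reported.
classify_scheduling_message() {
  case "$1" in
    *"Insufficient cpu"*|*"Insufficient memory"*) echo "insufficient-resources" ;;
    *"node affinity"*|*"node selector"*)          echo "selector-or-affinity" ;;
    *"taint"*)                                    echo "taints" ;;
    *"PersistentVolumeClaim"*)                    echo "pvc" ;;
    *"unschedulable"*)                            echo "unschedulable" ;;
    *)                                            echo "unknown" ;;
  esac
}
```

For example, the message "0/3 nodes are available: 3 Insufficient cpu." is classified as insufficient-resources, which points to cause 1.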

Quick Fixes

Immediate Actions

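Delete and recreate: Recreating the pod forces a fresh scheduling attempt. A pod managed by a Deployment or similar controller is recreated automatically; a standalone pod must be reapplied from its manifest.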
```
kubectl delete pod <pod-name>
```
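Add a node: Quickly add capacity to your cluster
Remove resource constraints: Temporarily lower or remove resource requests to confirm that capacity is the cause, then restore them
Check for stuck PVCs: Delete and recreate a PVC only if it was never bound. Deleting a bound PVC can destroy its data, depending on the volume's reclaim policy.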

Preventive Measures

Set appropriate resource requests (not too high, not too low)
Pair HorizontalPodAutoscaler with a node autoscaler such as Cluster Autoscaler, since HPA adds pods but not the nodes to run them
Implement resource quotas at namespace level
Monitor node capacity and plan scaling
Use node affinity carefully (prefer preferredDuringSchedulingIgnoredDuringExecution rules over required ones where possible)
Document node taints and ensure pods have tolerations

Conclusion

Pods stuck in Pending state are usually caused by resource constraints, scheduling rules, or provisioning issues. Use kubectl describe pod to identify the specific cause, then apply the appropriate solution based on the error message.

Remember: Most Pending state issues can be resolved by either adjusting pod requirements or adding cluster capacity.