Troubleshooting Kubernetes Pods in Evicted State

A pod in the Evicted state means the kubelet forcibly terminated the pod and removed it from its node. This occurs when the node comes under resource pressure, usually low memory or disk space, or when other node-level problems arise. The kubelet evicts pods to free up resources and keep the node stable.

Unlike pods in CrashLoopBackOff or Error states, evicted pods are not restarted in place. You'll need to recreate them manually or rely on a higher-level controller (Deployment, ReplicaSet, DaemonSet) to replace them.

You can identify this state using:

kubectl get pods

NAME              READY   STATUS    RESTARTS   AGE
api-pod           0/1     Evicted   0          10m
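
If many pods are affected, a custom-columns query surfaces the eviction reason straight from the pod status (the evicted pod objects must still exist; they are eventually garbage-collected):

kubectl get pods --all-namespaces \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.phase,REASON:.status.reason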

Impact of Evicted State

When a pod is evicted, it is terminated permanently on that node.

Consequences:

  • The pod and its data (if not backed by persistent storage) are lost
  • The application may experience downtime until the controller reschedules a new pod
  • Jobs or batch workloads may fail if not retried automatically
  • Critical services running without redundancy may become unavailable

Bottom line: An evicted pod means the node was under stress, so Kubernetes decided to remove workloads to protect system health.

Common Causes and Solutions

1. Memory Pressure

Symptom: Available memory on the node dropped below the kubelet's eviction threshold; the kubelet evicted pods (preferring those using more than their memory request) to reclaim resources.

Diagnosis:

kubectl describe pod <pod-name> | grep -i "evicted\|memory"
kubectl top nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,MEMORY:.status.capacity.memory

Solutions:

  • Free up node memory by removing unnecessary pods
  • Set realistic memory requests and limits for pods that need more; pods using more than their request are evicted first (see the example after this list)
  • Add more nodes to the cluster
  • Increase node memory capacity
  • Adjust eviction thresholds in kubelet configuration
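
To see which pods are the heaviest memory consumers, and to raise the requests/limits of one that genuinely needs more, something like the following works (requires metrics-server; the Deployment name and values are placeholders):

# Heaviest memory consumers across the cluster (requires metrics-server)
kubectl top pods --all-namespaces --sort-by=memory | head -n 15

# Raise memory requests/limits on a placeholder Deployment
kubectl set resources deployment/api --requests=memory=512Mi --limits=memory=1Gi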

2. Disk Pressure

Symptom: Node filesystem (nodefs) or image filesystem (imagefs) usage exceeded the kubelet's eviction thresholds (configured via evictionHard / --eviction-hard).

Diagnosis:

kubectl describe pod <pod-name> | grep -i "disk\|evicted"
kubectl get nodes -o jsonpath='{.items[*].status.conditions[?(@.type=="DiskPressure")]}'
df -h  # On the node

Solutions:

  • Clean up unused container images: docker system prune -a or crictl rmi --prune
  • Remove old logs and temporary files
  • Increase disk space on nodes
  • Clean up unused PersistentVolumes
  • Adjust eviction thresholds
  • Use image garbage collection policies (see the sketch after this list)
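
As one possible image garbage collection policy, the kubelet configuration file accepts GC thresholds; the percentages below are illustrative and the file path varies by distribution:

# KubeletConfiguration fragment (commonly /var/lib/kubelet/config.yaml)
imageGCHighThresholdPercent: 80   # start deleting unused images above 80% disk usage
imageGCLowThresholdPercent: 70    # stop once usage drops below 70%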

3. CPU or Ephemeral Storage Pressure

Symptom: Node experienced ephemeral storage pressure, or a pod exceeded its ephemeral-storage limit. (CPU pressure alone does not trigger kubelet evictions, since CPU is throttled rather than reclaimed, but sustained CPU starvation often accompanies other resource pressure.)

Diagnosis:

kubectl top nodes
kubectl describe node <node-name> | grep -i "pressure"

Solutions:

  • Scale down resource-intensive pods
  • Distribute workloads across more nodes
  • Increase node resources
  • Implement resource quotas to prevent resource exhaustion
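
For ephemeral storage in particular, capping what each container may write keeps one pod from filling the node; a pod that exceeds its limit is evicted on its own rather than taking neighbors with it. A minimal container spec fragment with placeholder values:

resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "4Gi"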

4. Node Maintenance

Symptom: An administrator ran kubectl drain (which evicts running pods) or kubectl cordon (which marks the node unschedulable) for upgrades or maintenance.

Diagnosis:

kubectl get nodes
kubectl describe node <node-name> | grep -i "cordon\|drain\|unschedulable"

Solutions:

  • Wait for maintenance to complete
  • Uncordon node: kubectl uncordon <node-name>
  • Pods should be rescheduled automatically by controllers
  • Plan maintenance during low-traffic periods
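
For planned maintenance, the usual flow is to drain the node (which evicts pods gracefully and respects PodDisruptionBudgets) and uncordon it when you are done:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# ... perform maintenance ...
kubectl uncordon <node-name>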

5. Pod Priority Preemption

Symptom: Higher-priority workloads preempted the pod.

Diagnosis:

kubectl get pod <pod-name> -o jsonpath='{.spec.priority}'
kubectl get priorityclass

Solutions:

  • Increase pod priority if it's critical
  • Create PriorityClass with appropriate priority value
  • Ensure critical pods have higher priority than non-critical ones
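
A PriorityClass only has an effect once pods reference it. A minimal pod template fragment, assuming a PriorityClass named high-priority already exists (one is defined under Long-term Solutions below):

spec:
  priorityClassName: high-priority
  containers:
    - name: api                # placeholder container name
      image: example/api:1.0   # placeholder image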

6. Volume Issues

Symptom: Pod's volume ran out of space.

Diagnosis:

kubectl describe pod <pod-name> | grep -A 10 "Volumes:"
kubectl get pvc

Solutions:

  • Increase PersistentVolume size
  • Clean up data in volumes
  • Use the volume expansion feature (see the example after this list)
  • Monitor volume usage proactively
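
Volume expansion is a single patch, assuming the PVC's StorageClass has allowVolumeExpansion: true; the PVC name and size are placeholders:

kubectl patch pvc data-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
# Watch the resize progress
kubectl get pvc data-pvc -w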

Step-by-Step Troubleshooting

Step 1: Check Eviction Reason

kubectl describe pod <pod-name>
# Look for:
# Reason:       Evicted
# Message:      The node was low on resource: memory

Step 2: Check Node Resources

kubectl top nodes
kubectl describe node <node-name>
# Check for pressure conditions (MemoryPressure, DiskPressure)

Step 3: Check Node Events

kubectl get events --field-selector involvedObject.name=<node-name> --sort-by='.lastTimestamp'
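
To narrow the output to evictions only, filter events by reason (kubelet evictions typically produce events with reason Evicted):

kubectl get events --all-namespaces --field-selector reason=Evicted --sort-by='.lastTimestamp'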

Step 4: Verify Controller Will Recreate

kubectl get deployment,replicaset -o wide
# Deployment/ReplicaSet will recreate the pod
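
To confirm a specific evicted pod is controller-managed, check its owner references; a bare pod with no owner will not come back on its own:

kubectl get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[*].kind}'
# A managed pod prints ReplicaSet (or StatefulSet, DaemonSet, Job)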

Step 5: Clean Up Evicted Pods

# List failed pods (evicted pods appear with phase Failed)
kubectl get pods --all-namespaces --field-selector=status.phase=Failed

# Delete evicted pods (controllers will recreate)
kubectl delete pod <evicted-pod-name>
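
To clean up evicted pods in bulk, iterate over all Failed pods; note that status.phase=Failed also matches pods that failed for other reasons, so review the list before deleting:

kubectl get pods --all-namespaces --field-selector=status.phase=Failed \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r ns pod; do
  kubectl delete pod "$pod" -n "$ns"
done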

Quick Fixes

Immediate Actions

  1. Delete evicted pod (controller will recreate):

    kubectl delete pod <evicted-pod-name>
    
  2. Clean up node resources:

    # On the node
    docker system prune -a  # For Docker
    crictl rmi --prune      # For containerd
    
  3. Add more nodes: Quickly scale cluster to reduce resource pressure

  4. Increase node resources: Upgrade node instance types if using cloud

Long-term Solutions

  1. Configure eviction thresholds:

    # kubelet config
    evictionHard:
      memory.available: "500Mi"
      nodefs.available: "10%"
      imagefs.available: "15%"
    
  2. Implement resource requests and limits:

    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
    
  3. Use pod priority classes:

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000
    
  4. Monitor and alert: Set up monitoring for node resource usage
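
For item 4, if you happen to run Prometheus with node-exporter (an assumption; adapt to your own monitoring stack), an alert rule along these lines flags nodes approaching memory pressure:

groups:
  - name: node-resources
    rules:
      - alert: NodeLowMemory
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} has less than 10% memory available"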

Preventing Pod Evictions

  1. Right-size resources: Set appropriate requests and limits
  2. Monitor cluster capacity: Track resource usage trends
  3. Implement horizontal autoscaling: Scale pods before resource exhaustion
  4. Use resource quotas: Prevent any namespace from consuming all resources
  5. Regular cleanup: Clean unused images, volumes, and logs
  6. Capacity planning: Add nodes before reaching capacity
  7. Use pod disruption budgets: Limit voluntary evictions such as node drains (see the sketch after this list)
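
For item 7, keep in mind that PodDisruptionBudgets limit voluntary evictions (kubectl drain and the Eviction API), not kubelet node-pressure evictions. A minimal sketch with placeholder names:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api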

Eviction Policies

Soft Eviction

Evict pods only after the threshold has been exceeded for the configured grace period:

evictionSoft:
  memory.available: "1Gi"
evictionSoftGracePeriod:
  memory.available: "30s"

Hard Eviction

Immediate eviction when threshold reached:

evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
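
Both policies live in the kubelet's configuration file (often /var/lib/kubelet/config.yaml, though the path varies by distribution); a combined sketch with illustrative values:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionSoft:
  memory.available: "1Gi"
  nodefs.available: "15%"
evictionSoftGracePeriod:
  memory.available: "30s"
  nodefs.available: "1m"
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
  imagefs.available: "15%"
evictionMaxPodGracePeriod: 60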

Conclusion

Pod evictions are typically caused by node resource pressure (memory, disk, CPU). Once a pod is evicted, it needs to be recreated by its controller. Focus on preventing evictions by monitoring node resources, right-sizing pods, and maintaining adequate cluster capacity.

Remember: Evicted pods are permanently removed from the node and must be recreated. Always have controllers (Deployments, ReplicaSets) managing your pods for automatic recovery.