Kubernetes Horizontal Pod Autoscaler (HPA)
The Horizontal Pod Autoscaler (HPA) is a core Kubernetes feature that automatically scales the number of pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization, memory usage, or custom metrics. This lets your applications handle varying loads efficiently without manual intervention.
What is Horizontal Pod Autoscaler?
HPA automatically adjusts the number of pod replicas based on resource utilization. Unlike vertical scaling (which changes resource limits), horizontal scaling adds or removes pod instances, making it ideal for stateless applications.
How HPA Works
- Metrics Collection: HPA queries the Metrics Server (or custom metrics API) to gather resource usage data
- Target Calculation: Compares current usage against target thresholds
- Scaling Decision: Calculates the desired replica count using the formula below (see the worked example after this list):
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
- Replica Adjustment: Updates the Deployment/ReplicaSet to match the desired replica count
- Cooldown Period: Waits for stabilization before making another scaling decision
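For example, if 4 replicas are running at an average of 90% CPU against a 70% target, the desired count is ceil[4 * (90 / 70)] = ceil[5.14] = 6, so HPA scales the workload up to 6 replicas (subject to maxReplicas).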
Prerequisites
Before using HPA, ensure you have:
- Metrics Server: Installed and running in your cluster
- Resource Requests: Pods must have CPU/memory requests defined
- RBAC Permissions: HPA controller needs proper permissions
Installing Metrics Server
# For most Kubernetes distributions
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify installation
kubectl get deployment metrics-server -n kube-system
kubectl top nodes
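If kubectl top nodes returns an error, also confirm that the Metrics Server has registered its API with the aggregation layer:
# The metrics APIService should report Available=True
kubectl get apiservice v1beta1.metrics.k8s.io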
Basic HPA Configuration
Simple CPU-Based Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
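For quick experiments, a roughly equivalent HPA (without the custom behavior section) can be created imperatively:
# Imperative equivalent of the CPU target above
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10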
Memory-Based Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: memory-intensive-app
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
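Note that memory utilization is calculated against each pod's memory request, so the target Deployment must define requests.memory. To compare actual consumption with the request, you can check per-pod usage (the app=memory-intensive-app label selector below is an assumption about how the Deployment labels its pods):
# Show per-pod memory consumption for the target workload
kubectl top pods -l app=memory-intensive-app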
Advanced HPA Features
Multiple Metrics
HPA can scale based on multiple metrics simultaneously; it computes a desired replica count for each metric and uses the largest of them:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: "100"
Common Use Cases
1. Web Application Scaling
Scale a web application based on CPU usage during traffic spikes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
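To see this HPA in action, generate sustained load against the application and watch the replica count change. The busybox load generator below follows the pattern from the official HPA walkthrough and assumes a Service named web-app exposes the Deployment:
# Generate continuous HTTP load (assumes a Service named web-app)
kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://web-app; done"
# In another terminal, watch the HPA respond
kubectl get hpa web-app-hpa -w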
Monitoring HPA
Check HPA Status
# View HPA details
kubectl describe hpa <hpa-name>
# Get current HPA status
kubectl get hpa
# Watch HPA in real-time
kubectl get hpa -w
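A healthy HPA reports current versus target metrics in the TARGETS column; the output looks roughly like this (values are illustrative):
NAME          REFERENCE            TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   cpu: 45%/70%   3         20        5          2d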
Best Practices
- Set Appropriate Resource Requests: Always define CPU and memory requests for accurate scaling
- Choose Realistic Targets: CPU 50-80%, Memory 70-85% are typical ranges
- Configure Scaling Behavior: Use behavior configuration to prevent thrashing
- Set Reasonable Limits: use a minReplicas of 2-3 for high availability and a maxReplicas cap to prevent cost overruns
- Monitor and Adjust: Regularly review HPA metrics and adjust targets
Troubleshooting
HPA Not Scaling
# Check if Metrics Server is running
kubectl get deployment metrics-server -n kube-system
# Verify resource requests are set
kubectl describe pod <pod-name> | grep -A 5 "Requests:"
# Check HPA events
kubectl describe hpa <hpa-name>
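If the HPA status shows unknown targets, query the resource metrics API directly; an error here points at the Metrics Server rather than the HPA itself:
# Query the resource metrics API that HPA depends on
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"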
Conclusion
Horizontal Pod Autoscaler is essential for running cost-effective, responsive applications in Kubernetes. By automatically adjusting replica counts based on demand, HPA helps you maintain performance while optimizing resource usage.