Kubernetes Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) is a core Kubernetes feature that automatically scales the number of pods in a Deployment, StatefulSet, or ReplicaSet based on observed CPU utilization, memory usage, or custom metrics. This lets your applications handle varying loads efficiently without manual intervention.

What is Horizontal Pod Autoscaler?

HPA automatically adjusts the number of pod replicas based on resource utilization. Unlike vertical scaling (which changes a pod's resource requests and limits), horizontal scaling adds or removes pod instances, making it ideal for stateless applications.

How HPA Works

  1. Metrics Collection: HPA queries the Metrics Server (or custom metrics API) to gather resource usage data
  2. Target Calculation: Compares current usage against target thresholds
  3. Scaling Decision: Calculates desired replica count based on the formula: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
  4. Replica Adjustment: Updates the Deployment/ReplicaSet to match the desired replica count
  5. Cooldown Period: Waits for stabilization before making another scaling decision
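The formula in step 3 can be sketched in Python; the replica counts and utilization figures below are hypothetical, chosen only to show the arithmetic:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    """desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))"""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 replicas averaging 90% CPU against a 70% target -> scale up to 6
print(desired_replicas(4, 90, 70))

# 6 replicas averaging 35% CPU against a 70% target -> scale down to 3
print(desired_replicas(6, 35, 70))
```

Note the ceil: HPA rounds up, so it would rather run slightly over-provisioned than under target.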

Prerequisites

Before using HPA, ensure you have:

  • Metrics Server: Installed and running in your cluster
  • Resource Requests: Pods must have CPU/memory requests defined
  • RBAC Permissions: HPA controller needs proper permissions
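The resource-requests prerequisite exists because HPA measures utilization relative to each pod's requests, not its limits. A rough sketch with made-up numbers (a 100m CPU request across three pods):

```python
def utilization_percent(usage_millicores, request_millicores):
    """Utilization for one pod: actual usage as a percentage of its request."""
    return 100 * usage_millicores / request_millicores

# A pod with a 100m CPU request currently using 80m reports 80% utilization
print(utilization_percent(80, 100))

# HPA averages utilization across all targeted pods before comparing
# against averageUtilization in the HPA spec
usages = [80, 120, 100]  # millicores used by three pods (hypothetical)
avg = sum(utilization_percent(u, 100) for u in usages) / len(usages)
print(avg)
```

Without requests defined, there is no denominator, and HPA reports the metric as unknown.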

Installing Metrics Server

# For most Kubernetes distributions
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get deployment metrics-server -n kube-system
kubectl top nodes

Basic HPA Configuration

Simple CPU-Based Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
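The behavior block above acts as a rate limit on each scaling step. A rough Python model of the scaleUp policies (Percent 100 and Pods 4, combined with selectPolicy: Max, capped at the example's maxReplicas of 10; the starting replica counts are hypothetical):

```python
import math

def max_scale_up(current, percent=100, pods=4, max_replicas=10):
    """selectPolicy: Max -> take the most permissive policy each period."""
    by_percent = current + math.ceil(current * percent / 100)  # at most double
    by_pods = current + pods                                   # at most +4 pods
    return min(max(by_percent, by_pods), max_replicas)

print(max_scale_up(2))  # Percent allows 4, Pods allows 6 -> 6
print(max_scale_up(5))  # Percent allows 10, Pods allows 9 -> 10, at maxReplicas
```

With selectPolicy: Min the controller would instead pick the more restrictive of the two policies, which is useful when you want conservative growth.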

Memory-Based Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: memory-intensive-app
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Advanced HPA Features

Multiple Metrics

HPA can scale based on multiple metrics simultaneously:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: "100"

Common Use Cases

1. Web Application Scaling

Scale a web application based on CPU usage during traffic spikes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Monitoring HPA

Check HPA Status

# View HPA details
kubectl describe hpa <hpa-name>

# Get current HPA status
kubectl get hpa

# Watch HPA in real-time
kubectl get hpa -w

Best Practices

  1. Set Appropriate Resource Requests: Always define CPU and memory requests for accurate scaling
  2. Choose Realistic Targets: typical targets are 50-80% for CPU and 70-85% for memory
  3. Configure Scaling Behavior: Use behavior configuration to prevent thrashing
  4. Set Reasonable Limits: minReplicas for HA (2-3), maxReplicas to prevent cost overruns
  5. Monitor and Adjust: Regularly review HPA metrics and adjust targets
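Practice 3 deserves a closer look. For scale-down, the stabilization window keeps the highest recommendation seen within the window, so a brief dip in load does not immediately drop replicas. A minimal sketch (the recommendation values are hypothetical):

```python
# Recommendations computed by the controller across a 300s scale-down
# stabilization window, newest last (hypothetical values)
window_recommendations = [7, 5, 6, 5]

# Scale-down acts on the MAX of the window, so transient dips are ignored
effective = max(window_recommendations)
print(effective)  # stays at 7 until that recommendation ages out of the window
```

This is why the earlier example sets stabilizationWindowSeconds: 300 for scaleDown but 0 for scaleUp: reacting to load spikes quickly is usually safe, while reacting to dips quickly causes thrashing.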

Troubleshooting

HPA Not Scaling

# Check if Metrics Server is running
kubectl get deployment metrics-server -n kube-system

# Verify resource requests are set
kubectl describe pod <pod-name> | grep -A 5 "Requests:"

# Check HPA events
kubectl describe hpa <hpa-name>

Conclusion

Horizontal Pod Autoscaler is essential for running cost-effective, responsive applications in Kubernetes. By automatically adjusting replica counts based on demand, HPA helps you maintain performance while optimizing resource usage.