Performance & Cost Insights
Optimizing for performance and cost in Kubernetes requires balancing resource allocation, identifying waste, and ensuring predictable latency. This guide shows you how to right-size your workloads and reduce costs without sacrificing performance.
Understanding Resource Requests and Limits
Requests vs Limits
Requests:
- Guaranteed minimum resources
- Used by scheduler to place pods
- Reserved for the pod
- Cost impact: Nodes need capacity for requests
Limits:
- Maximum resources pod can use
- CPU is throttled when usage reaches the limit
- OOMKilled if memory limit exceeded
- Cost impact: Less direct, but affects node density
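To make both concepts concrete, here is a minimal pod spec tying them together (the name and image are illustrative, not from any real workload):
apiVersion: v1
kind: Pod
metadata:
  name: example-app        # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.27    # any image works; nginx is just for illustration
      resources:
        requests:
          cpu: "250m"      # scheduler reserves this; counted against node allocatable
          memory: "256Mi"
        limits:
          cpu: "500m"      # CPU throttled beyond this
          memory: "512Mi"  # OOMKilled beyond this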
The Right-Sizing Process
Baseline Measurement
# Monitor actual usage over time
kubectl top pods --all-namespaces --containers
Analyze Usage Patterns
- Average usage (set requests)
- Peak usage (set limits)
- Add 20-30% buffer for limits
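For example, a hypothetical container that averages 120m CPU / 200Mi memory and peaks at 400m / 380Mi would come out roughly like this:
resources:
  requests:
    cpu: "120m"      # ~average usage
    memory: "200Mi"  # ~average usage
  limits:
    cpu: "500m"      # peak 400m + 25% buffer
    memory: "512Mi"  # peak 380Mi + ~30%, rounded up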
Apply Changes Gradually
- Start with non-critical workloads
- Monitor for issues
- Adjust based on feedback
Continuous Optimization
- Review monthly
- Adjust for traffic patterns
- Remove unused resources
Identifying Waste
1. Over-Provisioned Pods
Signs:
- Actual usage << requests
- CPU throttling rare or never
- Memory usage well below limits
How to find:
# Compare requests vs actual usage
kubectl top pods --all-namespaces
kubectl describe pod <pod-name> | grep -A 2 "Requests\|Limits"
Example:
Pod: web-app-xyz
Requests: CPU 1000m, Memory 1Gi
Actual: CPU 50m, Memory 128Mi
Waste: 95% CPU, 87% Memory
Solution:
resources:
  requests:
    cpu: "100m"      # Down from 1000m
    memory: "256Mi"  # Down from 1Gi
  limits:
    cpu: "500m"      # Still have headroom
    memory: "512Mi"  # But not excessive
2. Idle Pods
Signs:
- Pods with zero or near-zero usage
- High replica counts with low traffic
- Pods that never receive requests
How to find:
# Pods with zero resource usage (values print as "0m"/"0Mi", so +0 forces a numeric compare)
kubectl top pods --all-namespaces --no-headers | awk '$3+0==0 && $4+0==0'
# Low-traffic services
kubectl get hpa --all-namespaces
# Check if min replicas > actual needed
Solution:
- Reduce replica count
- Use HPA with lower min replicas
- Consider removing unused services
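A sketch of acting on the first two points (web-app and web-app-hpa are placeholder names):
# Scale an over-replicated deployment down directly
kubectl scale deployment web-app --replicas=2
# Or lower the HPA floor so it can scale down on its own
kubectl patch hpa web-app-hpa --type merge -p '{"spec":{"minReplicas":1}}'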
3. Zombie Resources
Signs:
- Old deployments still running
- Unused ConfigMaps/Secrets
- Orphaned PVCs
- Services pointing to nothing
How to find:
# Find unused deployments
kubectl get deployments --all-namespaces
# Check if any have zero desired replicas
# Find unused PVCs
kubectl get pvc --all-namespaces
# Compare with actual pod usage
# Find services without endpoints
kubectl get svc --all-namespaces -o wide
kubectl get endpoints --all-namespaces
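If jq is installed, one way (of several) to list Endpoints objects with no backing addresses:
kubectl get endpoints --all-namespaces -o json \
  | jq -r '.items[] | select((.subsets // []) | length == 0) | "\(.metadata.namespace)/\(.metadata.name)"'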
Solution:
- Delete unused resources
- Clean up old PVCs
- Remove orphaned services
4. Inefficient Node Utilization
Signs:
- Nodes with low resource usage
- Many small nodes vs fewer large ones
- High ratio of system overhead
How to find:
# Node resource usage
kubectl top nodes
# Check node capacity
kubectl describe nodes | grep -A 5 "Capacity\|Allocatable"
# Calculate utilization
# Utilization = actual usage (from kubectl top nodes) / allocatable
Target utilization:
- CPU: 70-80%
- Memory: 70-80%
- Below 50% = waste
- Above 90% = risk of issues
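kubectl can also report committed requests per node directly; the "Allocated resources" section of describe output includes percentages of allocatable:
# Requests/limits committed per node, as % of allocatable
kubectl describe nodes | grep -A 8 "Allocated resources"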
Right-Sizing Strategies
Strategy 1: Start Conservative, Scale Up
Initial deployment:
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
After monitoring:
- Adjust based on actual usage
- Scale up if hitting limits frequently
- Scale down if consistently low
Strategy 2: Use HPA with Resource Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Benefits:
- Automatically scales based on usage
- Reduces waste during low traffic
- Handles traffic spikes
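Once deployed, you can watch it react to load in real time:
# Current metrics vs targets, and the replica count
kubectl get hpa web-app-hpa --watch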
Strategy 3: Vertical Pod Autoscaler (VPA)
VPA automatically adjusts resource requests:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # or "Off", "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
Use when:
- Workload patterns vary
- Manual optimization is difficult
- You want automatic right-sizing
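A cautious rollout is to start with updateMode: "Off", which computes recommendations without applying them, and inspect them first:
# Shows target/lower/upper bound recommendations per container
kubectl describe vpa web-app-vpa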
Performance Optimization
1. CPU Throttling Prevention
Problem:
- Pods hitting CPU limits
- Throttling causing slow responses
- Poor user experience
Detection:
# CPU throttling is not surfaced as a Kubernetes event; start by comparing usage to limits
kubectl top pods --all-namespaces
# With Prometheus/cAdvisor, alert on container_cpu_cfs_throttled_periods_total
# Or read cgroup stats inside the container (cgroup v2 path shown):
kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu.stat
Solution:
resources:
  limits:
    cpu: "1000m"  # Increase the limit
# Or remove the CPU limit entirely if you trust the app's CPU behavior
2. Memory Leak Prevention
Problem:
- Memory usage gradually increasing
- Eventually OOMKilled
- Frequent restarts
Detection:
# Monitor memory trends (kubectl top has no watch flag; use the watch utility)
watch kubectl top pods
# Check for OOMKilled
kubectl describe pod <pod-name> | grep -i oom
kubectl get events | grep OOMKilled
Solution:
- Fix memory leak in application
- Increase memory limit temporarily
- Set appropriate memory limits to catch early
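Two more checks that often surface leaks, sorting by restart count and reading the last termination reason (pod name is a placeholder):
# Pods sorted by restarts; OOM loops bubble to the bottom
kubectl get pods --all-namespaces --sort-by='.status.containerStatuses[0].restartCount'
# Prints OOMKilled if that was the last exit cause
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'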
3. Network Latency Optimization
Problem:
- High latency between services
- Slow service-to-service communication
Optimizations:
- Use NodePort/LoadBalancer for external traffic
- Keep services in same namespace (faster DNS)
- Use service mesh for optimization (Istio, Linkerd)
- Optimize pod placement with affinity rules (sketch below)
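As one example of the last point, a preferred podAffinity rule can co-locate a chatty client with its backend on the same node (the app label is an assumption):
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: backend   # assumed label on the target pods
          topologyKey: kubernetes.io/hostname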
4. Storage Performance
Problem:
- Slow disk I/O
- High latency on storage operations
Solutions:
- Use SSD-backed storage classes
- Optimize database queries
- Use local storage for hot data
- Cache frequently accessed data
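For the first point, an SSD-backed StorageClass sketch; the provisioner and parameters are provider-specific, and this one assumes the AWS EBS CSI driver:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # AWS EBS CSI driver (assumption)
parameters:
  type: gp3                    # SSD volume type on AWS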
Cost Optimization Techniques
1. Use Spot Instances
For:
- Stateless workloads
- Batch jobs
- Non-critical services
- Development/staging
Configuration:
# The spot-node label varies by platform; EKS managed node groups use:
nodeSelector:
  eks.amazonaws.com/capacityType: SPOT
# GKE: cloud.google.com/gke-spot: "true" / Karpenter: karpenter.sh/capacity-type: spot
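Many platforms also taint spot nodes, so pods need a matching toleration; the key below is GKE's and serves only as an example:
tolerations:
  - key: "cloud.google.com/gke-spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"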
2. Cluster Autoscaling
Automatically add/remove nodes:
- Scales down during low usage
- Scales up during peak
- Reduces idle node costs
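Scale-down only happens when every pod on a node is evictable; the Cluster Autoscaler honors this annotation to pin a pod's node (use sparingly, since it blocks savings):
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"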
3. Resource Quotas
Prevent resource waste:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
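Usage against the quota is visible at any time:
# Shows Used vs Hard for each tracked resource
kubectl describe resourcequota team-quota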
4. Right-Size Nodes
Choose appropriate instance types:
- Match workload requirements
- Avoid over-provisioned nodes
- Consider reserved instances for stable workloads
5. Image Optimization
Reduce:
- Image pull time
- Storage costs
- Network transfer
Techniques:
- Multi-stage builds
- Use distroless/minimal base images
- Remove unnecessary packages
- Use image layers efficiently
Monitoring and Metrics
Key Metrics to Track
Resource Utilization:
- CPU usage vs requests/limits
- Memory usage vs requests/limits
- Network I/O
- Storage I/O
Cost Metrics:
- Cost per pod
- Cost per namespace
- Cost per service
- Node utilization
Performance Metrics:
- Response time (p50, p95, p99)
- Error rate
- Throughput
- Request latency
Tools for Cost Analysis
kubectl-cost:
# Install
kubectl krew install cost
# View costs
kubectl cost namespace --show-cpu --show-memory
kubectl cost pod --show-cpu --show-memory
Cloud Provider Tools:
- AWS Cost Explorer
- GCP Cost Management
- Azure Cost Management
Third-Party:
- Kubecost
- OpenCost
- CloudHealth
Right-Sizing Checklist
Initial Deployment
- Set conservative requests
- Set limits with headroom
- Monitor for 24-48 hours
- Review usage patterns
Optimization Phase
- Identify over-provisioned pods
- Identify under-provisioned pods
- Check for idle resources
- Review node utilization
- Calculate waste percentage
Implementation
- Adjust requests based on averages
- Adjust limits based on peaks
- Add HPA for dynamic scaling
- Consider VPA for automatic tuning
- Set up monitoring/alerting
Continuous Improvement
- Monthly resource review
- Quarterly cost analysis
- Remove unused resources
- Optimize based on traffic patterns
- Review and update runbooks
Example: Right-Sizing Workflow
Step 1: Baseline
# Deploy with conservative resources
kubectl apply -f app.yaml
# Monitor for 48 hours (kubectl top has no watch flag; use the watch utility)
watch kubectl top pods
Step 2: Analyze
# Get usage stats
kubectl top pods --all-namespaces > usage.txt
# Compare requests vs actual
# Identify waste
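One dependency-free way to pull requests for that comparison is custom-columns (single container assumed for brevity):
kubectl get pods --all-namespaces -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'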
Step 3: Optimize
# Adjust resources
resources:
  requests:
    cpu: "150m"      # Based on average + buffer
    memory: "256Mi"  # Based on average + buffer
  limits:
    cpu: "500m"      # Based on peak + buffer
    memory: "512Mi"  # Based on peak + buffer
Step 4: Validate
# Deploy changes
kubectl apply -f app.yaml
# Monitor for issues
kubectl get events --watch
watch kubectl top pods
Key Takeaways
- Right-size based on actual usage, not guesses
- Monitor continuously to catch drift
- Use HPA/VPA for automatic optimization
- Review regularly (monthly minimum)
- Balance cost and performance; don't optimize one at the expense of the other
- Start conservative, scale up if needed
- Remove unused resources regularly
- Use appropriate instance types for workloads
- Monitor key metrics (CPU, memory, latency, cost)
- Document decisions for future reference
Cost optimization is an ongoing process. Start with the basics, measure everything, and continuously improve. Your cloud bill (and performance) will thank you!