Performance & Cost Insights
Optimizing for performance and cost in Kubernetes requires balancing resource allocation, identifying waste, and ensuring predictable latency. This guide shows you how to right-size your workloads and reduce costs without sacrificing performance.
Understanding Resource Requests and Limits
Requests vs Limits
Requests:
- Guaranteed minimum resources
- Used by scheduler to place pods
- Reserved for the pod
- Cost impact: Nodes need capacity for requests
Limits:
- Maximum resources pod can use
- CPU is throttled when usage reaches the limit
- OOMKilled if memory limit exceeded
- Cost impact: Less direct, but affects node density
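To make both concepts concrete, here is a minimal pod spec tying them together (the name and image are illustrative, not from any real workload):
apiVersion: v1
kind: Pod
metadata:
  name: example-app        # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.27    # any image works; nginx is just for illustration
      resources:
        requests:
          cpu: "250m"      # scheduler reserves this; counted against node allocatable
          memory: "256Mi"
        limits:
          cpu: "500m"      # CPU throttled beyond this
          memory: "512Mi"  # OOMKilled beyond this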
The Right-Sizing Process
Baseline Measurement
# Monitor actual usage over time
kubectl top pods --all-namespaces --containers
Analyze Usage Patterns
- Average usage (set requests)
- Peak usage (set limits)
- Add 20-30% buffer for limits
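For example, a hypothetical container that averages 120m CPU / 200Mi memory and peaks at 400m / 380Mi would come out roughly like this:
resources:
  requests:
    cpu: "120m"      # ~average usage
    memory: "200Mi"  # ~average usage
  limits:
    cpu: "500m"      # peak 400m + 25% buffer
    memory: "512Mi"  # peak 380Mi + ~30%, rounded up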
Apply Changes Gradually
- Start with non-critical workloads
- Monitor for issues
- Adjust based on feedback
Continuous Optimization
- Review monthly
- Adjust for traffic patterns
- Remove unused resources
Identifying Waste
1. Over-Provisioned Pods
Signs:
- Actual usage << requests
- CPU throttling rare or never
- Memory usage well below limits
How to find:
# Compare requests vs actual usage
kubectl top pods --all-namespaces
kubectl describe pod <pod-name> | grep -A 2 "Requests\|Limits"
Example:
Pod: web-app-xyz
Requests: CPU 1000m, Memory 1Gi
Actual: CPU 50m, Memory 128Mi
Waste: 95% CPU, 87% Memory
Solution:
resources:
  requests:
    cpu: "100m"      # Down from 1000m
    memory: "256Mi"  # Down from 1Gi
  limits:
    cpu: "500m"      # Still have headroom
    memory: "512Mi"  # But not excessive
2. Idle Pods
Signs:
- Pods with zero or near-zero usage
- High replica counts with low traffic
- Pods that never receive requests
How to find:
# Pods with zero resource usage (values print as "0m"/"0Mi", so +0 forces a numeric compare)
kubectl top pods --all-namespaces --no-headers | awk '$3+0==0 && $4+0==0'
# Low-traffic services
kubectl get hpa --all-namespaces
# Check if min replicas > actual needed
Solution:
- Reduce replica count
- Use HPA with lower min replicas
- Consider removing unused services
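A sketch of acting on the first two points (web-app and web-app-hpa are placeholder names):
# Scale an over-replicated deployment down directly
kubectl scale deployment web-app --replicas=2
# Or lower the HPA floor so it can scale down on its own
kubectl patch hpa web-app-hpa --type merge -p '{"spec":{"minReplicas":1}}'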
3. Zombie Resources
Signs:
- Old deployments still running
- Unused ConfigMaps/Secrets
- Orphaned PVCs
- Services pointing to nothing
How to find:
# Find unused deployments
kubectl get deployments --all-namespaces
# Check if any have zero desired replicas
# Find unused PVCs
kubectl get pvc --all-namespaces
# Compare with actual pod usage
# Find services without endpoints
kubectl get svc --all-namespaces -o wide
kubectl get endpoints --all-namespaces
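If jq is installed, one way (of several) to list Endpoints objects with no backing addresses:
kubectl get endpoints --all-namespaces -o json \
  | jq -r '.items[] | select((.subsets // []) | length == 0) | "\(.metadata.namespace)/\(.metadata.name)"'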
Solution:
- Delete unused resources
- Clean up old PVCs
- Remove orphaned services
4. Inefficient Node Utilization
Signs:
- Nodes with low resource usage
- Many small nodes vs fewer large ones
- High ratio of system overhead
How to find:
# Node resource usage
kubectl top nodes
# Check node capacity
kubectl describe nodes | grep -A 5 "Capacity\|Allocatable"
# Calculate utilization
# Utilization = actual usage (from kubectl top nodes) / allocatable
Target utilization:
- CPU: 70-80%
- Memory: 70-80%
- Below 50% = waste
- Above 90% = risk of issues
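kubectl can also report committed requests per node directly; the "Allocated resources" section of describe output includes percentages of allocatable:
# Requests/limits committed per node, as % of allocatable
kubectl describe nodes | grep -A 8 "Allocated resources"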
Right-Sizing Strategies
Strategy 1: Start Conservative, Scale Up
Initial deployment:
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
After monitoring:
- Adjust based on actual usage
- Scale up if hitting limits frequently
- Scale down if consistently low
Strategy 2: Use HPA with Resource Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Benefits:
- Automatically scales based on usage
- Reduces waste during low traffic
- Handles traffic spikes
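Once deployed, you can watch it react to load in real time:
# Current metrics vs targets, and the replica count
kubectl get hpa web-app-hpa --watch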
Strategy 3: Vertical Pod Autoscaler (VPA)
VPA automatically adjusts resource requests:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # or "Off", "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
Use when:
- Workload patterns vary
- Manual optimization is difficult
- You want automatic right-sizing
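A cautious rollout is to start with updateMode: "Off", which computes recommendations without applying them, and inspect them first:
# Shows target/lower/upper bound recommendations per container
kubectl describe vpa web-app-vpa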
Performance Optimization
1. CPU Throttling Prevention
Problem:
- Pods hitting CPU limits
- Throttling causing slow responses
- Poor user experience
Detection:
# CPU throttling is not surfaced as a Kubernetes event; start by comparing usage to limits
kubectl top pods --all-namespaces
# With Prometheus/cAdvisor, alert on container_cpu_cfs_throttled_periods_total
# Or read cgroup stats inside the container (cgroup v2 path shown):
kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu.stat
Solution:
resources:
  limits:
    cpu: "1000m"  # Increase the limit
# Or remove the CPU limit entirely if you trust the app's CPU behavior
2. Memory Leak Prevention
Problem:
- Memory usage gradually increasing
- Eventually OOMKilled
- Frequent restarts
Detection:
# Monitor memory trends (kubectl top has no watch flag; use the watch utility)
watch kubectl top pods
# Check for OOMKilled
kubectl describe pod <pod-name> | grep -i oom
kubectl get events | grep OOMKilled
Solution:
- Fix memory leak in application
- Increase memory limit temporarily
- Set appropriate memory limits to catch early
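Two more checks that often surface leaks, sorting by restart count and reading the last termination reason (pod name is a placeholder):
# Pods sorted by restarts; OOM loops bubble to the bottom
kubectl get pods --all-namespaces --sort-by='.status.containerStatuses[0].restartCount'
# Prints OOMKilled if that was the last exit cause
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'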
3. Network Latency Optimization
Problem:
- High latency between services
- Slow service-to-service communication
Optimizations:
- Use NodePort/LoadBalancer for external traffic
- Keep services in same namespace (faster DNS)
- Use service mesh for optimization (Istio, Linkerd)
- Optimize pod placement with affinity rules (sketch below)
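As one example of the last point, a preferred podAffinity rule can co-locate a chatty client with its backend on the same node (the app label is an assumption):
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: backend   # assumed label on the target pods
          topologyKey: kubernetes.io/hostname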
4. Storage Performance
Problem:
- Slow disk I/O
- High latency on storage operations
Solutions:
- Use SSD-backed storage classes
- Optimize database queries
- Use local storage for hot data
- Cache frequently accessed data
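For the first point, an SSD-backed StorageClass sketch; the provisioner and parameters are provider-specific, and this one assumes the AWS EBS CSI driver:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # AWS EBS CSI driver (assumption)
parameters:
  type: gp3                    # SSD volume type on AWS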
Cost Optimization Techniques
1. Use Spot Instances
For:
- Stateless workloads
- Batch jobs
- Non-critical services
- Development/staging
Configuration:
# The spot-node label varies by platform; EKS managed node groups use:
nodeSelector:
  eks.amazonaws.com/capacityType: SPOT
# GKE: cloud.google.com/gke-spot: "true" / Karpenter: karpenter.sh/capacity-type: spot
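Many platforms also taint spot nodes, so pods need a matching toleration; the key below is GKE's and serves only as an example:
tolerations:
  - key: "cloud.google.com/gke-spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"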
2. Cluster Autoscaling
Automatically add/remove nodes:
- Scales down during low usage
- Scales up during peak
- Reduces idle node costs
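Scale-down only happens when every pod on a node is evictable; the Cluster Autoscaler honors this annotation to pin a pod's node (use sparingly, since it blocks savings):
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"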
3. Resource Quotas
Prevent resource waste:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
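Usage against the quota is visible at any time:
# Shows Used vs Hard for each tracked resource
kubectl describe resourcequota team-quota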
4. Right-Size Nodes
Choose appropriate instance types:
- Match workload requirements
- Avoid over-provisioned nodes
- Consider reserved instances for stable workloads
5. Image Optimization
Reduce:
- Image pull time
- Storage costs
- Network transfer
Techniques:
- Multi-stage builds
- Use distroless/minimal base images
- Remove unnecessary packages
- Use image layers efficiently
Monitoring and Metrics
Key Metrics to Track
Resource Utilization:
- CPU usage vs requests/limits
- Memory usage vs requests/limits
- Network I/O
- Storage I/O
Cost Metrics:
- Cost per pod
- Cost per namespace
- Cost per service
- Node utilization
Performance Metrics:
- Response time (p50, p95, p99)
- Error rate
- Throughput
- Request latency
Tools for Cost Analysis
kubectl-cost:
# Install
kubectl krew install cost
# View costs
kubectl cost namespace --show-cpu --show-memory
kubectl cost pod --show-cpu --show-memory
Cloud Provider Tools:
- AWS Cost Explorer
- GCP Cost Management
- Azure Cost Management
Third-Party:
- Kubecost
- OpenCost
- CloudHealth
Right-Sizing Checklist
Initial Deployment
- Set conservative requests
- Set limits with headroom
- Monitor for 24-48 hours
- Review usage patterns
Optimization Phase
- Identify over-provisioned pods
- Identify under-provisioned pods
- Check for idle resources
- Review node utilization
- Calculate waste percentage
Implementation
- Adjust requests based on averages
- Adjust limits based on peaks
- Add HPA for dynamic scaling
- Consider VPA for automatic tuning
- Set up monitoring/alerting
Continuous Improvement
- Monthly resource review
- Quarterly cost analysis
- Remove unused resources
- Optimize based on traffic patterns
- Review and update runbooks
Example: Right-Sizing Workflow
Step 1: Baseline
# Deploy with conservative resources
kubectl apply -f app.yaml
# Monitor for 48 hours (kubectl top has no watch flag; use the watch utility)
watch kubectl top pods
Step 2: Analyze
# Get usage stats
kubectl top pods --all-namespaces > usage.txt
# Compare requests vs actual
# Identify waste
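One dependency-free way to pull requests for that comparison is custom-columns (single container assumed for brevity):
kubectl get pods --all-namespaces -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'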
Step 3: Optimize
# Adjust resources
resources:
  requests:
    cpu: "150m"      # Based on average + buffer
    memory: "256Mi"  # Based on average + buffer
  limits:
    cpu: "500m"      # Based on peak + buffer
    memory: "512Mi"  # Based on peak + buffer
Step 4: Validate
# Deploy changes
kubectl apply -f app.yaml
# Monitor for issues
kubectl get events --watch
watch kubectl top pods
Key Takeaways
- Right-size based on actual usage, not guesses
- Monitor continuously to catch drift
- Use HPA/VPA for automatic optimization
- Review regularly (monthly minimum)
- Balance cost and performance; don't optimize one at the expense of the other
- Start conservative, scale up if needed
- Remove unused resources regularly
- Use appropriate instance types for workloads
- Monitor key metrics (CPU, memory, latency, cost)
- Document decisions for future reference
Cost optimization is an ongoing process. Start with the basics, measure everything, and continuously improve. Your cloud bill (and performance) will thank you!