
Running Kubernetes in production is powerful — but without cost controls in place, your cloud bill grows faster than your traffic does.
According to Datadog’s State of Cloud Costs report, 83% of container costs go to idle resources — split between overprovisioned Kubernetes cluster infrastructure at 54% and oversized workload requests at 29%. That means most of what you are paying for right now is compute your workloads never actually use.
The good news: with the right tools, policies, and practices, teams are cutting cloud spending by 30–50% without sacrificing performance or reliability.
This guide covers 7 proven strategies with real YAML examples you can apply to your cluster today.
Why Kubernetes Costs Spiral Out of Control
Before optimizing, you need to understand why costs grow unchecked in the first place.
The most common causes are:
- Overprovisioned resource requests: developers set high CPU and memory requests just in case, and those resources are reserved whether used or not.
- No resource limits: pods without limits consume unbounded resources during spikes.
- Idle namespaces: dev and staging environments run 24/7 when they are only used during business hours.
- Oversized node pools: the cluster autoscaler is configured too conservatively, keeping excess nodes running.
- No cost visibility: teams do not know what their workloads actually cost, so they cannot optimize.
Recent industry data shows over 68% of organizations overspend on Kubernetes by 20–40% or more, often due to misconfigurations and lack of ongoing governance. In 2026, with AI workloads and larger clusters, these gaps are even more expensive.
Strategy 1 — Set Resource Requests and Limits on Every Pod
This is the single biggest cost lever available. Requests set too high waste money: the scheduler reserves that capacity on a node whether or not the workload ever uses it, so you pay for nodes that sit mostly idle. Requests set too low risk application instability, because pods get packed onto nodes that cannot actually serve them under load.
Without resource requests — dangerous and expensive:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-without-limits
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
        - name: app
          image: nginx
          # No requests or limits: Kubernetes has no idea
          # how to schedule this efficiently
With proper resource requests and limits:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-limits
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
        - name: app
          image: nginx
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
How to find the right values:
Do not guess. Use actual usage data from Prometheus or kubectl:
# Check actual CPU and memory usage per pod
kubectl top pods -n production
# Check node utilization
kubectl top nodes
# Get detailed resource usage
kubectl describe node <node-name> | grep -A 5 "Allocated resources"
Profile workload behavior under real production conditions, gather historical usage patterns, and assign values that closely mirror real need while leaving minimal headroom. Tools like the Vertical Pod Autoscaler or custom monitoring with Prometheus and Grafana can provide granular insights into how much CPU and memory containers actually consume.
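As a rough illustration of turning observed usage into a request value, the sketch below takes CPU samples in millicores, picks a high percentile, and adds a little headroom. The function name, the sample numbers, the 95th percentile, and the 15% headroom are all assumptions chosen for illustration; they are not values recommended by Kubernetes or by any particular tool.

```python
import math

def suggest_request(samples_millicores, percentile=95, headroom=0.15):
    """Suggest a CPU request (millicores) from observed usage samples.

    Takes the nearest-rank percentile of the samples, adds fractional
    headroom, and rounds up to the next 10m for readability.
    """
    if not samples_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_millicores)
    # Nearest-rank percentile: smallest sample with at least p% of samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    base = ordered[rank - 1]
    padded = base * (1 + headroom)
    return int(math.ceil(padded / 10) * 10)

# Hypothetical samples for one container, e.g. exported from Prometheus.
usage = [80, 95, 110, 120, 90, 105, 130, 100, 85, 115]
print(f"suggested request: {suggest_request(usage)}m")
```

Compared with the 200m request in the example Deployment, a result like this shows how much reserved capacity a data-driven value could free up. Memory can be handled the same way, usually with more headroom because exceeding a memory limit kills the container.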
Expected savings: 20–50% on compute costs.
Strategy 2 — Implement All Three Autoscalers
Kubernetes offers three autoscaling mechanisms that work together: Horizontal Pod Autoscaler scales pod count based on demand, Vertical Pod Autoscaler adjusts resource requests per pod, and Cluster Autoscaler adds or removes nodes. Combined, they deliver automated cost optimization by matching compute capacity to actual demand.
Horizontal Pod Autoscaler (HPA)
Scales the number of pods up and down based on CPU or memory usage:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-with-limits
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
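To see how these targets turn into a replica count, it helps to work through the documented HPA rule: desired replicas = ceil(current replicas × current metric / target metric), with the largest result across metrics chosen and then clamped to the min and max. The Python sketch below is a simplified model of that rule using the targets and bounds from the manifest above. It assumes the default 10% tolerance and ignores stabilization windows, pod readiness, and missing metrics; the function names are illustrative only.

```python
import math

def desired_replicas(current_replicas, current_util, target_util, tolerance=0.1):
    """Replica count the HPA formula asks for, for a single metric."""
    ratio = current_util / target_util
    # Inside the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

def hpa_decision(current_replicas, metrics, min_replicas=2, max_replicas=10):
    """metrics: list of (current_utilization, target_utilization) pairs."""
    # With several metrics, the largest proposal wins, then bounds are applied.
    wanted = max(desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics)
    return max(min_replicas, min(max_replicas, wanted))

# 4 replicas averaging 90% CPU (target 70) and 60% memory (target 80).
print(hpa_decision(4, [(90, 70), (60, 80)]))  # CPU asks for 6, memory for 3
```

Because utilization is measured against each pod's request, inflated requests make utilization look low and keep the HPA from scaling when it should, which is one more reason to right-size requests first.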
Vertical Pod Autoscaler (VPA)
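Automatically adjusts CPU and memory requests based on actual usage, which removes the guesswork. Do not let VPA and HPA both act on CPU or memory for the same workload, because the two controllers will work against each other. If an HPA already scales the Deployment on those metrics, run VPA in recommendation-only mode ("Off") for it:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-with-limits
  updatePolicy:
    updateMode: "Auto"  # "Off" produces recommendations without restarting pods
  resourcePolicy:
    containerPolicies:
      - containerName: app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 1
          memory: 1Gi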
Cluster Autoscaler
Automatically adds nodes when pods cannot be scheduled and removes underutilized nodes:
# AWS EKS example: excerpt from the cluster-autoscaler Deployment
# (selector, image, service account, and RBAC omitted for brevity)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --namespace=kube-system
            - --nodes=1:10:my-node-group  # min:max:node-group-name
            - --scale-down-utilization-threshold=0.5
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
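The --scale-down-utilization-threshold=0.5 flag above is evaluated against pod requests, not live usage: a node becomes a removal candidate only when the requested CPU and the requested memory on it are both below 50% of what the node can allocate. The Python sketch below models just that first check. The node size and pod requests are made-up numbers, and the real autoscaler applies further conditions (for example, that every pod can be rescheduled elsewhere and that the node has stayed unneeded for the configured time).

```python
def is_scale_down_candidate(pod_requests, allocatable, threshold=0.5):
    """Check the utilization condition for removing a node.

    pod_requests: list of (cpu_millicores, memory_mib) tuples, one per pod.
    allocatable:  (cpu_millicores, memory_mib) the node offers to pods.
    """
    cpu_requested = sum(cpu for cpu, _ in pod_requests)
    mem_requested = sum(mem for _, mem in pod_requests)
    cpu_util = cpu_requested / allocatable[0]
    mem_util = mem_requested / allocatable[1]
    # Both resources must be below the threshold, so compare the larger one.
    return max(cpu_util, mem_util) < threshold

node = (4000, 16384)  # hypothetical node: 4 vCPU, 16 GiB allocatable

print(is_scale_down_candidate([(200, 256)] * 3, node))  # lightly requested node
print(is_scale_down_candidate([(500, 512)] * 5, node))  # CPU requests at 62.5%
```

This is why oversized requests hurt twice: they hold capacity the pod never uses, and they keep the node above the threshold so it is never removed.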
Expected savings: 15–30% through proper autoscaling tuning.
Strategy 3 — Use Spot Instances for Non-Critical Workloads
Spot instances on AWS, preemptible VMs on GCP, and spot VMs on Azure offer 60–90% discounts over on-demand pricing, in exchange for the risk that the provider reclaims the node on short notice. For fault-tolerant Kubernetes workloads, they are one of the largest cost levers available.
Good candidates for spot instances are stateless web servers, CI/CD runners, batch processing, and dev/staging environments. Bad candidates are databases, stateful services, and single-replica critical workloads. The best practice is to run a mix of on-demand for critical workloads and spot for everything else in the same cluster using node affinity rules.
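A quick calculation shows what a mixed cluster can be worth. The sketch below compares an all-on-demand node pool with a split pool. The $0.10 hourly price, the 70% spot discount, the 4/6 node split, and the 730-hour month are assumptions picked for illustration, not quoted prices.

```python
def monthly_cost(on_demand_nodes, spot_nodes, on_demand_hourly,
                 spot_discount, hours=730):
    """Monthly node cost for a mix of on-demand and spot capacity."""
    spot_hourly = on_demand_hourly * (1 - spot_discount)
    hourly_total = on_demand_nodes * on_demand_hourly + spot_nodes * spot_hourly
    return hourly_total * hours

baseline = monthly_cost(10, 0, on_demand_hourly=0.10, spot_discount=0.70)
mixed = monthly_cost(4, 6, on_demand_hourly=0.10, spot_discount=0.70)

print(f"all on-demand: ${baseline:.2f}")
print(f"4 on-demand + 6 spot: ${mixed:.2f}")
print(f"saving: {1 - mixed / baseline:.0%}")
```

The saving on the whole cluster is smaller than the headline spot discount because the critical workloads stay on full-price nodes.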
Setting Up Spot Node Pools With Node Affinity
# Deployment targeting spot nodes for non-critical workloads.
# The node label and taint keys are conventions, not built-ins: they must
# match the labels and taints your spot node pool actually carries.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
  namespace: production
spec:
  replicas: 5
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node.kubernetes.io/lifecycle
                    operator: In
                    values:
                      - spot
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: batch-processor
          image: my-batch-processor:latest
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
# Critical workload: stays on on-demand nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node.kubernetes.io/lifecycle
                    operator: In
                    values:
                      - on-demand
      containers:
        - name: payment-service
          image: payment-service:latest
Expected savings: 60–90% on eligible workloads.
Strategy 4 — Enforce ResourceQuotas and LimitRanges per Namespace
Without guardrails, a single developer can accidentally deploy a workload that consumes all cluster resources. ResourceQuotas and LimitRanges prevent this.
ResourceQuota — Cap Total Namespace Usage
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"
    persistentvolumeclaims: "10"
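Quota is checked each time a pod is created, so a Deployment that asks for more than the remaining room simply ends up with fewer pods than it requested. The Python sketch below models that for the request limits in the quota above (20 CPU is 20000m, 40Gi is 40960Mi). The current usage and per-pod requests are hypothetical, and the helper is a simplification: it ignores the limits.*, pods, and persistentvolumeclaims entries.

```python
def admitted_replicas(wanted, per_pod, used, hard):
    """How many of `wanted` pods fit under a namespace quota.

    per_pod, used, hard: dicts keyed by quota resource name,
    with CPU in millicores and memory in MiB.
    """
    fits = [wanted]
    for resource, limit in hard.items():
        room = max(0, limit - used.get(resource, 0))
        need = per_pod.get(resource, 0)
        if need > 0:
            fits.append(room // need)
    return min(fits)

hard = {"requests.cpu": 20000, "requests.memory": 40 * 1024}
used = {"requests.cpu": 19000, "requests.memory": 20 * 1024}  # hypothetical
per_pod = {"requests.cpu": 200, "requests.memory": 256}

print(admitted_replicas(10, per_pod, used, hard))  # CPU room is the binding constraint
```

Here memory would allow many more pods, but only 1000m of CPU quota remains, so five 200m pods are admitted and the rest are rejected until usage drops or the quota is raised.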
LimitRange — Set Default Requests and Limits
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "2"
        memory: "2Gi"
      min:
        cpu: "50m"
        memory: "64Mi"
Together, ResourceQuotas and LimitRanges establish guardrails that enforce fair resource sharing and cost containment as teams scale their workloads. Regular review ensures these policies keep pace with evolving business and technical requirements.
Strategy 5 — Shut Down Non-Production Environments After Hours
Dev, staging, and QA clusters that run 24/7 but are only used during business hours — roughly 10 hours a day, 5 days a week — waste 70% of their compute cost.
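The 70% figure follows directly from the hours in a week, as this small Python calculation shows. The second call uses a 12-hour weekday window (8AM to 8PM) as an example of a slightly wider schedule; the function name is illustrative.

```python
HOURS_PER_WEEK = 24 * 7  # 168

def idle_fraction(hours_per_day, days_per_week):
    """Share of the week an environment sits unused."""
    active_hours = hours_per_day * days_per_week
    return 1 - active_hours / HOURS_PER_WEEK

print(f"10h x 5 days: {idle_fraction(10, 5):.1%} idle")
print(f"12h x 5 days: {idle_fraction(12, 5):.1%} idle")
```

Even with the wider 12-hour window, close to two thirds of the week is idle time that can be scaled away.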
A simple CronJob that scales down non-production deployments overnight:
# Scale down staging and dev at 8PM every weekday.
# Schedules use the kube-controller-manager's time zone (often UTC)
# unless spec.timeZone is set. The scaler ServiceAccount needs RBAC
# permission to scale Deployments in both the staging and dev namespaces.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
  namespace: staging
spec:
  schedule: "0 20 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  kubectl scale deployment --all --replicas=0 -n staging
                  kubectl scale deployment --all --replicas=0 -n dev
          restartPolicy: OnFailure
---
# Scale back up at 8AM every weekday.
# Note: --all sets every Deployment to the same fixed replica count,
# regardless of what it ran with before the scale-down.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-staging
  namespace: staging
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  kubectl scale deployment --all --replicas=2 -n staging
                  kubectl scale deployment --all --replicas=1 -n dev
          restartPolicy: OnFailure
Expected savings: A company saved $10,000 per month by shutting down staging clusters after hours.
Strategy 6 — Reduce Cross-Zone Network Costs
Network egress charges are Kubernetes’ hidden cost multiplier. Pods communicating across availability zones or regions generate per-GB fees that compound at scale.
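A back-of-the-envelope estimate makes the scale of these fees concrete. The Python sketch below assumes a rate of $0.01 per GB charged in each direction for cross-zone traffic, 1 TB of service-to-service traffic per day, and a 30-day month. All three numbers are assumptions for illustration, so check your provider's current pricing. With endpoints spread evenly over three zones and no zone-aware routing, roughly two thirds of requests leave the caller's zone.

```python
def cross_zone_monthly_cost(gb_per_day, cross_zone_fraction,
                            price_per_gb_each_way=0.01, days=30):
    """Estimated monthly fee for traffic that crosses availability zones."""
    cross_zone_gb = gb_per_day * cross_zone_fraction * days
    # The assumed rate is billed on both the sending and the receiving side.
    return cross_zone_gb * price_per_gb_each_way * 2

random_routing = cross_zone_monthly_cost(1000, 2 / 3)
zone_local = cross_zone_monthly_cost(1000, 0.10)  # assume 10% still crosses zones

print(f"random routing across 3 zones: ${random_routing:.0f}/month")
print(f"mostly zone-local routing:     ${zone_local:.0f}/month")
```

The absolute numbers depend entirely on the assumed rate and traffic volume, but the ratio shows why keeping traffic zone-local matters for chatty services.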
Topology-Aware Routing
Keep traffic within the same availability zone when possible:
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: production
  annotations:
    service.kubernetes.io/topology-mode: "Auto"
spec:
  selector:
    app: backend
  ports:
    - port: 80
      targetPort: 8080
Topology Spread Constraints
Distribute pods evenly across zones, both for availability and so that every zone has local endpoints for topology-aware routing to use:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: production
spec:
  replicas: 6
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: backend
      containers:
        - name: backend
          image: backend:latest
Practical steps to reduce network costs include keeping frequently communicating pods in the same availability zone using topology-aware scheduling, setting up VPC endpoints for AWS services so pods do not route through the public internet, and considering single-zone deployments for namespaces where multi-zone redundancy is not required.
Expected savings: 10–25% on networking costs for data-heavy workloads.
Strategy 7 — Add Cost Visibility With Labels and Monitoring
Cost visibility breaks down when it lives only in finance dashboards or raw cloud billing exports. You need cost mapped to namespaces, deployments, and labels that match ownership. A practical rule: if an engineer cannot answer “what does this deployment cost per day?” in under 60 seconds, you are guessing.
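The sketch below shows the kind of roll-up that consistent labels make possible: daily cost summed per team, with anything missing the label landing in an "unlabeled" bucket. The workload records and dollar amounts are invented for illustration. In practice the per-workload cost would come from a tool such as Kubecost or OpenCost rather than a hand-written list.

```python
from collections import defaultdict

def cost_by_label(workloads, label):
    """Sum daily cost per value of `label`; missing labels go to 'unlabeled'."""
    totals = defaultdict(float)
    for workload in workloads:
        key = workload["labels"].get(label, "unlabeled")
        totals[key] += workload["daily_cost"]
    return dict(totals)

# Hypothetical per-deployment daily costs in dollars.
workloads = [
    {"name": "api", "labels": {"team": "backend", "env": "prod"}, "daily_cost": 42.0},
    {"name": "worker", "labels": {"team": "backend", "env": "prod"}, "daily_cost": 18.5},
    {"name": "dashboard", "labels": {"team": "frontend", "env": "prod"}, "daily_cost": 9.0},
    {"name": "old-experiment", "labels": {}, "daily_cost": 27.0},
]

for team, cost in sorted(cost_by_label(workloads, "team").items()):
    print(f"{team}: ${cost:.2f}/day")
```

The "unlabeled" bucket is the number to watch: it is spend that no team owns, which is exactly what label enforcement at admission time is meant to eliminate.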
Enforce Label Standards at Admission Time
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    team: backend
    env: prod
    cost-center: engineering
    owner: platform-team
# Kyverno policy: enforce required labels on all Deployments
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-required-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "Deployment must have team, env, and cost-center labels"
        pattern:
          metadata:
            labels:
              team: "?*"
              env: "?*"
              cost-center: "?*"
Free Cost Monitoring Tools
Kubecost (free tier):
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="your-token"
# Access the dashboard
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090
OpenCost (100% open source):
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
--namespace opencost \
--create-namespace
Many teams discover 30–40% waste just by turning on cost monitoring for the first time.
Cost Optimization by Cloud Provider
AWS EKS
- Use Savings Plans or Reserved Instances for baseline capacity — saves 30–40%
- Enable Karpenter instead of Cluster Autoscaler — more aggressive scale-down
- Use Graviton (ARM) nodes, which are typically up to about 20% cheaper than comparable x86 instances, provided your images are built for arm64
Google GKE
- Enable GKE Autopilot — Google right-sizes nodes automatically
- Use Committed Use Discounts for predictable workloads
- Enable Spot Pods on Autopilot for batch workloads
Azure AKS
- Use Azure Spot Node Pools for dev/test workloads
- Enable Azure Reserved VM Instances for baseline
- Use KEDA (event-driven autoscaling) for queue-based workloads
Cost Optimization Checklist
✅ Resource requests and limits set on every container
✅ HPA configured for all stateless workloads
✅ VPA running in recommendation mode (at minimum)
✅ Cluster Autoscaler or Karpenter enabled
✅ Spot instances used for dev/staging/batch workloads
✅ Non-production environments scaled to zero after hours
✅ ResourceQuotas applied to every namespace
✅ LimitRanges set with sensible defaults
✅ Topology-aware routing enabled for high-traffic services
✅ Cost labels enforced on all deployments
✅ Kubecost or OpenCost installed for visibility
✅ Monthly cost review process in place
✅ Reserved instances or savings plans for baseline nodes
✅ Unused PersistentVolumes audited and deleted
✅ Orphaned load balancers identified and removed
Frequently Asked Questions
How much can I realistically save? Organizations that implement these best practices cut Kubernetes costs by 30–50% without performance trade-offs. Teams focused purely on spot instances and rightsizing can see 40–70% reductions on eligible workloads.
Where should I start? Start with cost visibility — install Kubecost or OpenCost first. You cannot optimize what you cannot measure. Then fix resource requests and limits. That alone typically reveals 20–40% waste.
Does autoscaling always save money? Not automatically. Misconfigured autoscalers can keep too many nodes running, so set aggressive scale-down thresholds and monitor the results. The teams getting the best results review their scaling parameters monthly, not quarterly.
Is VPA safe to use in production? VPA in Auto mode restarts pods to apply new resource values, which can cause brief disruptions. Start with updateMode: "Off" to get recommendations only, then move to "Initial", which applies values only at pod creation, before considering "Auto".
What is the minimum cluster size where cost optimization matters? The inflection point is usually around $10,000–$20,000 monthly spend, which typically corresponds to 20–50 nodes or 100–200 pods. Below this threshold, engineering time costs more than potential savings.
Should I use Karpenter or Cluster Autoscaler? For AWS EKS, Karpenter is now the recommended choice — it provisions nodes faster, supports more instance types, and scales down more aggressively. For GKE and AKS, use the native Cluster Autoscaler.
Official Resources
- Kubernetes Resource Management Documentation
- Horizontal Pod Autoscaler Documentation
- Vertical Pod Autoscaler GitHub
- Kubecost Open Source
- OpenCost Project