Kubernetes Cost Optimization: 7 Ways to Cut Your Cloud Bill

Running Kubernetes in production is powerful — but without cost controls in place, your cloud bill grows faster than your traffic does.

According to Datadog’s State of Cloud Costs report, 83% of container costs go to idle resources — split between overprovisioned Kubernetes cluster infrastructure at 54% and oversized workload requests at 29%. That means most of what you are paying for right now is compute your workloads never actually use.

The good news: with the right tools, policies, and practices, teams are cutting cloud spending by 30–50% without sacrificing performance or reliability.

This guide covers 7 proven strategies with real YAML examples you can apply to your cluster today.

Why Kubernetes Costs Spiral Out of Control

Before optimizing, you need to understand why costs grow unchecked in the first place.

The most common causes are:

Overprovisioned resource requests: developers set high CPU and memory requests "just in case," and that capacity is reserved whether it is used or not.
No resource limits: pods without limits can consume unbounded resources during spikes.
Idle namespaces: dev and staging environments run 24/7 even though they are only used during business hours.
Oversized node pools: the Cluster Autoscaler is configured too conservatively, keeping excess nodes running.
No cost visibility: teams do not know what their workloads actually cost, so they cannot optimize.

Recent industry data shows over 68% of organizations overspend on Kubernetes by 20–40% or more, often due to misconfigurations and lack of ongoing governance. In 2026, with AI workloads and larger clusters, these gaps are even more expensive.

Strategy 1 — Set Resource Requests and Limits on Every Pod

This is the single biggest cost lever available. Setting resource requests too high leads to wasted resources — nodes are reserved for workloads that rarely or never need that much capacity. Conversely, setting them too low risks application instability due to insufficient resources.

Without resource requests — dangerous and expensive:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-without-limits
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: app
        image: nginx
        # No requests or limits — Kubernetes has no idea
        # how to schedule this efficiently

With proper resource requests and limits:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-limits
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: app
        image: nginx
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

How to find the right values:

Do not guess. Use actual usage data from Prometheus or from kubectl top, which requires the metrics-server add-on to be installed:

# Check actual CPU and memory usage per pod
kubectl top pods -n production

# Check node utilization
kubectl top nodes

# Get detailed resource usage
kubectl describe node <node-name> | grep -A 5 "Allocated resources"

Profile workload behavior under real production conditions, gather historical usage patterns, and assign values that closely mirror real need while leaving minimal headroom. Tools like the Vertical Pod Autoscaler or custom monitoring with Prometheus and Grafana can provide granular insights into how much CPU and memory containers actually consume.

Expected savings: 20–50% on compute costs.

Strategy 2 — Implement All Three Autoscalers

Kubernetes offers three autoscaling mechanisms that work together: Horizontal Pod Autoscaler scales pod count based on demand, Vertical Pod Autoscaler adjusts resource requests per pod, and Cluster Autoscaler adds or removes nodes. Combined, they deliver automated cost optimization by matching compute capacity to actual demand.

Horizontal Pod Autoscaler (HPA)

Scales the number of pods up and down based on CPU or memory usage:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-with-limits
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
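
Scale-down behaviour is where much of the saving is won or lost, and autoscaling/v2 exposes it through the optional behavior field. The sketch below is a variant of the HPA above with an explicit scale-down policy. The window and percentage are illustrative assumptions rather than recommended values, and should be tuned against real traffic.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-with-limits
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percentage of the container's CPU request
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # assumed value: act on the highest recommendation of the last 5 minutes
      policies:
      - type: Percent
        value: 50            # assumed value: remove at most 50% of replicas...
        periodSeconds: 60    # ...per 60-second period
```

A shorter window releases capacity sooner but risks flapping when traffic is bursty, so it is worth changing one value at a time and watching replica counts afterwards.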

Vertical Pod Autoscaler (VPA)

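Automatically adjusts CPU and memory requests based on actual usage, which removes the guesswork. VPA is not part of core Kubernetes and must be installed separately. The example below targets the same Deployment as the HPA above purely for illustration. In practice, avoid running VPA in Auto mode on a workload whose HPA scales on CPU or memory, because the two controllers react to each other's changes: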

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-with-limits
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 1
        memory: 1Gi
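
A more cautious starting point is recommendation-only mode, in which VPA computes suggested requests but never evicts pods. The following is a minimal sketch under the same assumptions as the manifest above; the object name app-vpa-recommender is hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa-recommender   # hypothetical name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-with-limits
  updatePolicy:
    updateMode: "Off"   # compute recommendations only; pods are never evicted
```

The recommendations are written to the object's status, so kubectl describe vpa app-vpa-recommender -n production should show them once VPA has gathered enough usage history.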

Cluster Autoscaler

Automatically adds nodes when pods cannot be scheduled and removes underutilized nodes:

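# AWS EKS example: these flags are set on the cluster-autoscaler Deployment itself.
# The image tag, labels, and service account name are illustrative assumptions.
# Use the release that matches your Kubernetes minor version and a service
# account with IAM permissions to modify your Auto Scaling groups.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=kube-system
        - --nodes=1:10:my-node-group   # min:max:Auto Scaling group name
        - --scale-down-utilization-threshold=0.5
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m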

Expected savings: 15–30% through proper autoscaling tuning.

Strategy 3 — Use Spot Instances for Non-Critical Workloads

Spot instances on AWS, preemptible or Spot VMs on GCP, and Spot VMs on Azure offer discounts of roughly 60–90% over on-demand pricing. For fault-tolerant Kubernetes workloads, they are one of the largest cost levers available, with the trade-off that the provider can reclaim the capacity at short notice.

Good candidates for spot instances are stateless web servers, CI/CD runners, batch processing, and dev/staging environments. Bad candidates are databases, stateful services, and single-replica critical workloads. The best practice is to run a mix of on-demand for critical workloads and spot for everything else in the same cluster using node affinity rules.

Setting Up Spot Node Pools With Node Affinity

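# Deployment targeting spot nodes for non-critical workloads.
# Assumes spot nodes carry the label node.kubernetes.io/lifecycle=spot and the
# taint spot=true:NoSchedule. Both are conventions you apply to the node pool,
# not values Kubernetes sets for you (EKS managed node groups, for example,
# label nodes with eks.amazonaws.com/capacityType instead).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
  namespace: production
spec:
  replicas: 5
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      affinity:
        nodeAffinity:
          # Preferred, not required: pods fall back to other nodes if no spot capacity exists
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node.kubernetes.io/lifecycle
                operator: In
                values:
                - spot
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: batch-processor
        image: my-batch-processor:latest
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
---
# Critical workload: stays on on-demand nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      affinity:
        nodeAffinity:
          # Required: the pod stays Pending rather than land on a spot node
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/lifecycle
                operator: In
                values:
                - on-demand
      containers:
      - name: payment-service
        image: payment-service:latest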

Expected savings: 60–90% on eligible workloads.

Strategy 4 — Enforce ResourceQuotas and LimitRanges per Namespace

Without guardrails, a single developer can accidentally deploy a workload that consumes all cluster resources. ResourceQuotas and LimitRanges prevent this.

ResourceQuota — Cap Total Namespace Usage

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"
    persistentvolumeclaims: "10"

LimitRange — Set Default Requests and Limits

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "2"
      memory: "2Gi"
    min:
      cpu: "50m"
      memory: "64Mi"

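Together, ResourceQuotas and LimitRanges establish guardrails that enforce fair resource sharing and cost containment as teams scale their workloads. They also depend on each other: once a ResourceQuota covers CPU or memory requests and limits in a namespace, pods that omit those fields are rejected unless a LimitRange supplies defaults. Review both regularly so the policies keep pace with evolving business and technical requirements.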
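
To make the defaulting concrete: a container submitted to the production namespace with no resources block would be admitted as if it had declared the values below. This is a sketch derived from the LimitRange above, and the Pod name is hypothetical.

```yaml
# Effective spec after LimitRange defaulting (Pod name is hypothetical)
apiVersion: v1
kind: Pod
metadata:
  name: defaulted-example
  namespace: production
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:          # filled in from defaultRequest
        cpu: "100m"
        memory: "128Mi"
      limits:            # filled in from default
        cpu: "500m"
        memory: "256Mi"
```

These injected requests are what count against the namespace ResourceQuota, so overly generous defaults quietly consume quota for every unconfigured container.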

Strategy 5 — Shut Down Non-Production Environments After Hours

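Dev, staging, and QA environments that run 24/7 but are only used during business hours (roughly 10 hours a day, 5 days a week) sit idle about 70% of the time: only 50 of the 168 hours in a week are in use, so roughly 118 hours of compute are paid for and wasted.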

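A simple pair of CronJobs can scale non-production deployments down overnight and back up in the morning. Scaling pods to zero only reduces the bill if the Cluster Autoscaler (or an equivalent) then removes the emptied nodes: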

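# Scale down staging and dev at 8 PM every weekday. The schedule uses the
# kube-controller-manager time zone (usually UTC) unless spec.timeZone is set.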
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
  namespace: staging
spec:
  schedule: "0 20 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment --all --replicas=0 -n staging
              kubectl scale deployment --all --replicas=0 -n dev
          restartPolicy: OnFailure
---
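# Scale back up at 8 AM every weekday. Note that this sets every Deployment
# in the namespace to the same replica count, not to its previous value.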
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-staging
  namespace: staging
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment --all --replicas=2 -n staging
              kubectl scale deployment --all --replicas=1 -n dev
          restartPolicy: OnFailure
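
Both CronJobs run as the scaler service account, which needs permission to scale Deployments in the staging and dev namespaces. The original manifests do not define it, so the following is a minimal RBAC sketch. The ClusterRole and RoleBinding names are hypothetical, and the verb list is an assumption that may need adjusting for your kubectl version.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scaler
  namespace: staging
---
# Reusable permission set; it grants nothing until bound in a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployment-scaler        # hypothetical name
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "list", "patch", "update"]
---
# Allow the service account to scale Deployments in staging
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scaler-binding           # hypothetical name
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deployment-scaler
subjects:
- kind: ServiceAccount
  name: scaler
  namespace: staging
---
# Same permissions in dev, still for the staging service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scaler-binding           # hypothetical name
  namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deployment-scaler
subjects:
- kind: ServiceAccount
  name: scaler
  namespace: staging
```

Binding a ClusterRole through namespaced RoleBindings keeps the permissions limited to the two namespaces while avoiding a duplicated Role definition.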

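Expected savings: up to about 70% of non-production compute, provided nodes are released while workloads are scaled to zero. One company reportedly saved $10,000 per month by shutting down staging clusters after hours.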

Strategy 6 — Reduce Cross-Zone Network Costs

Data transfer charges are a hidden cost multiplier in Kubernetes. Pods communicating across availability zones or regions generate per-GB fees that compound at scale.

Topology-Aware Routing

Keep traffic within the same availability zone when possible:

apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: production
  annotations:
    service.kubernetes.io/topology-mode: "Auto"
spec:
  selector:
    app: backend
  ports:
  - port: 80
    targetPort: 8080

Topology Spread Constraints

Distribute pods evenly across zones for availability. Spreading replicas across every zone also gives topology-aware routing a local endpoint to prefer in each zone:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: production
spec:
  replicas: 6
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend   # must match both the selector and the spread constraint
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: backend
      containers:
      - name: backend
        image: backend:latest

Practical steps to reduce network costs include keeping frequently communicating pods in the same availability zone using topology-aware scheduling, setting up VPC endpoints for AWS services so pods do not route through the public internet, and considering single-zone deployments for namespaces where multi-zone redundancy is not required.
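
One way to keep chatty pods in the same zone is pod affinity keyed on the zone label. The sketch below assumes a hypothetical frontend Deployment that calls the backend pods labelled app: backend. The name, image, and weight are illustrative.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend               # hypothetical workload
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      affinity:
        podAffinity:
          # Soft preference: schedule into a zone that already runs backend pods
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: backend
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: frontend
        image: frontend:latest   # hypothetical image
```

A preferred rule is used rather than a required one so that the scheduler can still place pods when the backend's zone is full, at the cost of some cross-zone traffic.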

Expected savings: 10–25% on networking costs for data-heavy workloads.

Strategy 7 — Add Cost Visibility With Labels and Monitoring

Cost visibility breaks down when it lives only in finance dashboards or raw cloud billing exports. You need cost mapped to namespaces, deployments, and labels that match ownership. A practical rule: if an engineer cannot answer “what does this deployment cost per day?” in under 60 seconds, you are guessing.

Enforce Label Standards at Admission Time

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    team: backend
    env: prod
    cost-center: engineering
    owner: platform-team

---
# Kyverno policy: require cost labels on all Deployments
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-required-labels
    match:
      any:
      - resources:
          kinds:
          - Deployment
    validate:
      message: "Deployment must have team, env, and cost-center labels"
      pattern:
        metadata:
          labels:
            team: "?*"
            env: "?*"
            cost-center: "?*"
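
For reference, a Deployment that would pass this policy might look like the sketch below. The workload name, image, and label values are hypothetical. The policy checks only the Deployment's own metadata, and repeating the labels on the pod template is an additional assumption that helps tools which aggregate cost by pod label.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api             # hypothetical workload
  namespace: production
  labels:                      # checked by the require-cost-labels policy
    team: backend
    env: prod
    cost-center: engineering
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
        team: backend          # repeated so pod-level cost reports can group by them
        env: prod
        cost-center: engineering
    spec:
      containers:
      - name: orders-api
        image: orders-api:latest   # hypothetical image
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
```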

Free Cost Monitoring Tools

Kubecost (free tier):

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token"

# Access the dashboard
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090

OpenCost (100% open source):

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace

Many teams discover 30–40% waste just by turning on cost monitoring for the first time.

Cost Optimization by Cloud Provider

AWS EKS

Use Savings Plans or Reserved Instances for baseline capacity — saves 30–40%
Enable Karpenter instead of Cluster Autoscaler — more aggressive scale-down
Use Graviton (ARM) nodes — 20% cheaper than x86 equivalents

Google GKE

Enable GKE Autopilot — Google right-sizes nodes automatically
Use Committed Use Discounts for predictable workloads
Enable Spot Pods on Autopilot for batch workloads

Azure AKS

Use Azure Spot Node Pools for dev/test workloads
Enable Azure Reserved VM Instances for baseline
Use KEDA (event-driven autoscaling) for queue-based workloads

Cost Optimization Checklist

✅ Resource requests and limits set on every container
✅ HPA configured for all stateless workloads
✅ VPA running in recommendation mode (at minimum)
✅ Cluster Autoscaler or Karpenter enabled
✅ Spot instances used for dev/staging/batch workloads
✅ Non-production environments scaled to zero after hours
✅ ResourceQuotas applied to every namespace
✅ LimitRanges set with sensible defaults
✅ Topology-aware routing enabled for high-traffic services
✅ Cost labels enforced on all deployments
✅ Kubecost or OpenCost installed for visibility
✅ Monthly cost review process in place
✅ Reserved instances or savings plans for baseline nodes
✅ Unused PersistentVolumes audited and deleted
✅ Orphaned load balancers identified and removed

Frequently Asked Questions

How much can I realistically save? Organizations that implement these best practices cut Kubernetes costs by 30–50% without performance trade-offs. Teams focused purely on spot instances and rightsizing can see 40–70% reductions on eligible workloads.

Where should I start? Start with cost visibility — install Kubecost or OpenCost first. You cannot optimize what you cannot measure. Then fix resource requests and limits. That alone typically reveals 20–40% waste.

Does autoscaling always save money? Not automatically. Misconfigured autoscalers can keep too many nodes running, so set scale-down thresholds deliberately and monitor the results. The teams getting the best results review their scaling parameters monthly, not quarterly.

Is VPA safe to use in production? VPA in Auto mode evicts and recreates pods to apply new resource values, which can cause brief disruptions. Start with updateMode: "Off" to get recommendations only, then switch to Initial mode, which only applies values at pod creation, before moving to Auto.

What is the minimum cluster size where cost optimization matters? The inflection point is usually around $10,000–$20,000 monthly spend, which typically corresponds to 20–50 nodes or 100–200 pods. Below this threshold, engineering time costs more than potential savings.

Should I use Karpenter or Cluster Autoscaler? For AWS EKS, Karpenter is now the recommended choice — it provisions nodes faster, supports more instance types, and scales down more aggressively. For GKE and AKS, use the native Cluster Autoscaler.

DevToolHub