Kubernetes Resource Requests and Limits — Full Guide


Most Kubernetes problems in production come down to one thing — resource misconfiguration. Pods getting OOMKilled. Applications throttled under load. Nodes running out of memory and taking down everything on them.

Resource requests and limits are how you prevent this. They tell Kubernetes how much CPU and memory each container needs, and how much it is allowed to use. Get this right and your cluster runs predictably. Get it wrong and you spend time firefighting incidents that should never have happened.

This guide covers everything — what requests and limits are, how they behave differently for CPU and memory, common mistakes, and how to set the right values for your workloads.


What Are Resource Requests?

A resource request is the minimum amount of CPU or memory Kubernetes guarantees to a container.

The scheduler uses requests to decide which node to place a pod on. It looks at every node and finds one that has enough unallocated resources to satisfy the pod’s request. If no node can satisfy the request, the pod stays in Pending status.

resources:
  requests:
    cpu: "250m"       # 0.25 of a CPU core
    memory: "128Mi"   # 128 mebibytes

Once scheduled, the node reserves that amount for the container. The container is guaranteed to get at least that much. If the node has spare capacity, the container can use more — but it is never guaranteed more than what it requested.
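You can see how much of a node's capacity is already spoken for by requests. kubectl describe node lists the running totals (the node name and the numbers below are placeholders):

kubectl describe node <node-name> | grep -A 8 "Allocated resources"

Output looks roughly like this:

Allocated resources:
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1750m (43%)   2500m (62%)
  memory             2432Mi (33%)  4096Mi (55%)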


What Are Resource Limits?

A resource limit is the maximum amount of CPU or memory a container is allowed to use.

resources:
  requests:
    cpu: "250m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

If a container tries to exceed its limit, two things can happen — and they are different for CPU and memory.


CPU vs Memory — Very Different Behaviour

This is the most important thing to understand about limits.

CPU is compressible.

If a container hits its CPU limit, the kernel throttles it. The container slows down. It does not crash. It does not restart. It just runs slower until CPU is available.

This can silently destroy your application’s performance. A latency-sensitive API hitting its CPU limit will have slower response times without any obvious error in your logs.
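One quick way to check whether a running container is actually being throttled is to read its CPU cgroup statistics from inside the container. A sketch assuming cgroup v2 and an image that includes cat (on cgroup v1 the file is /sys/fs/cgroup/cpu/cpu.stat instead):

kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu.stat

Output includes counters like:

nr_periods 4200
nr_throttled 310
throttled_usec 9876543

If nr_throttled keeps climbing while the application is busy, the container is hitting its CPU limit.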

Memory is not compressible.

If a container exceeds its memory limit, the kernel kills it immediately with an OOM (Out of Memory) kill. The pod restarts. You see OOMKilled in the pod status.

# Check if a pod was OOMKilled
kubectl describe pod <pod-name> | grep -A 5 "Last State"

Output:

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137

Exit code 137 means the process was killed with SIGKILL (128 + 9). Combined with Reason: OOMKilled, it tells you the container tried to use more memory than its limit and was terminated.
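If you prefer a one-liner, the same information is available via JSONPath (the pod name is a placeholder; the index assumes a single-container pod):

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Prints OOMKilled if the last restart was an OOM kill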


CPU Units — Millicores

CPU is measured in millicores (m). 1000m = 1 full CPU core.

1000m = 1 core
500m  = 0.5 core
250m  = 0.25 core
100m  = 0.1 core

You can also write CPU as a plain number of cores:

cpu: "1"    # 1 full core
cpu: "0.5"  # 0.5 core (same as 500m)
cpu: "250m" # 0.25 core


Memory Units

Memory is measured in bytes. Use these suffixes:

Ki = kibibyte  (1Ki = 1024 bytes)
Mi = mebibyte  (1Mi = 1024 Ki)
Gi = gibibyte  (1Gi = 1024 Mi)

Examples:

memory: "128Mi"   # 128 mebibytes
memory: "1Gi"     # 1 gibibyte
memory: "512Mi"   # 512 mebibytes


A Complete Example

Here is a realistic deployment with requests and limits set:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api
        image: myapp/api:v2.1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

This container is guaranteed 0.25 CPU cores and 256Mi memory. It can burst up to 0.5 CPU cores and 512Mi memory. If it tries to use more than 512Mi, it gets OOMKilled.


Quality of Service (QoS) Classes

Kubernetes assigns every pod a QoS class based on how requests and limits are set. This determines which pods get evicted first when a node runs low on resources.

Guaranteed (highest priority — evicted last)

Requests and limits are set AND they are equal for all containers:

resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

Use this for critical production workloads that must never be evicted.

Burstable (medium priority)

At least one container has a request or limit set, but the pod does not meet the Guaranteed criteria (for example, limits higher than requests, or only requests set):

resources:
  requests:
    cpu: "250m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

This is the most common setting. Pods can burst above their request when capacity is available, but may be evicted before Guaranteed pods if the node runs low.

BestEffort (lowest priority — evicted first)

No requests or limits set at all:

# No resources block
containers:
- name: worker
  image: myapp:latest

These pods get whatever is left over on the node. They are the first to be evicted when resources are tight. Never use BestEffort for production workloads.
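You do not have to work out the class by hand. Kubernetes records it on the pod, so you can confirm what your settings produced (the pod name is a placeholder):

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
# Prints Guaranteed, Burstable, or BestEffort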


Common Mistakes

Mistake 1 — Setting no requests or limits

Without requests, the scheduler cannot make good decisions. Pods end up on overloaded nodes. Without memory limits, one container with a memory leak can consume all node memory and cause everything on that node to crash.

Mistake 2 — Setting CPU limits too low

CPU throttling is silent. Your app slows down under load, latency goes up, and nothing in your logs explains why. Overly low CPU limits are one of the most common causes of unexplained latency spikes.

Symptom: latency rises under load while your logs and error rates look normal. kubectl top shows current usage, not throttling, but it is a quick first check:

# Requires metrics-server to be installed
kubectl top pods -n production

To see throttling directly, check the cAdvisor metrics in Prometheus:

container_cpu_cfs_throttled_seconds_total

If this number is climbing, your CPU limit is too low.
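That counter only ever increases, so to see how much time each container is currently spending throttled, apply rate over a short window (the namespace label here is an assumption; adjust to your setup):

sum by (pod, container) (rate(container_cpu_cfs_throttled_seconds_total{namespace="production"}[5m]))

Anything persistently above zero for a latency-sensitive service deserves a look at its CPU limit.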

Mistake 3 — Setting memory requests too low

If your memory request is lower than actual usage, the scheduler puts more pods on a node than it can handle. When memory runs out, pods start getting evicted.

Mistake 4 — Setting limits without requests

If you set limits but not requests, Kubernetes sets the request equal to the limit automatically. This makes your pod Guaranteed class, which sounds good — but it means you are reserving the full limit on every node even if the container rarely uses that much. This wastes capacity.
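For example, a spec like this minimal sketch ends up with requests equal to the limits even though none were written:

resources:
  limits:
    cpu: "1"
    memory: "1Gi"
  # Kubernetes defaults requests to the limit values above,
  # so the scheduler reserves a full core and 1Gi for this container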

Mistake 5 — Copying limits from other teams

Every application is different. A Node.js app, a Java app, and a Python ML model have completely different resource profiles. Do not copy-paste resource values — measure your actual usage.


How to Find the Right Values

Do not guess. Measure.

Step 1 — Deploy without limits first (in staging)

Run your application under realistic load with no CPU limits (but with memory limits for safety). Let it use what it needs.

Step 2 — Check actual usage

# Check current pod resource usage
kubectl top pods -n production

# Check node usage
kubectl top nodes

Step 3 — Use Vertical Pod Autoscaler (VPA) in recommendation mode

VPA watches your pods and recommends optimal resource values based on actual usage. Set it to recommendation mode — it will not change anything, just tell you what it recommends.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendation only — does not auto-apply

Check recommendations:

kubectl describe vpa api-server-vpa -n production

Output shows:

  Recommendation:
    Container Recommendations:
      Container Name:  api
      Lower Bound:
        Cpu:     100m
        Memory:  200Mi
      Target:
        Cpu:     250m
        Memory:  300Mi
      Upper Bound:
        Cpu:     500m
        Memory:  512Mi

Use the Target values as your requests. Use Upper Bound as your limits.

Step 4 — Set requests at the 90th percentile of actual usage

If your app normally uses 150m CPU but spikes to 400m during peak traffic, set:

  • Request: 150m (covers normal operation)
  • Limit: 500m (covers peak with headroom; see the query sketch below)
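One way to measure that percentile is to ask Prometheus directly. A sketch using a subquery over the last week (metric and label names follow the standard cAdvisor conventions; the deployment and container names come from the example earlier in this guide):

quantile_over_time(0.90,
  rate(container_cpu_usage_seconds_total{namespace="production", pod=~"api-server-.*", container="api"}[5m])[7d:1h]
)

The same idea works for memory with container_memory_working_set_bytes, which is a gauge, so no rate is needed.
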

LimitRange — Setting Defaults for a Namespace

If a developer deploys a pod without requests or limits, it gets BestEffort class. You can prevent this by setting a LimitRange on the namespace. It automatically applies default requests and limits to any container that does not specify them.

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "2"
      memory: "2Gi"
    min:
      cpu: "50m"
      memory: "64Mi"

Apply it:

kubectl apply -f limitrange.yaml

Now any container in the production namespace without resource settings automatically gets 100m CPU and 128Mi memory as requests, and 500m CPU and 256Mi memory as limits. Developers cannot set limits higher than 2 CPU or 2Gi memory.
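To confirm the defaults are being injected, create a pod with no resources block and inspect what the API server stored (the test pod name and image are placeholders):

kubectl run defaults-test --image=nginx -n production
kubectl get pod defaults-test -n production -o jsonpath='{.spec.containers[0].resources}'
kubectl delete pod defaults-test -n production

The second command should print something close to:

{"limits":{"cpu":"500m","memory":"256Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}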


ResourceQuota — Capping Total Usage per Namespace

LimitRange sets per-container defaults. ResourceQuota caps the total resources a namespace can consume across all pods.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"

Apply it:

kubectl apply -f resourcequota.yaml

Check current usage against quota:

kubectl describe resourcequota production-quota -n production

Output:

Name:            production-quota
Namespace:       production
Resource         Used    Hard
--------         ----    ----
limits.cpu       4       20
limits.memory    8Gi     40Gi
pods             12      50
requests.cpu     2       10
requests.memory  4Gi     20Gi

This prevents any single team from consuming all cluster resources. Use ResourceQuota when multiple teams share a cluster.
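When a pod would push the namespace over the quota, the API server rejects it at admission time. Creating a pod directly fails with a Forbidden error that looks roughly like this (values are illustrative); for Deployments, kubectl apply succeeds but the missing replicas show up as the same error in the ReplicaSet events:

Error from server (Forbidden): error when creating "pod.yaml": pods "batch-worker" is forbidden:
exceeded quota: production-quota, requested: requests.cpu=2, used: requests.cpu=9, limited: requests.cpu=10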


Real-World Examples

Web API (Node.js):

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

Node.js apps typically have a low CPU baseline but can spike under load. The 500m limit leaves enough headroom that throttling is rarely a problem at this scale.

Java Application:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"

Java apps need more memory for the JVM. Set memory limits conservatively and leave headroom above the configured heap size: the JVM also uses off-heap memory, and usage can spike suddenly.

Python ML Inference:

resources:
  requests:
    cpu: "1000m"
    memory: "2Gi"
  limits:
    cpu: "2000m"
    memory: "4Gi"

ML inference loads large models into memory. Underestimating memory here guarantees OOMKills.

Background Worker / Batch Job:

resources:
  requests:
    cpu: "50m"
    memory: "64Mi"
  limits:
    memory: "256Mi"
    # No CPU limit — let it burst freely

Batch jobs do not have latency requirements. Skip the CPU limit so they finish faster when CPU is available.


Troubleshooting Resource Issues

Pod stuck in Pending:

kubectl describe pod <pod-name> | grep -A 10 "Events:"

If you see Insufficient cpu or Insufficient memory, the request is too high for any available node. Either lower the request or add more nodes.

OOMKilled pods:

kubectl describe pod <pod-name> | grep -A 5 "Last State"
# Look for Reason: OOMKilled

Increase the memory limit. If usage keeps growing until the pod is killed again, suspect a memory leak rather than an undersized limit.
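To tell a leak apart from an undersized limit, watch working-set memory over time rather than taking a single snapshot. Two sketches, one with and one without Prometheus (labels and names are assumptions to adjust):

# Prometheus: working-set memory for the example deployment
container_memory_working_set_bytes{namespace="production", pod=~"api-server-.*", container="api"}

# Without Prometheus: per-container snapshot, re-run over time
kubectl top pod <pod-name> -n production --containers

A line that climbs steadily toward the limit and never plateaus usually points at a leak; a line that plateaus above the limit means the limit is simply too small.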

Slow application performance — check CPU throttling:

# In Prometheus: fraction of recent CFS periods in which the container was throttled
rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m])

If the throttling ratio stays above roughly 25%, your CPU limit is too low. Either increase it or remove it.

Check all pods without resource requests:

kubectl get pods -A -o json | \
  jq '.items[] | select(.spec.containers[].resources.requests == null) | 
  .metadata.name + " in " + .metadata.namespace'


Summary — The Rules to Follow

  1. Always set memory limits — a container without a memory limit can crash the entire node
  2. Be careful with CPU limits — throttling is silent and kills performance. Start without CPU limits, measure, then set them
  3. Set requests based on actual usage — use kubectl top and VPA recommendations, not guesses
  4. Use LimitRange to enforce defaults in every namespace — prevent BestEffort pods from slipping through
  5. Use ResourceQuota on shared clusters — prevent one team from consuming everything
  6. Review resource settings quarterly — workloads change over time and old settings become wrong
