Kubernetes Cost Control: Requests, Limits, and the Traps That Inflate Bills

By Shayan Ghasemnezhad on March 25, 2025 · 3 min read

autoscaling · cost-control · finops · kubernetes

Misconfigured resource requests are the top driver of Kubernetes overspend. How to right-size, autoscale, and allocate costs per namespace.

Kubernetes makes it easy to deploy workloads and remarkably easy to overpay for them. The abstraction that simplifies operations also hides the connection between what you request, what you use, and what you pay for. Most Kubernetes clusters run at 20–40% actual utilisation—the rest is allocated but idle capacity that appears on your cloud bill every month.

Requests vs. Limits: The Misunderstanding That Costs Money

Resource requests are what the scheduler uses to place pods on nodes. If you request 2 CPU and 4Gi memory, the scheduler reserves that capacity on a node—whether your pod uses it or not. Resource limits are the ceiling: the maximum your pod can burst to before it gets throttled (CPU) or killed (memory).

The common mistake is setting requests equal to limits. This eliminates burstability and forces the cluster to reserve peak capacity for every pod at all times. If your service uses 200m CPU at steady state and spikes to 1 CPU during deployments, a request of 1 CPU wastes 800m continuously. Multiply that across 50 services and you are paying for a cluster that is four times larger than it needs to be.
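In a pod spec, the anti-pattern looks like this (the numbers are illustrative, matching the 200m-steady-state service above):

```yaml
# Anti-pattern: requests == limits. Guaranteed QoS, but zero burst headroom —
# the scheduler reserves a full CPU per replica for a service that idles at ~200m.
resources:
  requests:
    cpu: '1'
    memory: 1Gi
  limits:
    cpu: '1'
    memory: 1Gi
```

Setting requests below limits moves the pod to the Burstable QoS class: it can still spike to the limit, but the scheduler only reserves the steady-state amount.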

Right-Sizing Methodology

Use actual metrics, not guesswork. Pull P95 and P99 CPU and memory usage from Prometheus over a 14-day window that includes peak traffic. Set requests at P95 steady-state usage plus a 20% buffer. Set limits at P99 peak usage plus a 30% buffer. This gives pods room to burst without reserving capacity that will never be used.
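If you run the Vertical Pod Autoscaler, its recommender performs a similar percentile analysis on observed usage. A sketch in recommendation-only mode, so it reports suggested requests without evicting pods (the workload name here is hypothetical):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout            # hypothetical workload
  updatePolicy:
    updateMode: "Off"         # recommend only; never restart pods to resize them
```

Read the recommendations with `kubectl describe vpa checkout-vpa` and apply them manually, keeping the buffers described above.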

```yaml
# Right-sized resource spec based on observed metrics
resources:
  requests:
    cpu: 250m      # P95 steady-state: 200m + 20% buffer, rounded up
    memory: 384Mi  # P95 steady-state: 320Mi + 20%
  limits:
    cpu: '1'       # P99 peak: 750m + 30% buffer
    memory: 512Mi  # P99 peak: 400Mi + ~30%, rounded (OOMKill guard)
```

Autoscaler Configuration Traps

The Horizontal Pod Autoscaler (HPA) scales pods based on observed metrics. The Cluster Autoscaler adds or removes nodes to fit the pods. These two systems interact, and misconfiguring them creates cost traps.

Trap one: HPA target utilisation set too low. If you target 30% CPU utilisation, the HPA scales out aggressively, adding pods that are mostly idle. A target of 60–70% is more cost-efficient for most web services.

Trap two: Cluster Autoscaler with over-provisioned node groups. If your node group uses m5.2xlarge (8 vCPU) but most pods request 250m CPU, you waste capacity on every node. Mix node sizes or use Karpenter for just-in-time provisioning that matches pod requirements to instance types.
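A 70% CPU target in the `autoscaling/v2` API looks like this (the Deployment name and replica bounds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa               # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                 # hypothetical workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # cost-efficient for most web services
```

Note that utilisation is measured against *requests*, so a right-sized request is a prerequisite: a 70% target against an inflated request still scales out too early.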

Trap three: scale-down cooldowns that are too conservative. The default Cluster Autoscaler scale-down delay is 10 minutes. For workloads with daily traffic patterns, nodes spun up for morning peak may not scale down until well after traffic drops. Tune the --scale-down-unneeded-time flag to match your traffic profile.
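In practice this is a flag on the cluster-autoscaler Deployment. A fragment, with illustrative values for a cluster whose traffic drops off sharply:

```yaml
# Fragment of the cluster-autoscaler container spec; timings are illustrative.
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
  command:
  - ./cluster-autoscaler
  - --scale-down-unneeded-time=5m    # default is 10m
  - --scale-down-delay-after-add=5m  # default 10m; how long to wait after scale-up
```

Shorter timers reclaim nodes faster but increase churn; validate against your pod startup times before tightening them.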

Namespace Cost Allocation

Cost allocation in Kubernetes requires mapping resource consumption to teams or services. Use namespace-level resource quotas to set upper bounds, and deploy Kubecost or OpenCost for per-namespace cost reports. Without this, cost conversations in platform teams devolve into “who is using all the memory?” with no data to answer.
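A per-team quota is a small manifest. A sketch for a hypothetical `team-checkout` namespace, capping both requests (what is reserved) and limits (what can burst):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-checkout-quota   # hypothetical name
  namespace: team-checkout    # hypothetical namespace
spec:
  hard:
    requests.cpu: '20'        # total CPU the namespace may reserve
    requests.memory: 40Gi
    limits.cpu: '40'          # total CPU the namespace may burst to
    limits.memory: 80Gi
```

With quotas in place, Kubecost or OpenCost reports turn "who is using all the memory?" into a per-namespace number you can put in front of the owning team.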

Decision Framework

Right-sizing is high-impact and low-risk—start there. Then address autoscaler tuning. Then move to cost allocation. The sequence matters because right-sizing reduces the noise in your cost data, making autoscaler tuning and allocation conversations more productive. Do not buy a cost management platform before you have set requests correctly on your top 10 workloads.

Failure Modes

The most expensive failure: no resource requests at all. Pods with no requests or limits fall into the BestEffort QoS class and are evicted first under node pressure, causing cascading restarts. Teams then over-request to compensate, and the cycle continues.
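A LimitRange guards against this by injecting defaults into any container that omits them; a sketch with illustrative values:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests
  namespace: team-checkout    # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:           # applied when a container omits requests
      cpu: 100m
      memory: 128Mi
    default:                  # applied when a container omits limits
      cpu: 500m
      memory: 256Mi
```

Defaults are a safety net, not a substitute for right-sizing: they keep pods out of BestEffort, but workloads should still get metrics-based requests.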

Memory limits set too tight cause OOMKills that look like application bugs. Engineers add memory to fix the “bug,” which increases cost without addressing the root cause. Always check OOMKill events before approving memory increases—the application may have a leak, not a capacity problem.

Kubernetes cost management is not a one-time exercise. Traffic patterns change, services get rewritten, and new workloads appear. Build a quarterly review cycle: pull request-vs-usage ratios, identify the top 10 over-provisioned workloads, and right-size them. The cluster will thank you with a smaller bill.