API Rate Limiting Complete Guide 2026: Algorithms, Implementation & 429 Handling
Every production API needs rate limiting. Without it, a single misbehaving client — a polling loop gone wrong, a DDoS attack, or a well-intentioned but aggressive integration — can take down your service for every other user. Rate limiting is both a safety mechanism and a fairness policy. This guide covers all four rate limiting algorithms (token bucket, leaky bucket, fixed window, sliding window), a production-ready Redis implementation in Node.js, the standard rate limit response headers every API should send, how to correctly handle 429 errors as an API consumer with exponential backoff, and the design decisions that separate good rate limiting from frustrating rate limiting.
4 — rate limiting algorithms, each with different burst behavior and accuracy trade-offs
429 — the HTTP status code for rate limit exceeded; always return a JSON body with Retry-After
<1ms — Redis sliding window check latency; rate limiting should never add perceptible overhead
3x — typical cost reduction when rate limiting prevents runaway client polling loops
Definition: What Is API Rate Limiting?
Rate limiting = controlling how many requests a client can make in a time window
API rate limiting is the practice of restricting how many requests a client (identified by IP address, API key, user ID, or tenant) can make within a defined time window. When a client exceeds the limit, the server returns HTTP 429 Too Many Requests instead of processing the request. Rate limiting protects your infrastructure from overload, enforces fair usage across clients, enables usage-based billing tiers, and provides a first line of defense against DDoS attacks and abusive bots.
When You Need Rate Limiting — and What to Limit
DDoS and abuse protection
A single malicious or malfunctioning client can send thousands of requests per second. Without rate limiting, this exhausts your connection pool, database connections, and CPU — causing a full service outage for all clients. Rate limiting at the IP level caps damage from any single source.
Fair usage across tenants
In a multi-tenant API, one high-traffic tenant can consume 95% of capacity and starve all other tenants. Per-tenant rate limits ensure every customer gets a predictable, fair share of capacity regardless of what others are doing.
Usage-based billing enforcement
Pricing tiers (Free: 1,000 req/day, Pro: 100,000 req/day, Enterprise: unlimited) are enforced by rate limiting. The rate limiter checks the client's plan and applies the corresponding limit, returning 429 with an upgrade prompt when exceeded.
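A minimal sketch of tier enforcement (the limits mirror the example tiers above; lookupPlan is a hypothetical helper, and createRateLimiter is defined in the implementation section below):
// Hypothetical plan table — real systems load plans from a billing database or cache
const tierLimiters = {
  free: createRateLimiter({ limit: 1_000, windowSeconds: 86_400 }),  // 1,000 req/day
  pro: createRateLimiter({ limit: 100_000, windowSeconds: 86_400 }), // 100,000 req/day
};
async function billingLimiter(req, res, next) {
  const plan = await lookupPlan(req.headers['x-api-key']); // hypothetical: resolve key → plan
  if (plan === 'enterprise') return next();                // unlimited tier skips the limiter
  return (tierLimiters[plan] ?? tierLimiters.free)(req, res, next);
}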
Downstream API cost control
If your API proxies a paid upstream service (OpenAI, Stripe, SendGrid), each request costs you money. Rate limiting your clients prevents unexpected cost spikes from a single integration bug that sends 10,000 requests in a minute.
Bot and scraper mitigation
Aggressive scrapers can exhaust your API capacity and increase infrastructure costs significantly. Combined with bot detection, rate limiting slows scrapers enough to make them economically unviable while barely affecting legitimate users.
Protecting expensive endpoints specifically
Not all endpoints cost the same. A /search endpoint that runs a full-text query is 100x more expensive than /ping. Apply tighter limits to expensive endpoints independently — 10 search requests per second, 1000 health checks per second — rather than one global limit for everything.
How — The 4 Rate Limiting Algorithms Explained
| Algorithm | How It Works | Burst Handling | Accuracy | Best For |
|---|---|---|---|---|
| Fixed Window | Count requests in fixed calendar windows (e.g., 0:00–0:59, 1:00–1:59). Reset counter at window boundary. | ❌ "Window boundary burst" — 2x limit possible across boundary | ⚠️ Low — boundary artifacts | Simple quotas, daily limits, billing |
| Sliding Window | Count requests in a rolling window relative to now (last 60 seconds, not this clock minute). No boundary. | ✅ Smooth — no boundary burst | ✅ High — accurate at all times | API rate limits, most production use cases |
| Token Bucket | Bucket holds N tokens. Each request consumes 1. Tokens refill at a fixed rate. Burst up to bucket capacity. | ✅ Allows controlled burst up to capacity | ✅ High — smooth refill | APIs with legitimate burst traffic (uploads, batch) |
| Leaky Bucket | Requests enter a queue (the bucket). Queue drains at a fixed rate. Requests that overflow the queue are dropped. | ❌ No burst — strictly constant rate | ✅ Perfectly smooth output | Upstream systems that cannot handle bursts |
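The implementation section below covers sliding window and token bucket in full. For contrast, a fixed-window counter — the simplest of the four — needs only a Redis INCR plus EXPIRE per window (minimal sketch, assuming the same node-redis client configured in the implementation section):
// ── Fixed window counter — simple, but allows up to 2x the limit across a boundary ──
async function fixedWindowLimit({ key, limit, windowSeconds }) {
  const windowId = Math.floor(Date.now() / (windowSeconds * 1000));
  const windowKey = key + ':' + windowId;                          // new counter per calendar window
  const count = await redis.incr(windowKey);
  if (count === 1) await redis.expire(windowKey, windowSeconds);   // first request sets the TTL
  return { allowed: count <= limit, remaining: Math.max(0, limit - count) };
}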
Sliding Window Rate Limit — Request Flow
Request arrives
Client sends request. Identify by: API key, user_id, IP, or tenant_id. Build Redis key.
Redis pipeline executes
Atomic: ZREMRANGEBYSCORE (remove old entries) → ZADD (record this request) → ZCARD (count in window) → EXPIRE (set TTL)
Count check
If count <= limit: allow request, set X-RateLimit-Remaining header. If count > limit: return 429 immediately.
200 or 429 response
200: process normally with rate limit headers. 429: JSON error body + Retry-After header + X-RateLimit-Reset.
Client handles response
Success: continue. 429: read Retry-After, wait the specified seconds, retry. Never retry immediately on 429.
How — Node.js + Redis Production Implementation
import express from 'express';
import { createClient } from 'redis';

const app = express();
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();
// ── Sliding window rate limiter ────────────────────────────────────────────
async function slidingWindowLimit({ key, limit, windowSeconds }) {
  const now = Date.now();
  const windowStart = now - windowSeconds * 1000;
  const requestId = now + '-' + Math.random().toString(36).slice(2); // unique member per request
  // Atomic pipeline — all operations execute together, no race conditions
  const [, , count] = await redis
    .multi()
    .zRemRangeByScore(key, 0, windowStart)       // remove requests older than the window
    .zAdd(key, { score: now, value: requestId }) // record this request
    .zCard(key)                                  // count requests in the window
    .expire(key, windowSeconds)                  // auto-expire idle keys
    .exec();
  const remaining = Math.max(0, limit - count);
  const resetAt = new Date(now + windowSeconds * 1000).toISOString();
  return { allowed: count <= limit, count, remaining, limit, resetAt };
}
// ── Express middleware ─────────────────────────────────────────────────────
function createRateLimiter({ limit = 100, windowSeconds = 60, keyFn } = {}) {
  return async (req, res, next) => {
    // Default: rate limit by API key, fall back to IP
    const identifier = keyFn
      ? keyFn(req)
      : req.headers['x-api-key'] || req.ip;
    // One key per client — ZREMRANGEBYSCORE in the limiter handles expiry, so no
    // window suffix is needed (a window suffix would turn this into a fixed window)
    const key = 'rl:' + identifier;
    const result = await slidingWindowLimit({ key, limit, windowSeconds });
    // Always set rate limit headers — even on success
    res.set({
      'X-RateLimit-Limit': String(limit),
      'X-RateLimit-Remaining': String(result.remaining),
      'X-RateLimit-Reset': result.resetAt,
      'X-RateLimit-Policy': limit + ';w=' + windowSeconds,
    });
    if (!result.allowed) {
      // Conservative upper bound: the sliding window fully clears
      // windowSeconds after the most recent request
      const retryAfter = windowSeconds;
      res.set('Retry-After', String(retryAfter));
      return res.status(429).json({
        error: {
          code: 'RATE_LIMIT_EXCEEDED',
          message: 'Too many requests. Please slow down and retry after ' + retryAfter + ' seconds.',
          retryAfter,
          limit,
          resetAt: result.resetAt,
        },
      });
    }
    next();
  };
}
// ── Usage — apply globally and per-endpoint ────────────────────────────────
const globalLimit = createRateLimiter({ limit: 1000, windowSeconds: 60 });
const searchLimit = createRateLimiter({ limit: 10, windowSeconds: 60 });   // expensive endpoint
const uploadLimit = createRateLimiter({ limit: 5, windowSeconds: 3600 });  // per hour

app.use(globalLimit);                          // 1000 req/min for all endpoints
app.get('/api/search', searchLimit, handler);  // tighter: 10 searches/min
app.post('/api/upload', uploadLimit, handler); // very tight: 5 uploads/hour

// ── Token bucket — best for allowing controlled bursts ─────────────────────
class TokenBucket {
  constructor({ capacity, refillRate }) {
    this.capacity = capacity;     // max tokens (burst size)
    this.refillRate = refillRate; // tokens added per second
    this.tokens = capacity;       // start full
    this.lastRefill = Date.now();
  }

  consume(count = 1) {
    this.#refill();
    if (this.tokens >= count) {
      this.tokens -= count;
      return { allowed: true, remaining: Math.floor(this.tokens) };
    }
    return {
      allowed: false,
      remaining: 0,
      waitMs: Math.ceil(((count - this.tokens) / this.refillRate) * 1000),
    };
  }

  #refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000; // seconds since last refill
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}

// Example: 20 req burst allowed, refills at 5 req/sec (300 req/min steady state)
const bucket = new TokenBucket({ capacity: 20, refillRate: 5 });

// Middleware usage — note this single bucket is shared by every client
function tokenBucketMiddleware(req, res, next) {
  const result = bucket.consume(1);
  if (!result.allowed) {
    res.set('Retry-After', String(Math.ceil(result.waitMs / 1000)));
    return res.status(429).json({ error: { code: 'RATE_LIMIT_EXCEEDED', waitMs: result.waitMs } });
  }
  res.set('X-RateLimit-Remaining', String(result.remaining));
  next();
}
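Because tokenBucketMiddleware shares one bucket across all clients, it caps the whole process rather than any individual caller. For per-client limits, keep one bucket per identifier (a minimal in-memory sketch — like any in-process limiter it is per-instance only, so use Redis for multi-instance deployments as discussed below):
// ── Per-client token buckets — one bucket per API key or IP ─────────────────
const buckets = new Map(); // in production, evict idle buckets to bound memory

function perClientBucketMiddleware(req, res, next) {
  const id = req.headers['x-api-key'] || req.ip;
  if (!buckets.has(id)) {
    buckets.set(id, new TokenBucket({ capacity: 20, refillRate: 5 }));
  }
  const result = buckets.get(id).consume(1);
  if (!result.allowed) {
    res.set('Retry-After', String(Math.ceil(result.waitMs / 1000)));
    return res.status(429).json({ error: { code: 'RATE_LIMIT_EXCEEDED', waitMs: result.waitMs } });
  }
  res.set('X-RateLimit-Remaining', String(result.remaining));
  next();
}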
How — Standard Rate Limit Response Headers
// ── On every successful response (not just 429) ───────────────────────────
HTTP/1.1 200 OK
X-RateLimit-Limit: 100 // max requests allowed in the window
X-RateLimit-Remaining: 73 // requests remaining before limit is hit
X-RateLimit-Reset: 2026-05-15T10:01:00Z // ISO timestamp when window resets
X-RateLimit-Policy: 100;w=60 // IETF draft: 100 requests per 60 seconds
// ── On 429 Too Many Requests ───────────────────────────────────────────────
HTTP/1.1 429 Too Many Requests
Retry-After: 12                          // seconds until the client can retry — always send this on 429
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 2026-05-15T10:01:00Z
Content-Type: application/json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit of 100 requests per 60 seconds exceeded. Retry after 12 seconds.",
    "retryAfter": 12,
    "limit": 100,
    "resetAt": "2026-05-15T10:01:00Z",
    "docsUrl": "https://docs.myapi.com/rate-limits"
  }
}
// ── Retry-After formats (HTTP spec accepts both) ────────────────────────────
Retry-After: 60                             // integer seconds (simpler, recommended)
Retry-After: Fri, 15 May 2026 10:01:00 GMT  // HTTP date format

How — Handle 429 Correctly as an API Consumer
// ── Exponential backoff with jitter — the correct 429 handler ─────────────
async function fetchWithRetry(url, options = {}, maxRetries = 4) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, options);

    // Not rate limited (success or a non-429 error) — return immediately
    if (res.status !== 429) return res;

    // Final attempt — throw instead of waiting pointlessly
    if (attempt === maxRetries) {
      throw new Error('Rate limit exceeded after ' + maxRetries + ' retries');
    }

    // Read Retry-After (always prefer the server-specified wait time);
    // the header may be integer seconds or an HTTP date — handle both
    const retryAfterHeader = res.headers.get('Retry-After');
    let delayMs;
    if (retryAfterHeader) {
      const seconds = Number(retryAfterHeader);
      delayMs = Number.isFinite(seconds)
        ? seconds * 1000
        : Math.max(0, Date.parse(retryAfterHeader) - Date.now());
    }
    if (!Number.isFinite(delayMs)) {
      // Exponential backoff: 1s, 2s, 4s, 8s ... capped at 60s
      // Jitter: ±30% randomness prevents "thundering herd" — all clients retrying in sync
      const base = Math.min(1000 * Math.pow(2, attempt), 60_000);
      const jitter = base * 0.3 * (Math.random() * 2 - 1);
      delayMs = Math.round(base + jitter);
    }

    console.warn('Rate limited. Waiting ' + delayMs + 'ms before retry ' + (attempt + 1) + ' of ' + maxRetries);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
// ── Queue-based rate-limit-aware client ────────────────────────────────────
// Effectively a leaky bucket: requests queue up and drain at a fixed rate
class RateLimitedClient {
  constructor({ requestsPerSecond = 10 } = {}) {
    this.minInterval = 1000 / requestsPerSecond; // ms between request starts
    this.lastRequestAt = 0;
    this.queue = [];
    this.processing = false;
  }

  async fetch(url, options) {
    return new Promise((resolve, reject) => {
      this.queue.push({ url, options, resolve, reject });
      if (!this.processing) this.#processQueue();
    });
  }

  async #processQueue() {
    this.processing = true;
    while (this.queue.length > 0) {
      const wait = Math.max(0, this.minInterval - (Date.now() - this.lastRequestAt));
      if (wait > 0) await new Promise((r) => setTimeout(r, wait));
      const { url, options, resolve, reject } = this.queue.shift();
      this.lastRequestAt = Date.now();
      fetchWithRetry(url, options).then(resolve).catch(reject); // don't await — pace request starts, not completions
    }
    this.processing = false;
  }
}
// Usage:
const client = new RateLimitedClient({ requestsPerSecond: 5 }); // 5 req/s max
const res = await client.fetch('https://api.example.com/data');

Why Rate Limiting Design Matters — Fairness and UX
Always send Retry-After — it is required, not optional
RFC 6585, which defines the 429 status code, technically makes Retry-After optional — treat it as mandatory anyway. Without it, well-behaved clients cannot know when to retry and will either give up or implement their own backoff — often incorrectly. A missing Retry-After header is an API bug, not a client problem.
Always send rate limit headers on success — not just on 429
Clients should be able to see "I have 23 requests remaining" before hitting the limit. Headers on successful responses let clients implement proactive throttling — slowing down when remaining approaches zero rather than hitting 429 and disrupting the user experience.
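For example, a consumer can throttle itself as the remaining budget shrinks (minimal sketch; the 10% threshold and 1-second pause are arbitrary choices, and the header names are the ones recommended in this guide):
// Proactive throttling: back off before the server ever returns 429
async function politeFetch(url, options) {
  const res = await fetch(url, options);
  const remaining = Number(res.headers.get('X-RateLimit-Remaining'));
  const limit = Number(res.headers.get('X-RateLimit-Limit'));
  // Under 10% of budget left — insert a small delay before the next call
  if (Number.isFinite(remaining) && Number.isFinite(limit) && remaining < limit * 0.1) {
    await new Promise((r) => setTimeout(r, 1000));
  }
  return res;
}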
Rate limit by the right identifier for the context
IP-based limits protect against unauthenticated abuse. User-ID limits enforce per-user fairness. API key limits map to billing tiers. Tenant limits protect multi-tenant capacity. In practice, most APIs layer several of these: IP limits before auth, then user/key limits after auth — as in the sketch below.
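A minimal sketch of that layering, reusing createRateLimiter from the implementation section (authMiddleware is a hypothetical auth step that sets req.user):
// Layer 1: coarse IP limit runs before auth — stops unauthenticated floods
app.use(createRateLimiter({ limit: 300, windowSeconds: 60, keyFn: (req) => 'ip:' + req.ip }));

app.use(authMiddleware); // hypothetical: authenticates the request and sets req.user

// Layer 2: per-user limit runs after auth — enforces per-account fairness
app.use(createRateLimiter({ limit: 100, windowSeconds: 60, keyFn: (req) => 'user:' + req.user.id }));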
Make your rate limit headers part of your API documentation
Clearly document: what the limits are, what headers you return, what the 429 response looks like, and how to implement backoff. Undocumented rate limits surprise and frustrate integrators. Documented limits with sensible values and clear 429 responses are just good API design.
Use Redis — do not implement rate limiting in process memory for multi-instance deployments
In-memory rate limiters only work for single-server deployments. With three instances behind a load balancer, each instance tracks its own count — effective limit becomes 3x the intended limit. Redis provides a shared, atomic counter accessible from all instances simultaneously.