Chapter 1 made your reads cheap. The next thing that breaks at scale isn't the read path — it's a single script someone wrote in five minutes that hits your POST /api/render endpoint 600 times a second. That's not a "lots of users" problem; that's an abuse problem, and the answer is rate limiting.
This chapter is the production rate-limiter handbook. By the end you'll know the four algorithms, the three Cloudflare ways to implement them (no-code, programmatic, and DIY), what to identify users by, how to respond properly with 429 Too Many Requests + Retry-After, and you'll have a Durable-Object-backed limiter you can drop into a real Worker.
Why You Have to Rate Limit Before You Launch
A small list of things that will happen on the public internet the first week your project is up:
- A bot finds your endpoint and starts a brute-force loop.
- A bored teenager writes `for i in range(100000): requests.post(...)` and walks away.
- A misconfigured test harness in someone's repo hits you 50× a second.
- A genuine user clicks "Submit" 12 times because the page is slow.
- A search-engine crawler decides your `/admin` is interesting.
Without a rate limit, every one of those is a problem: anything from "your usage-based bill jumps by $200" to "D1 row-write quota exhausted and real users see errors." With a sensible rate limit, all of them get cleanly bounced with a 429 and your real users never notice.
The mental shift: rate limiting isn't "for hostile users." It's the default state of every public endpoint. Add it before you ship the endpoint, not after the bill arrives.
The Four Algorithms
Nearly every rate limiter is one of these four, or a close variation on one. The differences are real: pick the wrong one and you'll either let obvious bursts through or block legitimate users.
Figure 1 — Increasing sophistication left to right. Fixed-window is what Cloudflare's Rate Limiting Rules use under the hood; token bucket is the right default for a hand-rolled limiter.
| Algorithm | How it works | Best for |
|---|---|---|
| Fixed window | Count requests within a calendar window (00:00–00:01). Reset to 0 at boundary. | Simple no-code limits like CF Rate Limiting Rules. |
| Sliding window | Count requests in any rolling 60-second window. Smoother than fixed. | Anywhere you want fewer false rejects at minute boundaries. |
| Token bucket | A bucket of N tokens, refills at R/sec. Each request takes 1 token. Empty bucket → reject. | User-facing APIs — allows legitimate bursts while keeping average rate bounded. |
| Leaky bucket | Requests queue up; leak out at fixed R/sec. Overflow = reject. | Smoothing requests to a downstream rate-limited API (Stripe, OpenAI, etc.). |
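To make the fixed-vs-sliding difference concrete, here is a minimal in-memory sketch (hypothetical classes, not a library API) that sends 10 requests just before a minute boundary and 10 just after, against a limit of 10 per minute. The sliding version keeps a log of timestamps, which is exact but stores every accepted request; production sliding-window limiters often approximate it with two fixed-window counters instead.

```javascript
// Hypothetical in-memory limiters, for illustration only. Time is passed in
// as milliseconds so the behaviour is deterministic.

class FixedWindow {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windows = new Map(); // key -> { start, count }
  }

  allow(key, nowMs) {
    // Windows are aligned to clock boundaries, e.g. [0, 60000), [60000, 120000)
    // for a one-minute window.
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let w = this.windows.get(key);
    if (!w || w.start !== start) {
      w = { start, count: 0 }; // new window: the counter resets to 0
      this.windows.set(key, w);
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.logs = new Map(); // key -> timestamps of accepted requests
  }

  allow(key, nowMs) {
    // Keep only accepted requests from the trailing windowMs.
    const log = (this.logs.get(key) || []).filter(t => t > nowMs - this.windowMs);
    const ok = log.length < this.limit;
    if (ok) log.push(nowMs);
    this.logs.set(key, log);
    return ok;
  }
}

// 10 requests just before the minute boundary, then 10 just after it.
function burstAcrossBoundary(limiter) {
  let accepted = 0;
  for (let i = 0; i < 10; i++) if (limiter.allow("ip", 59_000 + i)) accepted++;
  for (let i = 0; i < 10; i++) if (limiter.allow("ip", 60_000 + i)) accepted++;
  return accepted;
}

console.log(burstAcrossBoundary(new FixedWindow(10, 60_000)));      // 20
console.log(burstAcrossBoundary(new SlidingWindowLog(10, 60_000))); // 10
```

The fixed window lets 20 requests through in about 10 ms because its counter resets at the 60 000 ms mark; the sliding log still sees the first 10 and refuses the second batch.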
For user-facing endpoints, token bucket is the right default — it's polite (lets a real user mash a button without instantly blocking them) and predictable (the average rate is still bounded). For abuse prevention at the edge, the cruder fixed-window approach is plenty and trivially cheap.
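A token bucket fits in a few lines once you store two numbers per key: the current token count and when you last updated it. This is a hypothetical in-memory `MemoryTokenBucket` with the clock passed in as milliseconds; refill is computed lazily on each call rather than by a timer.

```javascript
// Hypothetical in-memory token bucket, for illustration (not a Cloudflare API).
// capacity = largest burst allowed; refillPerSec = sustained rate.
class MemoryTokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.buckets = new Map(); // key -> { tokens, at }
  }

  take(key, nowMs) {
    const b = this.buckets.get(key) || { tokens: this.capacity, at: nowMs };
    // Lazy refill: add tokens for the time elapsed since the last call, capped at capacity.
    const elapsedSec = Math.max(0, nowMs - b.at) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec);
    b.at = nowMs;
    this.buckets.set(key, b);

    if (b.tokens < 1) {
      // Whole seconds until one token is available; usable as Retry-After.
      return { ok: false, retryAfter: Math.ceil((1 - b.tokens) / this.refillPerSec) };
    }
    b.tokens -= 1;
    return { ok: true, remaining: Math.floor(b.tokens) };
  }
}

// capacity=10, refillPerSec=1: a burst of 10 passes at once, then ~1 per second.
const bucket = new MemoryTokenBucket(10, 1);
let accepted = 0;
for (let i = 0; i < 11; i++) if (bucket.take("user-1", 0).ok) accepted++;
console.log(accepted);                       // 10
console.log(bucket.take("user-1", 1000).ok); // true: one token refilled after 1 s
console.log(bucket.take("user-1", 1000).ok); // false
```

Capacity controls how patient you are with a button-masher; refillPerSec controls the long-run average.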
Cloudflare Gives You Three Knobs
You don't have to write this from scratch on Cloudflare. There are three increasingly programmable options.
Option 1: Rate Limiting Rules (no code)
Cloudflare's WAF includes rate limiting rules on every plan, Free included; how many you get depends on the plan. Go to Security → WAF → Rate limiting rules, click Create rule, and fill in a form. No Worker code touched.
| Plan | Rate-limiting rules included | Max window |
|---|---|---|
| Free | 1 rule | 10 seconds |
| Pro ($20/mo) | 2 rules | 1 hour |
| Business / Enterprise | more, with advanced expressions | 1 day |
This is the best choice for "blanket protect every endpoint from obvious abuse": set a single rule like "if any client IP makes more than 60 requests per 10 seconds, block for 10 minutes" and you've stopped the bulk of crude scripted abuse (the block durations you can pick also depend on your plan). It runs at the edge before your Worker even sees the request, so it costs you nothing per blocked request.
The free tier's single rule with a 10-second window is real, usable protection. Use it.
Option 2: Workers Rate Limiting API (programmatic, edge-enforced)
Cloudflare provides a Workers-native rate limiting binding that lets your Worker call into the same edge infrastructure. The shape:
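```toml
# wrangler.toml
[[unsafe.bindings]]
name = "RL"
type = "ratelimit"
namespace_id = "1001"
# period is in seconds; the binding accepts 10 or 60
simple = { limit = 100, period = 60 }
```

```js
// Inside your Worker's fetch handler. clientId is whatever identity you key on:
// a JWT subject, an API key, or CF-Connecting-IP.
const { success } = await env.RL.limit({ key: clientId });
if (!success) return new Response("Too Many Requests", { status: 429 });
```

This gives you per-key (per-IP, per-user, per-API-key) limits without standing up your own state. The counters are kept per Cloudflare location and are eventually consistent, so treat them as approximate: good enough for "stop the obvious script", not for anything that must be exact. You can wire `key` to whatever identity makes sense, usually a JWT subject (see web Ch 11).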
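Putting the snippet in context: a sketch of a complete fetch handler, assuming the `RL` binding configured above. The route-prefixed key and the `Retry-After: 60` value are illustrative choices, not something the binding provides; this sketch assumes `limit()` only reports `success`, so the header simply reuses the configured period as a conservative wait.

```javascript
// Sketch of a Worker fetch handler using the ratelimit binding named RL.
// Keying on CF-Connecting-IP (with an "unknown" fallback) is an illustrative
// choice; on authenticated routes you'd key on the JWT subject instead.
async function handleRequest(request, env) {
  const clientId = request.headers.get("CF-Connecting-IP") || "unknown";

  // Prefix the key with the path so different endpoints don't share one budget.
  const { pathname } = new URL(request.url);
  const { success } = await env.RL.limit({ key: `${pathname}:${clientId}` });

  if (!success) {
    return new Response("Too Many Requests", {
      status: 429,
      // The binding doesn't report a wait time, so fall back to the configured period.
      headers: { "Retry-After": "60" },
    });
  }
  return new Response("ok");
}
```

In a real Worker you'd wire it up with `export default { fetch: handleRequest }`; it's left off here so the function can also be exercised in Node 18+ with a fake `env`.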
Option 3: Durable Objects (DIY, strongly consistent)
When you need strict consistency — e.g., "this paid API call costs a credit, and the user has 100 credits, and they must not be able to race two requests through" — you reach for Durable Objects. A DO is a single-instance actor with transactional storage; all reads and writes for a given key serialise through it. That's perfect for a real token bucket.
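Why "must not be able to race two requests through" needs serialisation: a small in-process illustration, with a hypothetical `makeStore` standing in for a shared counter that has read/write latency. Two concurrent check-then-spend calls against 1 credit both succeed; routing the same calls through a per-key promise chain lets only one through. The chain is a rough analogy for what a single Durable Object instance gives you, not how the runtime implements it.

```javascript
// Illustrative only: a "store" with 5 ms reads and writes, standing in for a
// counter that several requests can read before any of them writes.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

function makeStore(credits) {
  let value = credits;
  return {
    async get() { await sleep(5); return value; },
    async put(v) { await sleep(5); value = v; },
  };
}

// Check-then-act with no coordination: both callers can read 1 before either writes 0.
async function spendNaive(store) {
  const credits = await store.get();
  if (credits < 1) return false;
  await store.put(credits - 1);
  return true;
}

// Run every call through one promise chain, so each read-modify-write
// finishes before the next one starts.
function serialised(fn) {
  let tail = Promise.resolve();
  return (...args) => {
    const run = tail.then(() => fn(...args));
    tail = run.catch(() => {}); // keep the chain alive if one call throws
    return run;
  };
}

async function demo() {
  const racyStore = makeStore(1);
  const naive = await Promise.all([spendNaive(racyStore), spendNaive(racyStore)]);

  const safeStore = makeStore(1);
  const spendSafe = serialised(() => spendNaive(safeStore));
  const safe = await Promise.all([spendSafe(), spendSafe()]);

  return { naive, safe }; // naive: [true, true], safe: [true, false]
}

demo().then(result => console.log(result));
```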
A complete DO-based token-bucket limiter, in 30 lines:
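```js
// src/limiter.js
// One instance per key (see idFromName at the call site), so every check for a
// given user goes through the same object and the same storage.
export class TokenBucket {
  constructor(state, env) {
    this.state = state;
  }

  async fetch(req) {
    const { capacity, refillPerSec } = await req.json(); // e.g. { capacity: 100, refillPerSec: 10 }
    const now = Date.now(); // milliseconds, so refill isn't rounded to whole seconds
    const stored = (await this.state.storage.get("b")) || { tokens: capacity, at: now };

    // Refill for the time elapsed since the last check, capped at capacity.
    const elapsedSec = Math.max(0, now - stored.at) / 1000;
    const tokens = Math.min(capacity, stored.tokens + elapsedSec * refillPerSec);

    if (tokens < 1) {
      // Whole seconds until one token is available; becomes Retry-After.
      const wait = Math.ceil((1 - tokens) / refillPerSec);
      return new Response(JSON.stringify({ ok: false, retryAfter: wait }), {
        status: 429,
        headers: { "Content-Type": "application/json" },
      });
    }

    // Consume 1 token and persist the new state.
    await this.state.storage.put("b", { tokens: tokens - 1, at: now });
    return new Response(JSON.stringify({ ok: true, remaining: Math.floor(tokens - 1) }), {
      headers: { "Content-Type": "application/json" },
    });
  }
}
```

And the call site from a regular Worker: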
```js
// src/index.js
// The Durable Object class must be exported from the Worker's main module.
// env.LIMITER is the Durable Object binding for TokenBucket in your wrangler config.
export { TokenBucket } from "./limiter.js";

export default {
  async fetch(req, env) {
    // verifyJWT and realHandler are your own helpers (see web Ch 11 for verifyJWT).
    const userId = await verifyJWT(req, env);
    if (!userId) return new Response("Sign in", { status: 401 });

    // One DO per user, so the limit is naturally per-user.
    const id = env.LIMITER.idFromName(`u:${userId}`);
    const limiter = env.LIMITER.get(id);
    const result = await limiter
      .fetch("https://x/limit", {
        method: "POST",
        body: JSON.stringify({ capacity: 60, refillPerSec: 1 }),
      })
      .then(r => r.json());

    if (!result.ok) {
      return new Response("Rate limited", {
        status: 429,
        headers: { "Retry-After": String(result.retryAfter) },
      });
    }
    return realHandler(req, env);
  },
};
```

That's a per-user, transactional, strongly consistent token bucket, and it costs you a single DO request per user request.
Identifying Who to Limit — Pick One of Three
Rate limits are only as good as the identity they're keyed on. The three real choices:
| Key by | Where to get it | Pros | Cons |
|---|---|---|---|
| IP address | request.headers.get("CF-Connecting-IP") | Works for anonymous endpoints; no auth needed. | Mobile IPs cycle; many users share one corporate/NAT IP; IPv6 makes per-/64 the right unit. |
| JWT user ID | Decode the session cookie (web Ch 11) | Per-real-user, immune to NAT. | Only works on authenticated routes. |
| API key | Header like Authorization: Bearer … | Per-customer for B2B APIs. | Same per-customer key may be used by many of their services. |
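The IPv6 point from the table, in code: a sketch of a hypothetical `rateLimitKey` helper that keeps IPv4 addresses whole and collapses IPv6 addresses to their /64 prefix, on the assumption that one subscriber usually controls a whole /64. It expands the `::` shorthand by hand rather than relying on a library, and treats IPv4-mapped addresses (`::ffff:a.b.c.d`) as IPv4.

```javascript
// Hypothetical helper: turn CF-Connecting-IP into a rate-limit key.
// IPv4 stays per-address; IPv6 collapses to its /64 prefix.
function rateLimitKey(ip) {
  const addr = ip.trim().split("%")[0].toLowerCase(); // drop any zone ID like %eth0

  if (!addr.includes(":")) return `ip4:${addr}`;

  // IPv4-mapped IPv6 (::ffff:a.b.c.d) is really an IPv4 client.
  const mapped = addr.match(/^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/);
  if (mapped) return `ip4:${mapped[1]}`;

  // Count 16-bit groups; an embedded dotted quad occupies two of them.
  const width = parts => parts.reduce((n, p) => n + (p.includes(".") ? 2 : 1), 0);

  let groups;
  if (addr.includes("::")) {
    // Expand "::" into however many zero groups are missing.
    const [head, tail] = addr.split("::");
    const headParts = head ? head.split(":") : [];
    const tailParts = tail ? tail.split(":") : [];
    const missing = Math.max(0, 8 - width(headParts) - width(tailParts));
    groups = [...headParts, ...Array(missing).fill("0"), ...tailParts];
  } else {
    groups = addr.split(":");
  }

  // First four groups = first 64 bits; parseInt/toString strips leading zeros.
  const prefix = groups.slice(0, 4).map(g => parseInt(g, 16).toString(16));
  return `ip6:${prefix.join(":")}::/64`;
}
```

Every address inside `2001:db8:abcd:12::/64` maps to the same key, so rotating through addresses in one /64 no longer buys a client a fresh budget.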
For most real apps the right answer is layered: rate-limit by IP at the edge (Option 1) to stop crude abuse cheaply, then by JWT user ID inside the Worker (Option 2 or 3) for per-account fairness. Different routes get different limits.
Per-Route Tiers — The 90% Rule
A single global rate limit ("60 req/min/user") is wrong for any real app. The right shape is per-route: routes with different abuse profiles get different limits:
| Route | Suggested limit per identity | Why |
|---|---|---|
| POST /auth/login | 5 / 5 min | Brute-force protection. Real users only retry a few times. |
| POST /auth/forgot-password | 3 / hour | Email bombing / enumeration. |
| GET /api/articles/* | 120 / min | Read-only, mostly cacheable (Ch 1) anyway. |
| POST /api/comments | 10 / min | Spam protection without blocking conversation. |
| POST /api/render (expensive) | 10 / hour | Each call costs you CPU/$$. |
| Webhook receivers | None on the receiver — verify HMAC instead. | Sender retries are a feature; see Ch 3. |
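One way to encode the table is as data, with the route name baked into the counter key so a user's login attempts and comment posts never share a budget. The patterns, names and the `limitFor` helper are hypothetical; the returned `limit` and `periodSec` would feed whichever limiter you chose above. If you're on the Workers `ratelimit` binding, check which periods it accepts: windows of 5 minutes or an hour may need the Durable Object limiter instead.

```javascript
// Hypothetical route table mirroring the suggested limits above.
const ROUTE_LIMITS = [
  { name: "login",   method: "POST", pattern: /^\/auth\/login$/,           limit: 5,   periodSec: 5 * 60 },
  { name: "forgot",  method: "POST", pattern: /^\/auth\/forgot-password$/, limit: 3,   periodSec: 60 * 60 },
  { name: "read",    method: "GET",  pattern: /^\/api\/articles\//,        limit: 120, periodSec: 60 },
  { name: "comment", method: "POST", pattern: /^\/api\/comments$/,         limit: 10,  periodSec: 60 },
  { name: "render",  method: "POST", pattern: /^\/api\/render$/,           limit: 10,  periodSec: 60 * 60 },
  // Webhook receivers are deliberately absent: verify the HMAC instead.
];

// Returns { limit, periodSec, key } for a limited route, or null otherwise.
// The key scopes the counter to (route, identity), e.g. "login:ip4:203.0.113.7".
function limitFor(method, pathname, identity) {
  const rule = ROUTE_LIMITS.find(r => r.method === method && r.pattern.test(pathname));
  if (!rule) return null;
  return { limit: rule.limit, periodSec: rule.periodSec, key: `${rule.name}:${identity}` };
}
```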
Login is the canonical "ridiculously strict" route. Everything else flows from how expensive a misuse is.
Responding Properly: 429 + Retry-After
When you reject, send a real HTTP response, not a 500 or a blank page. The semantically correct answer is 429 Too Many Requests with a Retry-After header in seconds:
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 47

{"error":"rate_limited","retry_after":47,"limit":"60/min"}
```

Well-behaved clients (SDKs with built-in retry logic, a fetch wrapper that honours the header) read Retry-After and back off cleanly. Bots that ignore it just keep slamming into the wall, which is also fine: your origin doesn't care, because the 429 is essentially free to produce.
Also: don't silently return 200. Returning success while internally dropping the work is the worst of both worlds: the client thinks it succeeded and never retries.
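Both sides of the 429 contract, sketched with the standard `Response` API (available in Workers and Node 18+). `rateLimitedResponse` reproduces the response shown above; `retryAfterMs` and `fetchWithRetry` are hypothetical client helpers that honour Retry-After in either of its two forms, delay-seconds or an HTTP-date, and fall back to exponential backoff when the header is missing.

```javascript
// Server side: build the 429 shown above. The body shape is this chapter's convention.
function rateLimitedResponse(retryAfterSec, limitLabel) {
  const seconds = Math.max(1, Math.ceil(retryAfterSec)); // Retry-After takes whole seconds
  return new Response(
    JSON.stringify({ error: "rate_limited", retry_after: seconds, limit: limitLabel }),
    {
      status: 429,
      headers: { "Content-Type": "application/json", "Retry-After": String(seconds) },
    },
  );
}

// Client side: Retry-After is either delay-seconds or an HTTP-date (RFC 9110).
// Returns milliseconds to wait, or null if the header is missing or unparseable.
function retryAfterMs(value, nowMs = Date.now()) {
  if (value == null) return null;
  const v = value.trim();
  if (/^\d+$/.test(v)) return Number(v) * 1000;
  const when = Date.parse(v);
  return Number.isNaN(when) ? null : Math.max(0, when - nowMs);
}

// Retry on 429, waiting as long as the server asked (capped at maxWaitMs), with
// exponential backoff when there's no usable header. Assumes init.body, if any,
// can be sent again (e.g. a string).
async function fetchWithRetry(url, init = {}, { retries = 3, maxWaitMs = 60_000, fetchFn = fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url, init);
    if (res.status !== 429 || attempt >= retries) return res;
    await res.arrayBuffer(); // drain the body before retrying
    const wait = retryAfterMs(res.headers.get("Retry-After")) ?? 1000 * 2 ** attempt;
    await new Promise(resolve => setTimeout(resolve, Math.min(wait, maxWaitMs)));
  }
}
```

Capping the wait keeps a misconfigured server from parking your client for hours; raise `maxWaitMs` if your limits use long windows.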
Cost Protection — The Sibling of Rate Limiting
A rate limit caps requests per identity. A spending limit caps your overall bill. Both belong in this chapter because they solve the same end problem: "stop runaway cost."
The Workers Paid plan ($5/month) lets you set a hard request limit beyond which your Worker just stops serving (returns an error). Set it. Same on the Stripe dashboard (set a daily transaction cap in test mode before you've fully verified your integration). KV, R2, D1 also expose per-resource limits.
For the full take on this — including the GitHub Actions/Copilot version — see web Ch 21. The principle is identical: usage-based services need usage-based caps, and the time to set them is before you launch.
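If you want a cap that's tighter or more targeted than a provider-level limit, the same idea works inside your app. A hypothetical `DailyBudget` guard for one expensive route: a global count per UTC day, separate from any per-identity limit. It's in memory here to show the logic; in a Worker the count would need to live somewhere shared, such as a single Durable Object.

```javascript
// Hypothetical app-level budget: at most maxPerDay calls per UTC day, across all users.
class DailyBudget {
  constructor(maxPerDay) {
    this.maxPerDay = maxPerDay;
    this.day = null; // "YYYY-MM-DD" of the current count
    this.used = 0;
  }

  tryConsume(nowMs = Date.now()) {
    const day = new Date(nowMs).toISOString().slice(0, 10); // UTC date
    if (day !== this.day) {
      this.day = day; // a new day starts with a fresh budget
      this.used = 0;
    }
    if (this.used >= this.maxPerDay) return false;
    this.used += 1;
    return true;
  }
}
```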
Mental Model — Three Sentences
- Rate limiting is the default state of every public endpoint — add it before you ship, not after the bill arrives.
- Pick a Cloudflare option matched to your need: WAF Rate Limiting Rules (free, no-code, edge-fast) for blanket abuse defence; the Workers `ratelimit` binding for per-key in-Worker checks; Durable Objects when you need strict consistency (a token bucket per paid user).
- Identity matters as much as the algorithm: key by `CF-Connecting-IP` for anonymous routes, by JWT subject for authenticated ones, and by API key for B2B, and always answer denials with `429` + `Retry-After` so real clients back off cleanly.
Try It Yourself (15 Minutes)
- Add a Rate Limiting Rule in the Cloudflare dashboard: any IP making more than 60 requests in 10 seconds → block for 10 minutes. Hit your site in a loop with `for i in {1..100}; do curl -s -o /dev/null -w "%{http_code}\n" https://yoursite.com/; done` and watch the status codes change once you're blocked.
- Add the Workers `ratelimit` binding to a test Worker and limit `POST /test` to 5 per minute by IP. Confirm the 6th call within a minute returns 429.
- Build the Durable-Object token-bucket limiter from this chapter. Set `capacity=10, refillPerSec=1` and call it in a loop: the first 10 succeed instantly, then about 1 per second after that.
- On your real app, identify the single most expensive endpoint and add a stricter limit just for it (1 per minute per JWT, say). That one change is usually worth more than every other limit combined.
- Open Workers → Settings → Usage Notifications and set a hard request limit. You now have both per-request and total-spend protection.
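For these exercises, a small Node 18+ script makes the switch from 200 to 429 easy to see. `hammer` is a hypothetical helper that sends requests one after another and tallies the status codes; the injectable `fetchFn` exists only so it can be tested without a network.

```javascript
// Hypothetical helper: send n requests sequentially and count status codes.
async function hammer(url, n, { method = "GET", fetchFn = fetch } = {}) {
  const tally = {};
  for (let i = 0; i < n; i++) {
    const res = await fetchFn(url, { method });
    await res.arrayBuffer(); // drain the body so connections can be reused
    tally[res.status] = (tally[res.status] || 0) + 1;
  }
  return tally;
}

// Usage, e.g. against the POST /test exercise:
// hammer("https://yoursite.com/test", 6, { method: "POST" }).then(console.log);
```

Sequential requests keep the output easy to read; if you need more than a handful per second to trip a rule, fire them in parallel instead.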
Where This Lands in the Series
Reads are cheap (Ch 1), writes are rate-limited (this chapter). The next thing real apps screw up is inter-service trust — specifically webhooks: the inbound POSTs from Stripe, GitHub, your CI system, that you have to verify came from who they claim to and process exactly once even when the sender retries.
Next chapter: Webhooks That Don't Lose Data — HMAC signature verification (and the raw-body trap that breaks it), idempotency keys, the right way to handle retries, and dead-letter handling for the inevitable bad payloads.