You've shipped a Worker, you've wired KV, you've got JWT sessions. The site loads in 200 ms and you're feeling good. Then you launch, the traffic graph goes up, and the Workers / D1 cost graph goes up with it — linearly. The same request is hitting your code and your database every single time.
This is the caching problem, and it's the single biggest difference between a "demo that works" and "a site that scales for $5/month." There are four caches between your user and your origin, and most apps either ignore them all or get exactly one of them right. By the end of this chapter you'll know which cache is doing what, the exact Cache-Control header to send for each kind of content, and you'll have a copy-pasteable Worker that fronts D1 with KV and never hits the database for a hot read.
The Four Caches, From Closest to Furthest
The user clicks a link, and the bytes for that response can come from up to four caches before your code ever has to query D1:
Figure 1: the four caches between the user and D1 (browser, Cloudflare CDN, Workers Cache API, KV). A response served from any of them never touches D1. The art of caching is pushing every read as close to the user as it can safely go, because:
| Cache | Latency | Cost | What it stores |
|---|---|---|---|
| Browser | ~0 ms (local disk/RAM) | $0 | Static assets the browser already downloaded |
| Cloudflare CDN | ~10 ms (nearest PoP) | $0 | Anything with a cacheable Cache-Control + status 200 |
| Workers Cache API | ~10 ms (PoP-local, scoped by URL) | $0 | Whatever your Worker explicitly put()s |
| KV | ~30 ms (cold read, faster when hot) | ~$0.50/M reads (very cheap) | Anything you serialise (D1 results, computed JSON, sessions) |
| D1 origin | ~5–50 ms (the actual query) | per row read | The source of truth |
The further down the table you go, the slower and more expensive a read becomes. Every cache hit at one level prevents the work at every level below it, all the way down to the origin.
Cache 1: The Browser Cache — Cache-Control Done Right
The browser cache is the cheapest cache in existence (it's the user's own disk) and the most under-used. You activate it with one HTTP response header.
Cache-Control: public, max-age=31536000, immutable
That's the canonical "this asset is content-addressed and will never change" header, for fingerprinted JS, CSS and images (/_next/static/abc123.js). The user downloads it once, ever.
The directives you actually use:
| Directive | What it does |
|---|---|
| public | Any cache (browser + CDN) may store this. Often implied by max-age alone, but being explicit costs nothing. |
| private | Only the user's browser may store this; the CDN must not. Use for per-user responses. |
| max-age=N | The browser may reuse the cached response for N seconds without revalidating. |
| s-maxage=N | Same, but only for shared caches (the CDN). Overrides max-age at the edge. |
| immutable | "Don't even bother revalidating until max-age expires." Skips the conditional-GET round trip. |
| no-cache | Confusing name. Means "cache it, but always revalidate before using", i.e. send a conditional GET every time. |
| no-store | "Don't cache at all." For genuinely sensitive responses. |
| stale-while-revalidate=N | "After max-age expires, you can still serve the stale version for up to N more seconds while fetching a fresh one in the background." Excellent UX. |
A pragmatic policy by content type:
| Content | Recommended Cache-Control |
|---|---|
| Fingerprinted asset (app.abc123.js) | public, max-age=31536000, immutable |
| Logo / favicon (changes rarely) | public, max-age=86400 |
| Article HTML (changes occasionally) | public, max-age=300, stale-while-revalidate=86400 |
| User dashboard (per-user, changes constantly) | private, no-cache |
| Payment confirmation page | private, no-store |
| API JSON (idempotent GET, public) | public, max-age=60 |
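To keep that policy in one place rather than scattered across handlers, a Worker can look it up from a small table. This is a minimal sketch: `cacheControlFor`, the content "kind" names, and the fingerprint convention (a dot, at least 8 hex characters, a dot, an extension) are all illustrative assumptions, not part of any Cloudflare API.

```javascript
// Hypothetical helper: maps a content "kind" to the Cache-Control policy table above.
// The kind names are assumptions chosen for illustration.
const POLICIES = {
  fingerprinted: "public, max-age=31536000, immutable",
  rarelyChanging: "public, max-age=86400",
  article: "public, max-age=300, stale-while-revalidate=86400",
  perUser: "private, no-cache",
  sensitive: "private, no-store",
  publicApi: "public, max-age=60",
};

// Assumed fingerprint convention: name.<8+ hex chars>.ext, e.g. app.3f9a2c1d.js
const FINGERPRINT = /\.[0-9a-f]{8,}\.[a-z0-9]+$/i;

function cacheControlFor(kind, pathname = "") {
  if (kind === "asset") {
    // Only content-addressed files may be cached "forever"; a plain logo.png
    // could change under the same name, so it gets a day instead.
    return FINGERPRINT.test(pathname) ? POLICIES.fingerprinted : POLICIES.rarelyChanging;
  }
  // Fail loudly rather than silently sending the wrong policy for a typo'd kind.
  if (!Object.hasOwn(POLICIES, kind)) throw new Error(`unknown content kind: ${kind}`);
  return POLICIES[kind];
}
```

Usage: `new Response(html, { headers: { "Cache-Control": cacheControlFor("article") } })`. Throwing on an unknown kind is deliberate. A silent fallback to a `public` policy is exactly how a per-user page ends up cached at the CDN.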
ETag + 304: The Free Conditional GET
For things that might have changed, send an ETag (a content hash). The browser will send the ETag back in If-None-Match on the next request, and if nothing changed you reply 304 Not Modified with zero body. The browser uses its cached copy. You saved transferring the bytes; the user saved the bandwidth.
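// In a Worker. fetchArticleHtml() stands in for your own rendering function.
// Workers have no built-in sha1(), so hash with the Web Crypto API.
async function sha1Hex(text) {
  const digest = await crypto.subtle.digest("SHA-1", new TextEncoder().encode(text));
  return [...new Uint8Array(digest)].map((b) => b.toString(16).padStart(2, "0")).join("");
}

const body = await fetchArticleHtml(slug);
const etag = '"' + (await sha1Hex(body)).slice(0, 16) + '"';
const cacheControl = "public, max-age=300, stale-while-revalidate=86400";
if (request.headers.get("If-None-Match") === etag) {
  // A 304 should repeat the validators and caching headers so the browser refreshes its copy.
  return new Response(null, { status: 304, headers: { ETag: etag, "Cache-Control": cacheControl } });
}
return new Response(body, {
  headers: {
    "Content-Type": "text/html; charset=utf-8",
    "Cache-Control": cacheControl,
    "ETag": etag,
  },
});
That's a handful of lines, and it eliminates most of your repeat-visitor bandwidth.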
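The strict `===` comparison above is the simplest version, but it misses valid matches. If-None-Match uses weak comparison, may carry a list of tags, and may be `*`. Cloudflare can also weaken a strong ETag (prefixing it with `W/`) when it compresses a response, so the browser may send back `W/"…"`. A slightly more forgiving check might look like this; `ifNoneMatchHits` is a hypothetical name, and this is a sketch rather than a full RFC 9110 parser.

```javascript
// Sketch: does the If-None-Match request header match our current ETag?
// Uses weak comparison, so W/"abc" matches "abc". Also handles "*" and lists like: "x", W/"abc"
function ifNoneMatchHits(ifNoneMatch, etag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === "*") return true; // "*" matches any existing representation
  const opaque = (tag) => tag.replace(/^W\//, "");
  // Entity tags are quoted strings (no quotes inside), optionally prefixed with W/.
  const tags = ifNoneMatch.match(/(?:W\/)?"[^"]*"/g) ?? [];
  return tags.some((tag) => opaque(tag) === opaque(etag));
}
```

In the handler, swap the equality test for `if (ifNoneMatchHits(request.headers.get("If-None-Match"), etag))`.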
Cache 2: Cloudflare's CDN Cache (Automatic)
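Cloudflare's edge caches responses that have a cacheable status code (200, 301, 404, etc.) and a Cache-Control that permits it (anything that isn't private / no-store / max-age=0). Your Cache-Control headers from the previous section are read at the PoP, and the response is held there for the next visitor served by the same PoP. Two caveats: by default Cloudflare decides what to cache by file extension (images, CSS, JS, fonts and so on), so HTML needs a Cache Rule before it becomes eligible; and a response generated by your Worker itself is not stored in this cache automatically, which is what the Workers Cache API below is for.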
Two things worth knowing:
- s-maxage overrides max-age at the edge. If you want a long edge TTL but a short browser TTL, send both: Cache-Control: public, max-age=60, s-maxage=86400. The edge holds it for a day; the browser reuses its copy for a minute, then revalidates.
- Cache by URL, not by who's asking. By default the cache key is essentially the URL (host, path, query string); cookies and other request headers are not part of it unless you configure a custom cache key. If your Worker returns different responses for the same URL based on a cookie, the CDN will happily serve the wrong cached one. Send Cache-Control: private, or vary the cache key.
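"Vary the cache key" can be done from a Worker with the Workers Cache API (next section), because there you choose the Request used as the key. The sketch below assumes a hypothetical `theme` cookie with the values `light`/`dark`. `variantCacheKey`, `themeFromCookie` and the `__variant` query parameter are made-up names. Only a small, non-identifying bucket should go into the key, never a session ID.

```javascript
// Sketch: fold a small, non-identifying variant into the cache key so that
// responses differing by cookie don't collide under one URL.
function variantCacheKey(request, variant) {
  const url = new URL(request.url);
  url.searchParams.set("__variant", variant); // "__variant" is an arbitrary, assumed name
  // Cache API keys must be GET requests; build a fresh one with no body or headers.
  return new Request(url.toString(), { method: "GET" });
}

// Hypothetical: read a "theme" cookie, defaulting to "light" when absent or invalid.
function themeFromCookie(request) {
  const cookie = request.headers.get("Cookie") ?? "";
  const match = cookie.match(/(?:^|;\s*)theme=(light|dark)(?:;|$)/);
  return match ? match[1] : "light";
}
```

In a Worker you would then call `cache.match(key)` and `cache.put(key, response.clone())` with `key = variantCacheKey(req, themeFromCookie(req))` instead of passing `req` directly.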
Cache 3: The Workers Cache API (Programmatic)
The Cache API gives your Worker direct, programmatic access to the CDN cache at its own PoP. Two methods do everything: cache.match() to read and cache.put() to write.
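export default {
  async fetch(req, env, ctx) {
    // The Cache API only stores GET requests; pass anything else straight through.
    // renderArticle() stands in for your own rendering function.
    if (req.method !== "GET") return renderArticle(req, env);
    const cache = caches.default;
    // 1. Try the cache first.
    let response = await cache.match(req);
    if (response) return response;
    // 2. Cache miss: do the real work.
    response = await renderArticle(req, env);
    // 3. Store it for next time. Don't await; ship the response first.
    //    The stored copy's TTL comes from the response's Cache-Control header.
    ctx.waitUntil(cache.put(req, response.clone()));
    return response;
  },
};
caches.default is backed by the same per-PoP cache storage as Cache 2 (the CDN cache), just exposed as a programmable thing. Two real upgrades it gives you: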
- Cache anything you compute — not just origin responses. Render some HTML in your Worker, stick it in the cache, serve every subsequent visitor in that PoP from the cache for $0.
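- Cache the results of POSTs, vary by a custom key, and set per-entry TTLs. The Cache API itself only accepts GET requests as keys, so a POST result is stored under a synthetic GET key you construct. The TTL comes from the Cache-Control header on the response you put().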
The ctx.waitUntil(cache.put(...)) call in the fetch handler is critical: it returns the response immediately and writes to the cache in the background, so even the first (cache-miss) visitor doesn't wait for the cache write.
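If several routes use the same match/put dance, it can be wrapped once. A minimal sketch: `cacheFirst` is a hypothetical name, `cache` would be `caches.default` in a Worker, and `render` is your own function returning a Response. It also adds a guard the basic snippet lacks: only 200 responses are stored, so a transient error isn't served to everyone at that PoP.

```javascript
// Sketch: cache-aside around the Workers Cache API.
async function cacheFirst(request, ctx, cache, render) {
  // The Cache API only stores GET requests; everything else bypasses the cache.
  if (request.method !== "GET") return render(request);

  const hit = await cache.match(request);
  if (hit) return hit;

  const response = await render(request);
  // Don't cache errors: a stored 500 would be replayed until its TTL runs out.
  if (response.status === 200) {
    ctx.waitUntil(cache.put(request, response.clone()));
  }
  return response;
}
```

Usage inside a fetch handler: `return cacheFirst(req, ctx, caches.default, (r) => renderArticle(r, env));`.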
Cache 4: KV (and Other App-Level Caches)
The previous three caches are all keyed by URL, and the two edge ones are local to a single PoP. When you want to cache something that's shared across URLs (a D1 query result, a per-user permission set, a parsed config), you reach for KV.
The pattern is "read-through with TTL":
// ctx is optional: pass the Worker's ctx to write to KV in the background.
async function getCachedArticle(slug, env, ctx) {
  const key = `article:${slug}`;
  // 1. Try KV first.
  const cached = await env.KV.get(key, "json");
  if (cached) return cached;
  // 2. Miss: query D1.
  const row = await env.DB.prepare(
    "SELECT title, body, updated_at FROM articles WHERE slug = ?"
  ).bind(slug).first();
  if (!row) return null;
  // 3. Write to KV with a TTL (60 s is KV's minimum expirationTtl).
  //    With ctx, the write happens in the background; without it, we await it.
  const write = env.KV.put(key, JSON.stringify(row), { expirationTtl: 60 });
  if (ctx) ctx.waitUntil(write);
  else await write;
  return row;
}
KV's free tier (100k reads/day) and its ~$0.50 / million reads beyond that mean a hot read on this path costs you essentially nothing. D1 only sees roughly the first miss per article per minute.
For the deep details on KV — when it loses to D1, the eventual-consistency gotcha for counters, list pagination — see Cloudflare Ch 4 — KV — The Edge Key-Value Store.
The Decision Table
When you have a thing you want to cache, ask "which cache?" — answer with this table:
| What you want to cache | Use |
|---|---|
| Static JS/CSS/images, fingerprinted | Cache 1+2 with public, max-age=31536000, immutable |
| Article HTML (mostly static) | Cache 1+2 with public, max-age=60, stale-while-revalidate=86400 + ETag |
| Worker-rendered HTML (per page) | Cache 3 (Workers Cache API) with ctx.waitUntil |
| D1 query result reused across URLs | Cache 4 (KV read-through with TTL) |
| Per-user response (dashboard) | Don't cache at CDN. Cache-Control: private + JWT in Worker. |
| "Is this user subscribed?" check | KV with a short TTL (60 s is KV's minimum expirationTtl), or a Durable Object for strict consistency |
| Anything sensitive (auth tokens, PII) | Cache-Control: private, no-store — never cache |
Cache Invalidation — The Other Hard Problem
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Three strategies that actually work:
- Time-based (TTL). Just let it expire. Use this when "up to N seconds stale" is acceptable, which covers most caching needs on its own.
- Versioned URLs. When the content changes, the URL changes: app.abc123.js becomes app.def456.js. The old URL stays cached forever but is never requested again. This is why fingerprinted assets exist.
- Explicit purge. On a write, delete the cache entries that would now be stale. KV has delete(); the CDN supports purge-by-URL, and purge-by-tag depending on your plan. Use sparingly: it's easy to forget a path and serve stale data.
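For the KV read-through shown earlier, explicit purge on write might look like the sketch below. It assumes the same `articles` table and `article:<slug>` key; `updateArticleBody` is a hypothetical name. It only purges KV. Copies already in a PoP's edge cache still live out their own max-age.

```javascript
// Sketch: write to D1, then delete the now-stale KV entry.
async function updateArticleBody(slug, body, env) {
  // 1. Write to the source of truth first.
  await env.DB.prepare("UPDATE articles SET body = ? WHERE slug = ?")
    .bind(body, slug)
    .run();
  // 2. Then drop the cached copy so the next read goes back to D1.
  //    Caveats: KV deletes are eventually consistent (other PoPs may still see the
  //    old value for a while), and a reader that fetched the old row just before
  //    the UPDATE can re-populate KV, so the TTL remains the backstop.
  await env.KV.delete(`article:${slug}`);
}
```

Deleting after the write (rather than before) narrows the window in which a concurrent reader can put the old row back into KV.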
A good rule of thumb: TTL by default, version URLs when you can, purge only when neither works.
Putting It Together: A Real Cached-Article Worker
The smallest Worker that uses all four caches correctly:
// Assumes getCachedArticle() from the KV section, plus two helpers you supply:
// sha1Hex(text), returning a hex SHA-1 digest (e.g. via crypto.subtle.digest),
// and renderArticleHtml(article), returning an HTML string.
export default {
  async fetch(req, env, ctx) {
    if (req.method !== "GET") return new Response("Method not allowed", { status: 405 });
    const url = new URL(req.url);
    const match = url.pathname.match(/^\/articles\/([^/]+)$/);
    if (!match) return new Response("Not found", { status: 404 });
    const slug = match[1];
    // Cache 3: Workers Cache API (per-URL, per-PoP).
    const cache = caches.default;
    const hit = await cache.match(req);
    if (hit) return hit;
    // Cache 4: KV-backed D1 read-through.
    const article = await getCachedArticle(slug, env, ctx);
    if (!article) return new Response("Not found", { status: 404 });
    const html = renderArticleHtml(article);
    // ETag for Cache 1 (browser conditional GET), hashed from the bytes we actually send,
    // so a template change also changes the ETag.
    const etag = '"' + (await sha1Hex(html)).slice(0, 16) + '"';
    const cacheControl = "public, max-age=60, stale-while-revalidate=86400";
    if (req.headers.get("If-None-Match") === etag) {
      return new Response(null, { status: 304, headers: { ETag: etag, "Cache-Control": cacheControl } });
    }
    const response = new Response(html, {
      headers: {
        "Content-Type": "text/html; charset=utf-8",
        "Cache-Control": cacheControl,
        "ETag": etag,
        "Vary": "Accept-Encoding",
      },
    });
    ctx.waitUntil(cache.put(req, response.clone())); // Cache 3 write, in the background
    return response;
  },
};
Every layer pulls its weight: browser (max-age, stale-while-revalidate and 304s), the per-PoP edge cache via the Workers Cache API (the rendered page for 60 s), and KV (D1 fan-out reduction). A 100k-visitor day that would have done 100k D1 reads now does roughly one D1 read per article per minute at most, and hot pages are mostly answered at the edge without touching KV at all.
Mental Model — Three Sentences
- There are four caches between the user and your origin — browser, Cloudflare CDN, Workers Cache API, and KV/app-level — and the job of "doing caching" is choosing which layer answers each request as early as possible.
- Browser + CDN are configured by Cache-Control + ETag headers (no code needed); Workers Cache API and KV are configured by code (programmatic put()/get() with TTLs).
- TTL by default, version URLs when you can, purge only when neither works, and remember KV is eventually consistent (~60 s cross-PoP), so don't cache anything safety-critical there.
Try It Yourself (15 Minutes)
- Add Cache-Control: public, max-age=31536000, immutable to one fingerprinted asset in your app. Reload in DevTools → Network and confirm subsequent loads show (memory cache) or (disk cache), not a network request.
- Add an ETag to one HTML response. Refresh; confirm the second request returns 304 Not Modified with a near-empty body in DevTools.
- Write the Workers Cache API snippet into a Worker served from a custom domain (Cache API operations have no effect on workers.dev). Hit the URL twice and confirm the second hit is faster and that a console.log placed in the cache-miss branch fires only once.
- Wire a KV read-through in front of any D1 query. Watch the D1 read count in the dashboard flatten while traffic keeps climbing.
- Pick one piece of content and decide which cache it should live in using the decision table. Justify it in one sentence.
Where This Lands in the Series
Your reads are now cheap. The next thing that breaks at scale is writes — specifically, abusive write patterns that bypass caches entirely and try to drown your origin.
Next chapter: Rate Limiting & Abuse Prevention — Cloudflare's built-in Rate Limiting Rules vs. a DIY Durable-Objects limiter, IP-based vs. JWT-based limits, the right way to send a 429 Too Many Requests, and how to stop a runaway script from turning your $0/mo Cloudflare bill into $5,000.