You've shipped a Worker, you've wired KV, you've got JWT sessions. The site loads in 200 ms and you're feeling good. Then you launch, the traffic graph goes up, and the Workers / D1 cost graph goes up with it — linearly. The same request is hitting your code and your database every single time.

This is the caching problem, and it's the single biggest difference between a "demo that works" and "a site that scales for $5/month." There are four caches between your user and your origin, and most apps either ignore them all or get exactly one of them right. By the end of this chapter you'll know which cache is doing what, the exact Cache-Control header to send for each kind of content, and you'll have a copy-pasteable Worker that fronts D1 with KV and never hits the database for a hot read.

The Four Caches, From Closest to Furthest

The user clicks a link, and the bytes for that response can come from up to four places before they ever reach your Worker:

Figure 1 — A response served from any of the green/yellow boxes never touches D1. The art of caching is pushing every read as far left as it can safely go, because:

Cache	Latency	Cost	What it stores
Browser	~0 ms (local disk/RAM)	$0	Static assets the browser already downloaded
Cloudflare CDN	~10 ms (nearest PoP)	$0	Cacheable responses with a cacheable `Cache-Control`: static file types by default, HTML/JSON only if a Cache Rule opts them in
Workers Cache API	~10 ms (PoP-local, scoped by URL)	$0	Whatever your Worker explicitly `put()`s
KV	~30 ms (cold read, faster when hot)	~$0.50/M reads (very cheap)	Anything you serialise (D1 results, computed JSON, sessions)
D1 origin	~5–50 ms (the actual query)	per row read	The source of truth

The further right you go, the slower and more expensive a read becomes. Every cache hit at level n prevents work at levels n+1 through origin.
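To make that last point concrete, here is a back-of-the-envelope sketch. The hit ratios are made-up illustrative numbers, not measurements, and `readsReachingOrigin` is a hypothetical helper rather than anything Cloudflare provides.

```javascript
// Estimate how many requests fall through every cache layer to the origin.
// hitRatios lists each layer's hit ratio in order (browser first, KV last).
// All numbers used below are illustrative assumptions, not measurements.
function readsReachingOrigin(totalRequests, hitRatios) {
  let remaining = totalRequests;
  for (const ratio of hitRatios) {
    // Only the misses at this layer continue to the next one.
    remaining = remaining * (1 - ratio);
  }
  return Math.round(remaining);
}

// 100,000 requests; browser 50%, CDN 50%, Cache API 50%, KV 90%.
const originReads = readsReachingOrigin(100000, [0.5, 0.5, 0.5, 0.9]);
console.log(originReads); // 1250
```

Even modest hit ratios compound: three layers at 50% each already remove seven out of eight requests before KV is consulted.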

Cache 1: The Browser Cache — Cache-Control Done Right

The browser cache is the cheapest cache in existence (it's the user's own disk) and the most under-used. You activate it with one HTTP response header.

Cache-Control: public, max-age=31536000, immutable

That's the canonical "this asset is content-addressed and will never change" header — for fingerprinted JS / CSS / images (/_next/static/abc123.js). The user downloads it once, ever.

The directives you actually use:

Directive	What it does
`public`	Any cache (browser + CDN) may store this. Default-ish.
`private`	Only the user's browser may store this. CDN must not. Use for per-user responses.
`max-age=N`	Browser may use the cached response for N seconds without revalidating.
`s-maxage=N`	Same, but specifically for "shared" caches (the CDN). Overrides `max-age` at the edge.
`immutable`	"Don't even bother revalidating until `max-age` expires." Skips the conditional-GET round trip.
`no-cache`	Confusing name. Means "cache it, but always revalidate before using" — i.e. send a conditional GET every time.
`no-store`	"Don't cache at all." For genuinely sensitive responses.
`stale-while-revalidate=N`	"After `max-age` expires, you can still serve the stale version for up to N more seconds while fetching a fresh one in the background." Excellent UX.
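As a rough illustration of how a cache reads these directives, the sketch below parses a header and works out how long a browser or a shared cache may treat the response as fresh. `parseCacheControl` and `freshnessSeconds` are hypothetical helper names. The sketch only understands the simple directives from the table (no quoted arguments), and it returns 0 when no lifetime is given, whereas real caches may apply heuristics in that case.

```javascript
// Parse a Cache-Control header value into a plain object.
// Valueless directives (public, immutable, ...) map to true.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(",")) {
    const [rawName, rawValue] = part.trim().split("=");
    if (!rawName) continue;
    const name = rawName.toLowerCase();
    directives[name] = rawValue === undefined ? true : Number(rawValue);
  }
  return directives;
}

// Freshness lifetime in seconds for a browser ("private") or CDN ("shared") cache.
// Returns 0 when the response may not be stored or must always be revalidated.
function freshnessSeconds(header, cacheType) {
  const d = parseCacheControl(header);
  if (d["no-store"] || d["no-cache"]) return 0;
  if (cacheType === "shared") {
    if (d["private"]) return 0; // shared caches may not store private responses
    if (typeof d["s-maxage"] === "number") return d["s-maxage"]; // overrides max-age
  }
  return typeof d["max-age"] === "number" ? d["max-age"] : 0;
}

console.log(freshnessSeconds("public, max-age=60, s-maxage=86400", "shared"));  // 86400
console.log(freshnessSeconds("public, max-age=60, s-maxage=86400", "private")); // 60
```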

A pragmatic policy by content type:

Content	Recommended `Cache-Control`
Fingerprinted asset (`app.abc123.js`)	`public, max-age=31536000, immutable`
Logo / favicon (changes rarely)	`public, max-age=86400`
Article HTML (changes occasionally)	`public, max-age=300, stale-while-revalidate=86400`
User dashboard (per-user, changes constantly)	`private, no-cache`
Payment confirmation page	`private, no-store`
API JSON (idempotent GET, public)	`public, max-age=60`
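One way to keep this policy in a single place is a small lookup, sketched below. The content-kind names and the `cacheControlFor` helper are assumptions made for illustration; the header values are the ones from the table. Falling back to `private, no-store` for unknown kinds is a conservative choice of this sketch, not a rule from the table.

```javascript
// Content kind -> Cache-Control value, mirroring the policy table above.
// The kind names are hypothetical labels chosen for this sketch.
const CACHE_POLICIES = new Map([
  ["fingerprintedAsset", "public, max-age=31536000, immutable"],
  ["logo", "public, max-age=86400"],
  ["articleHtml", "public, max-age=300, stale-while-revalidate=86400"],
  ["userDashboard", "private, no-cache"],
  ["paymentConfirmation", "private, no-store"],
  ["publicApiJson", "public, max-age=60"],
]);

function cacheControlFor(kind) {
  // Unknown content falls back to the most conservative policy.
  return CACHE_POLICIES.get(kind) ?? "private, no-store";
}

// Example: headers for an article page.
const headers = {
  "Content-Type": "text/html; charset=utf-8",
  "Cache-Control": cacheControlFor("articleHtml"),
};
console.log(headers["Cache-Control"]);
```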

ETag + 304: The Free Conditional GET

For things that might have changed, send an ETag (a content hash). The browser will send the ETag back in If-None-Match on the next request, and if nothing changed you reply 304 Not Modified with zero body. The browser uses its cached copy. You saved transferring the bytes; the user saved the bandwidth.

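// In a Worker
// Hex-encoded SHA-1 via Web Crypto (Workers have no built-in sha1() helper).
async function sha1Hex(text) {
  const digest = await crypto.subtle.digest("SHA-1", new TextEncoder().encode(text));
  return [...new Uint8Array(digest)].map((b) => b.toString(16).padStart(2, "0")).join("");
}

// Inside your fetch handler; fetchArticleHtml is your own function.
const body = await fetchArticleHtml(slug);
const etag = '"' + (await sha1Hex(body)).slice(0, 16) + '"';
const cacheControl = "public, max-age=300, stale-while-revalidate=86400";

// Exact match only; a production handler should also accept lists and W/ prefixes.
if (request.headers.get("If-None-Match") === etag) {
  return new Response(null, { status: 304, headers: { ETag: etag, "Cache-Control": cacheControl } });
}

return new Response(body, {
  headers: {
    "Content-Type": "text/html",
    "Cache-Control": cacheControl,
    "ETag": etag,
  },
});

That's a couple of dozen lines, and every repeat visitor whose copy is still current downloads a handful of headers instead of the whole page.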
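The strict `===` comparison in the snippet is the simplest possible check. A browser may send several ETags in one `If-None-Match` header, and an intermediary that recompresses the response can turn a strong ETag into a weak one (prefixed with `W/`). The sketch below is one way to handle both; `etagMatches` is a hypothetical helper, and it assumes ETags that contain no commas, which holds for the hex hashes used here.

```javascript
// Returns true when the If-None-Match request header matches the current ETag.
// Handles "*", comma-separated lists, and weak validators (W/"...").
// Assumes ETag values contain no commas (true for hex digests).
function etagMatches(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === "*") return true;
  // If-None-Match uses weak comparison, so the W/ prefix is ignored.
  const strip = (tag) => tag.trim().replace(/^W\//, "");
  const current = strip(currentEtag);
  return ifNoneMatch.split(",").some((candidate) => strip(candidate) === current);
}

console.log(etagMatches('W/"3f2a9c", "0011aa"', '"3f2a9c"')); // true
```

In the Worker above it would replace the direct comparison: `if (etagMatches(request.headers.get("If-None-Match"), etag)) { ... }`.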

Cache 2: Cloudflare's CDN Cache (Automatic)

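Cloudflare's edge caches responses that have a cacheable status code (200, 301, 404, etc.) AND a cacheable Cache-Control (anything that isn't private / no-store / max-age=0), with one catch: by default it only considers static file types (JS, CSS, images, fonts and similar). HTML and JSON are not edge-cached unless you opt in with a Cache Rule, and a response your Worker builds itself is not stored automatically, which is what Cache 3 is for. For eligible content you don't have to do anything: your Cache-Control headers from the previous section are read at the PoP and the response is held there for the next visitor routed to that PoP.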

Two things worth knowing:

s-maxage overrides max-age at the edge. If you want a long edge TTL but a short browser TTL, send both: Cache-Control: public, max-age=60, s-maxage=86400. The edge holds it for a day; the browser revalidates every minute.
Cache by URL, not body. Cloudflare keys by (URL, method, request headers in the cache-key). If your Worker returns different responses for the same URL based on a cookie, the CDN will happily serve the wrong cached one. Set Cache-Control: private or vary the cache key.
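When you control the cache key yourself (for example with the Workers Cache API in the next section), "vary the cache key" can be as simple as normalising the URL and adding an explicit variant. The sketch below is one possible approach: the ignored parameter list and the `__variant` parameter name are assumptions for illustration, not Cloudflare conventions.

```javascript
// Build a cache-key URL: drop query parameters that don't change the response
// and add an explicit variant so different renderings never share an entry.
// The parameter names below are examples; adjust them to your own site.
const IGNORED_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "fbclid"];

function cacheKeyUrl(requestUrl, variant) {
  const url = new URL(requestUrl);
  for (const name of IGNORED_PARAMS) url.searchParams.delete(name);
  url.searchParams.sort(); // ?a=1&b=2 and ?b=2&a=1 map to the same key
  if (variant) url.searchParams.set("__variant", variant);
  return url.toString();
}

console.log(cacheKeyUrl("https://example.com/articles/x?utm_source=tw&b=2&a=1", "en"));
// https://example.com/articles/x?a=1&b=2&__variant=en
```

Stripping tracking parameters raises the hit ratio, while the variant keeps, say, an English and a German rendering of the same path from overwriting each other.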

Cache 3: The Workers Cache API (Programmatic)

The Cache API gives your Worker direct, programmatic access to the CDN cache at its own PoP. Two methods do everything:

export default {
  async fetch(req, env, ctx) {
    const cache = caches.default;
 
    // 1. Try the cache first.
    let response = await cache.match(req);
    if (response) return response;
 
    // 2. Cache miss — do the real work.
    response = await renderArticle(req, env);
 
    // 3. Store it for next time. Don't await — ship the response first.
    ctx.waitUntil(cache.put(req, response.clone()));
    return response;
  },
};

caches.default is the same physical cache as Cache 2 (the CDN cache), just exposed as a programmable thing. Two real upgrades it gives you:

Cache anything you compute — not just origin responses. Render some HTML in your Worker, stick it in the cache, serve every subsequent visitor in that PoP from the cache for $0.
Cache POST responses (under a GET key you construct yourself, because cache.put rejects non-GET requests), vary by custom key, set custom TTLs. The automatic CDN cache does none of these on its own.
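Caching a POST response means storing it under a GET-style key that you derive yourself. The sketch below shows one way to derive such a key from the URL and a digest of the request body. `postCacheKeyUrl`, the `/__post/` path segment, and `handlePost` are hypothetical names chosen for this example, and the digest is assumed to come from something like `crypto.subtle.digest`.

```javascript
// Derive a GET-style cache key URL for a POST request.
// bodyDigestHex is assumed to be a hex digest of the request body
// (for example SHA-256 via crypto.subtle.digest in a Worker).
function postCacheKeyUrl(requestUrl, bodyDigestHex) {
  const url = new URL(requestUrl);
  // Keep the original path and append the digest under a reserved segment, so
  // two different bodies sent to the same endpoint get different cache entries.
  url.pathname = url.pathname.replace(/\/$/, "") + "/__post/" + bodyDigestHex;
  return url.toString();
}

console.log(postCacheKeyUrl("https://example.com/search?x=1", "abc123"));
// https://example.com/search/__post/abc123?x=1

// Inside a Worker (sketch; handlePost is your own function):
//   const key = new Request(postCacheKeyUrl(req.url, digestHex), { method: "GET" });
//   let res = await caches.default.match(key);
//   if (!res) {
//     res = await handlePost(req);
//     ctx.waitUntil(caches.default.put(key, res.clone()));
//   }
```

Only do this for POST endpoints that behave like reads (search, report generation); a POST that changes state should not be served from a cache.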

The waitUntil pattern in the snippet is critical: the response is returned immediately and the cache write finishes in the background, so the first visitor waits for the render but not for the cache write.

Cache 4: KV (and Other App-Level Caches)

The previous three caches are all keyed by URL, and each copy is local to one browser or one PoP. When you want to cache something that's shared across URLs (a D1 query result, a per-user permission set, a parsed config), you reach for KV.

The pattern is "read-through with TTL":

async function getCachedArticle(slug, env, ctx) {
  // 1. Try KV first.
  const cached = await env.KV.get(`article:${slug}`, "json");
  if (cached) return cached;

  // 2. Miss: query D1.
  const row = await env.DB.prepare(
    "SELECT title, body, updated_at FROM articles WHERE slug = ?"
  ).bind(slug).first();
  if (!row) return null;

  // 3. Write to KV with a TTL. With ctx, the write runs in the background;
  //    without it, we await so the write is not dropped when the response ends.
  const write = env.KV.put(`article:${slug}`, JSON.stringify(row), {
    expirationTtl: 60, // seconds (60 is the minimum KV accepts)
  });
  if (ctx) ctx.waitUntil(write);
  else await write;
  return row;
}

KV's free tier (100k reads/day) and its ~$0.50 / million reads beyond that mean a hot read on this path costs you essentially nothing. D1 is only queried when the KV entry has expired, which works out to roughly once per minute per article.
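The same read-through shape works for any D1 query, not just articles. Below is a generic sketch together with an in-memory stand-in for KV so you can try it outside a Worker. `readThrough` and `createMemoryKv` are hypothetical helpers; the stand-in ignores the TTL and exists only for local experiments.

```javascript
// Generic read-through helper. `kv` only needs get(key, "json") and
// put(key, value, { expirationTtl }), matching the KV calls used above.
async function readThrough(kv, key, ttlSeconds, loadFromOrigin) {
  const cached = await kv.get(key, "json");
  if (cached !== null && cached !== undefined) return cached;

  const fresh = await loadFromOrigin();
  if (fresh === null || fresh === undefined) return null; // don't cache misses

  await kv.put(key, JSON.stringify(fresh), { expirationTtl: ttlSeconds });
  return fresh;
}

// In-memory stand-in for KV, for local experiments only (ignores the TTL).
function createMemoryKv() {
  const store = new Map();
  return {
    async get(key, type) {
      if (!store.has(key)) return null;
      const raw = store.get(key);
      return type === "json" ? JSON.parse(raw) : raw;
    },
    async put(key, value) {
      store.set(key, value);
    },
  };
}
```

In a Worker you would pass `env.KV` as `kv` and a function that runs the D1 query as `loadFromOrigin`, for example `readThrough(env.KV, "article:" + slug, 60, () => queryArticle(slug))`, where `queryArticle` is your own function.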

For the deep details on KV — when it loses to D1, the eventual-consistency gotcha for counters, list pagination — see Cloudflare Ch 4 — KV — The Edge Key-Value Store.

The Decision Table

When you have a thing you want to cache, ask "which cache?" — answer with this table:

What you want to cache	Use
Static JS/CSS/images, fingerprinted	Cache 1+2 with `public, max-age=31536000, immutable`
Article HTML (mostly static)	Cache 1+2 with `public, max-age=60, stale-while-revalidate=86400` + ETag
Worker-rendered HTML (per page)	Cache 3 (Workers Cache API) with `ctx.waitUntil`
D1 query result reused across URLs	Cache 4 (KV read-through with TTL)
Per-user response (dashboard)	Don't cache at CDN. `Cache-Control: private` + JWT in Worker.
"Is this user subscribed?" check	KV with a short TTL (60 s is the minimum `expirationTtl` KV accepts), or DO for strict consistency
Anything sensitive (auth tokens, PII)	`Cache-Control: private, no-store` — never cache

Cache Invalidation — The Other Hard Problem

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Three strategies that actually work:

Time-based (TTL). Just let it expire. Use this when "up to N seconds stale" is acceptable. For most caching needs, this is all you require.
Versioned URLs. When the content changes, the URL changes — app.abc123.js becomes app.def456.js. The old URL stays cached forever but is never requested again. This is why fingerprinted assets exist.
Explicit purge. On a write, delete the cache entries that would now be stale. KV has delete(); the CDN supports purge-by-URL and purge-by-tag (availability depends on your plan). Use sparingly, because it's easy to forget a path and serve stale data.

A good rule of thumb: TTL by default, version URLs when you can, purge only when neither works.
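For the cases where you do purge, the write path might look like the sketch below. `invalidateArticle` is a hypothetical helper; `kv` and `cache` are passed in (`env.KV` and `caches.default` in a Worker), and the key and URL shapes mirror the examples in this chapter. Note that deleting from the Cache API only affects the data center the Worker is running in, so other PoPs still rely on the TTL.

```javascript
// After updating an article, drop the derived cache entries so the next read
// repopulates them. In a Worker, pass kv = env.KV and cache = caches.default.
async function invalidateArticle(slug, { kv, cache, origin }) {
  const results = await Promise.all([
    kv.delete(`article:${slug}`),               // KV read-through entry
    cache.delete(`${origin}/articles/${slug}`), // rendered page in this PoP only
  ]);
  // cache.delete resolves to true when an entry was actually removed.
  return { cacheEntryRemoved: results[1] === true };
}
```

Call it right after the D1 write succeeds, for example `await invalidateArticle(slug, { kv: env.KV, cache: caches.default, origin: new URL(req.url).origin })`.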

Putting It Together: A Real Cached-Article Worker

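The smallest Worker that uses all four caches correctly. getCachedArticle is the KV read-through from the previous section; sha1Hex is a small Web Crypto hashing helper and renderArticleHtml is your own template function: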

export default {
  async fetch(req, env, ctx) {
    if (req.method !== "GET") return new Response("Method not allowed", { status: 405 });
 
    const url = new URL(req.url);
    const slug = url.pathname.replace(/^\/articles\//, "");
    if (!slug) return new Response("Not found", { status: 404 });
 
    // Cache 3: Workers Cache API (per-URL, per-PoP).
    const cache = caches.default;
    const hit = await cache.match(req);
    if (hit) return hit;
 
    // Cache 4: KV-backed D1 read-through.
    const article = await getCachedArticle(slug, env);
    if (!article) return new Response("Not found", { status: 404 });
 
    // ETag for Cache 1 (browser conditional GET).
    const etag = '"' + (await sha1Hex(article.body)).slice(0, 16) + '"';
    if (req.headers.get("If-None-Match") === etag) {
      return new Response(null, { status: 304, headers: { ETag: etag } });
    }
 
    const html = renderArticleHtml(article);
    const response = new Response(html, {
      headers: {
        "Content-Type": "text/html; charset=utf-8",
        "Cache-Control": "public, max-age=60, stale-while-revalidate=86400",
        "ETag": etag,
        "Vary": "Accept-Encoding",
      },
    });
 
    ctx.waitUntil(cache.put(req, response.clone())); // Cache 3 write, in the background
    return response;
  },
};

Every layer pulls its weight: browser (304s + max-age), CDN (60s edge cache + 24h stale-while-revalidate), Workers Cache API (per-PoP hot reads), and KV (D1 fan-out reduction). With a 60-second KV TTL, a 100k-visitor day that would have done 100k D1 reads for one article now does on the order of one per minute, roughly 1,440, no matter how far traffic climbs.

Mental Model — Three Sentences

There are four caches between the user and your origin — browser, Cloudflare CDN, Workers Cache API, and KV/app-level — and the job of "doing caching" is choosing which layer answers each request as early as possible.
Browser + CDN are configured by Cache-Control + ETag headers (no code needed); Workers Cache API and KV are configured by code (programmatic put() / get() with TTLs).
TTL by default, version URLs when you can, purge only when neither works — and remember KV is eventually consistent (~60 s cross-PoP), so don't cache anything safety-critical there.

Try It Yourself (15 Minutes)

Add Cache-Control: public, max-age=31536000, immutable to one fingerprinted asset in your app. Reload in DevTools → Network and confirm subsequent loads show (memory cache) or (disk cache), not a network request.
Add an ETag to one HTML response. Refresh; confirm the second request returns 304 Not Modified with a near-empty body in DevTools.
Write the Workers Cache API snippet into a Worker. Hit the URL twice and confirm the second hit is faster and that a console.log placed after the cache-miss check only fires once (the Worker itself still runs on every request).
Wire a KV read-through in front of any D1 query. Watch the D1 read count in the dashboard flatten while traffic keeps climbing.
Pick one piece of content and decide which cache it should live in using the decision table. Justify it in one sentence.

Where This Lands in the Series

Your reads are now cheap. The next thing that breaks at scale is writes — specifically, abusive write patterns that bypass caches entirely and try to drown your origin.

Next chapter: Rate Limiting & Abuse Prevention — Cloudflare's built-in Rate Limiting Rules vs. a DIY Durable-Objects limiter, IP-based vs. JWT-based limits, the right way to send a 429 Too Many Requests, and how to stop a runaway script from turning your $0/mo Cloudflare bill into $5,000.

Caching, Properly — The Four Caches Your App Uses and How They Interact