KV is the storage primitive most often used wrong. People reach for it because the API is one line — await env.KV.get('key') — and then ship something that quietly breaks because they didn't read the small print on consistency. This chapter is the small print, written with concrete code and concrete failure modes.

Workers KV is a globally replicated, read-optimised key-value store. The model is: write to a central store, then let the change propagate to every edge PoP. Reads are fast (under 10 ms when the key is already cached in the PoP, roughly 50 ms on a miss). Writes are slow to converge: it can take up to 60 seconds, occasionally longer, before every PoP sees the new value. That trade-off, and only that trade-off, is what KV is.

When KV is the right answer

Use KV when the value:

Is read far more than written. Hot reads at the edge, written occasionally from the origin.
Doesn't need to be the same in every region within seconds. A stale value for 30 seconds is fine.
Is small. Up to 25 MiB per value technically, but the sweet spot is < 100 KB.

Concrete examples that fit:

Feature flags — flip in the dashboard, accept a minute of rollout drift.
Public site configuration — homepage hero copy, supported-locale list, version-pinned manifests.
Public-key directories — JWT verification keys, OAuth client metadata.
Edge-cached lookup tables — IP → country, slug → product ID, anything where the source of truth is elsewhere and KV is the fast read.
Read-through caches in front of D1 or an external API — wrap an expensive query, write the result with a TTL.

When KV is exactly the wrong answer

Counters / rate-limits / quotas. Two writers in two regions both reading 5, both writing 6, last-write-wins → off by one. Use D1 (UPDATE ... SET count = count + 1) or a Durable Object.
Anything that must be the same in every region right now. Auth tokens you just minted, session state that needs to be valid the moment after sign-in, the cart you just added an item to. Use Durable Objects, or read from D1's primary, or use a signed token that doesn't need a server lookup at all.
Anything where two writers might disagree. KV has no compare-and-swap, no native transactions, no atomic increment. Last write wins, no negotiation.
High-write workloads. KV's free tier is 1,000 writes/day. Paid tier writes are $5 per million. D1 writes are $1 per million. If you write a lot, D1 or DOs are cheaper and correct.

The "rate limit by IP in KV" anti-pattern is the canonical mistake. The bug appears under load, the metric looks plausible, and the actual count is wrong by 20–50%. Don't do it.
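To see why the counter case goes wrong, here is a minimal sketch of the lost update. It runs against an in-memory stand-in for a KV namespace (`makeFakeKV` is an assumption made for this example, not a Cloudflare API), and the two concurrent calls play the role of two Workers handling requests at the same moment.

```javascript
// In-memory stand-in for a KV namespace. A real binding has the same
// get/put shape, but it is a remote store, not a local Map.
function makeFakeKV() {
  const store = new Map();
  return {
    async get(key) {
      return store.has(key) ? store.get(key) : null;
    },
    async put(key, value) {
      store.set(key, String(value));
    },
  };
}

// The anti-pattern: read, add one, write back. Nothing makes this atomic.
async function naiveIncrement(kv, key) {
  const current = Number((await kv.get(key)) ?? '0');
  await kv.put(key, String(current + 1));
}

async function demoLostUpdate() {
  const kv = makeFakeKV();
  // Both increments read the key before either one writes it back.
  await Promise.all([
    naiveIncrement(kv, 'hits:203.0.113.7'),
    naiveIncrement(kv, 'hits:203.0.113.7'),
  ]);
  return kv.get('hits:203.0.113.7'); // resolves to '1', not '2'
}
```

Two requests arrived and the counter says one. With real KV the window between the read and the write is much wider than in this local model, so the race is easier to hit, not harder.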

The binding

[[kv_namespaces]]
binding = "CACHE"
id = "..."

Just like R2 and D1, the runtime hands you env.CACHE with get, getWithMetadata, put, delete, and list. No SDK to import, no credentials to manage.
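As a rough sketch of what that looks like from inside a Worker, the handler below reads one key through the CACHE binding and returns it. The `worker` object name and the 404 fallback are choices made for this example; in a real project the object would be the module's default export.

```javascript
// Sketch of a fetch handler that uses the CACHE binding declared above.
// In a Worker module you would write `export default worker;` after this.
const worker = {
  async fetch(request, env) {
    // get() resolves to the stored string, or null when the key is absent.
    const copy = await env.CACHE.get('hero-copy');
    if (copy === null) {
      return new Response('hero copy not set', { status: 404 });
    }
    return new Response(copy, {
      headers: { 'content-type': 'text/plain; charset=utf-8' },
    });
  },
};
```

Because the handler only touches `env.CACHE.get`, it can be exercised in a unit test by passing any object with an async `get` method as the binding.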

Read operations

// Default — string
const raw = await env.CACHE.get('hero-copy');
 
// JSON — KV does the parse for you
const config = await env.CACHE.get('site-config', 'json');
 
// ArrayBuffer / Stream variants for binary
const bytes = await env.CACHE.get('blob-key', 'arrayBuffer');
const stream = await env.CACHE.get('blob-key', 'stream');
 
// Get with metadata
const { value, metadata } = await env.CACHE.getWithMetadata('hero-copy');

Three properties of reads that matter:

First read in a PoP is a cache miss — slower (~50ms+). Subsequent reads in the same PoP are fast (~5–10ms).
Reads are served from the PoP's local copy whenever it has one. A read reflects what that PoP last saw, which is eventually consistent with the origin, not guaranteed to be the latest write.
A null return does not tell you whether the key never existed, was deleted, or expired. If you need "deleted" semantics, encode them in the value ({ status: 'tombstoned' }).
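The tombstone idea can be sketched with two small helpers. `softDelete` and `readState` are hypothetical names; the `{ status: 'tombstoned' }` shape is the one suggested above, and `kv` is any object with KV-style `get` and `put` methods.

```javascript
// Marker value written in place of a real delete.
const TOMBSTONE = { status: 'tombstoned' };

async function softDelete(kv, key) {
  // Overwrite instead of calling delete(), so readers can tell the key
  // was removed on purpose rather than never written.
  await kv.put(key, JSON.stringify(TOMBSTONE));
}

async function readState(kv, key) {
  const value = await kv.get(key, 'json');
  if (value === null) return { state: 'missing' };
  if (value && value.status === 'tombstoned') return { state: 'deleted' };
  return { state: 'present', value };
}
```

A tombstone written this way lives until something removes it, so in practice you would probably pass an expirationTtl on the put and let it age out.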

Write operations

// Plain string
await env.CACHE.put('hero-copy', 'Ship apps in a weekend.');
 
// With TTL: value disappears after N seconds (minimum 60)
await env.CACHE.put('ip-country:' + ip, 'DE', { expirationTtl: 60 });
 
// With absolute expiration (unix seconds)
await env.CACHE.put('promo-banner', 'Memorial Day sale', { expiration: 1748908800 });
 
// With metadata (returned alongside the value on getWithMetadata)
await env.CACHE.put('user:' + uid, JSON.stringify(profile), {
  metadata: { tier: 'pro', updated: Date.now() },
  expirationTtl: 3600,
});

Four properties of writes:

put returns as soon as the write hits the origin. It does not wait for global fan-out. Subsequent reads in remote PoPs may return the old value for up to 60 seconds.
expirationTtl is minimum 60 seconds. Don't try to use KV for sub-minute caches.
No atomic increment. get → +1 → put races between two Workers; the later put silently overwrites the earlier one, and one increment is lost.
Write rate-limit: ~1 write/key/sec. Don't hammer the same key.
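One way to keep the 60-second floor from surprising you is a thin wrapper around put. The sketch below assumes that a TTL under 60 seconds is rejected by KV, and raises it to the floor instead; `putWithTtl` is a hypothetical helper, not part of the binding.

```javascript
const MIN_TTL_SECONDS = 60; // the floor described above

// Raises too-small TTLs to the floor and rounds fractions up,
// then returns the TTL that was actually used.
async function putWithTtl(kv, key, value, ttlSeconds) {
  const expirationTtl = Math.max(MIN_TTL_SECONDS, Math.ceil(ttlSeconds));
  await kv.put(key, value, { expirationTtl });
  return expirationTtl;
}
```

Returning the effective TTL makes the clamping visible to the caller, which matters if the code asked for 5 seconds and is about to rely on that.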

The TTL trick — and the 60-second floor

expirationTtl is the most-used KV option for a reason: it gives you a free garbage collector. The minimum TTL is 60 seconds, so the smallest sensible KV cache is "this entry lives for at least a minute." If you want a 5-second cache, KV is not the right tool — you want the Cache API or an in-memory Map inside a Durable Object.

A practical pattern: read-through cache in front of an expensive D1 query.

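async function getProductSummary(env, slug) {
  // 1. Try the edge cache first. get(..., 'json') returns null on a miss.
  const cached = await env.CACHE.get('product:' + slug, 'json');
  if (cached !== null) return cached;

  // 2. Miss: fall back to the source of truth in D1.
  const row = await env.DB.prepare(
    'SELECT id, name, price, hero_url FROM products WHERE slug = ?'
  ).bind(slug).first();
  if (!row) return null; // unknown slug: nothing is cached

  // 3. Populate the cache for the next reader.
  await env.CACHE.put('product:' + slug, JSON.stringify(row), {
    expirationTtl: 300, // 5 minutes
  });
  return row;
}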

This is the "right" use of KV: the source of truth is D1, KV is a read-through edge cache, and a 5-minute stale window is acceptable. The expensive D1 query runs roughly once per key per 5 minutes, plus a few extra runs when several PoPs miss at the same moment; everyone else gets a sub-10ms hit.

When you UPDATE the product, also call env.CACHE.delete('product:' + slug). The delete is eventually consistent too (up to about 60 seconds globally), but it tightens the staleness window from 5 minutes to roughly one minute, which is usually fine.
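A minimal sketch of that write path, assuming the same DB and CACHE bindings as the read-through function. The function name `updateProductPrice` and the choice of updating only the price column are illustrative.

```javascript
// Update the row in D1 first, then drop the cached projection.
async function updateProductPrice(env, slug, price) {
  const result = await env.DB.prepare(
    'UPDATE products SET price = ? WHERE slug = ?'
  ).bind(price, slug).run();

  // Invalidate after the write, so the next cache miss reloads the fresh row.
  await env.CACHE.delete('product:' + slug);
  return result;
}
```

The order matters: deleting before the UPDATE would let a concurrent reader repopulate the cache with the old row.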

`list` and pagination

// Walk every page. Each call returns at most `limit` keys plus a cursor.
let cursor;
do {
  const listed = await env.CACHE.list({ prefix: 'session:', limit: 1000, cursor });
  for (const k of listed.keys) {
    console.log(k.name, k.metadata, k.expiration);
  }
  // Only follow the cursor while list_complete is false.
  cursor = listed.list_complete ? undefined : listed.cursor;
} while (cursor);

Two things to remember about list:

list returns keys (and metadata), not values. You still need a get per key if you want bodies. That's by design — large bulk reads should go through R2 or a database.
list is eventually consistent with very recent writes. A key you wrote 200ms ago may not show up in the next list.
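Because list hands back metadata with each key, you can often filter without fetching a single value. The sketch below collects key names by a metadata field; `findKeysByTier` is a hypothetical helper, and the `tier` field mirrors the metadata written in the earlier put example.

```javascript
// Collect the names of keys whose metadata.tier matches, page by page.
// No get() calls are made, so no values are transferred.
async function findKeysByTier(kv, prefix, tier) {
  const names = [];
  let cursor;
  do {
    const page = await kv.list({ prefix, cursor });
    for (const k of page.keys) {
      // Keys written without metadata have no metadata property to check.
      if (k.metadata && k.metadata.tier === tier) names.push(k.name);
    }
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor);
  return names;
}
```

Each page still counts as a list operation, so this suits occasional admin-style scans rather than per-request lookups.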

The consistency model, in plain English

You write K=v1 at the origin. The fan-out to every PoP starts. Until that fan-out completes (anywhere from a few seconds to 60 seconds, depending on PoP and load), readers may see:

The same PoP that wrote: v1 immediately.
A different PoP with no cached value: v1.
A different PoP with a cached v0: still v0 until either the cache TTL expires or the fan-out arrives.

There is no "synchronous" mode. You can't pay for stronger consistency. KV is the wrong tool if "now means now" anywhere in your design.
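The three read outcomes above can be reproduced with a toy model: one central store plus a per-PoP cache that expires after a fixed time. This is only an illustration of the behaviour described in this section, not a description of how Cloudflare implements KV; the PoP names, the explicit `now` timestamps, and the 60-second cache lifetime are all assumptions of the sketch.

```javascript
// Toy model: a central store and per-PoP caches that go stale after ttlMs.
function makeToyKV(ttlMs) {
  const origin = new Map();
  const pops = new Map(); // PoP name -> Map(key -> { value, cachedAt })

  const cacheFor = (pop) => {
    if (!pops.has(pop)) pops.set(pop, new Map());
    return pops.get(pop);
  };

  return {
    put(pop, key, value, now) {
      origin.set(key, value);
      // The writing PoP sees its own write immediately.
      cacheFor(pop).set(key, { value, cachedAt: now });
    },
    get(pop, key, now) {
      const cache = cacheFor(pop);
      const hit = cache.get(key);
      // A fresh-enough cache entry is returned even if the origin has moved on.
      if (hit && now - hit.cachedAt < ttlMs) return hit.value;
      const value = origin.has(key) ? origin.get(key) : null;
      cache.set(key, { value, cachedAt: now });
      return value;
    },
  };
}

const kv = makeToyKV(60000);
kv.put('fra', 'K', 'v0', 0);
kv.get('syd', 'K', 1000); // syd now holds a cached v0
kv.put('fra', 'K', 'v1', 2000);

const results = {
  writer: kv.get('fra', 'K', 2001),        // 'v1'
  coldPop: kv.get('nrt', 'K', 2001),       // 'v1'
  warmPop: kv.get('syd', 'K', 2001),       // 'v0', still stale
  warmPopLater: kv.get('syd', 'K', 61001), // 'v1' once the cached copy expires
};
```

The only reader that sees the old value is the PoP that cached it before the write, and nothing the writer does can shorten that wait.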

Free vs paid

The KV free tier is 100,000 reads/day, 1,000 writes/day, 1,000 deletes/day, 1,000 list/day, 1 GB storage — generous for low-write read-cache use cases, painful if you mistake it for a real database. Paid pricing on the Workers Paid plan is $0.50 per million reads, $5 per million writes/deletes/lists, $0.50 per GB-month.

The asymmetry is the whole story: reads are 10× cheaper than writes. Design around that, or use D1.
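To make the asymmetry concrete, here is the arithmetic as a small function using the paid prices quoted above. It deliberately ignores any usage included in a plan so the per-operation cost stays visible; the two workloads are invented for illustration.

```javascript
// Paid prices quoted above, in USD.
const PRICE = { readsPerMillion: 0.5, writesPerMillion: 5, storagePerGbMonth: 0.5 };

function monthlyKvCost({ reads, writes, storedGb }) {
  return (
    (reads / 1e6) * PRICE.readsPerMillion +
    (writes / 1e6) * PRICE.writesPerMillion +
    storedGb * PRICE.storagePerGbMonth
  );
}

// Read-heavy: 100M reads, 2M writes, 5 GB stored.
const readHeavy = monthlyKvCost({ reads: 100e6, writes: 2e6, storedGb: 5 }); // 62.5

// Same 102M operations, but split evenly between reads and writes.
const writeHeavy = monthlyKvCost({ reads: 51e6, writes: 51e6, storedGb: 5 }); // 283
```

The operation count is identical in both cases; shifting half of it to writes makes the bill about 4.5 times larger.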

A clean architecture: KV in front, D1 behind

The most reliable production pattern is to use KV as a read-through cache layer on top of D1, never as a primary store. The shape:

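[Diagram: reads go from the Worker to KV first and fall back to D1 on a miss, repopulating KV; writes go from the Worker to D1, then delete or refresh the matching KV key.]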

The contract:

Authoritative state lives in D1. Counters increment via UPDATE. Joins happen in SQL. Migrations apply to D1.
KV holds read-shaped projections with TTLs of 60s to several hours.
Every D1 write either deletes the corresponding KV key or writes the new value — the choice depends on read frequency. For hot keys, write-through (put the new value) avoids the next read's cache miss. For cold keys, delete is enough.

The same diagram with the wrong arrow — counters living in KV, written from many regions, read from D1 — is the bug-fest version most people accidentally ship.
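The delete-or-write-through choice from the contract can be sketched as one helper that runs right after the D1 write succeeds. `syncCache`, the `hot` flag, and the 300-second default TTL are illustrative names and values, not part of the KV API.

```javascript
// Cache half of the contract: refresh hot keys, invalidate cold ones.
async function syncCache(kv, key, freshValue, { hot = false, ttlSeconds = 300 } = {}) {
  if (hot) {
    // Write-through: the next reader gets the new value instead of a miss.
    await kv.put(key, JSON.stringify(freshValue), { expirationTtl: ttlSeconds });
    return 'refreshed';
  }
  // Cold key: drop it and let the next read repopulate on demand.
  await kv.delete(key);
  return 'invalidated';
}
```

Write-through costs a KV write on every D1 write, so it is worth reserving for keys that are read often enough to justify it.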

What if I want sub-minute consistency at the edge?

You have three real choices and none of them is KV.

Durable Objects — single-instance, strongly consistent, real transactions, websocket-capable. The right pick for state where every reader must see the latest write now. Chapter 5 of this series.
D1 + Sessions API — read replicas with bounded staleness, anchored to your last write. The right pick when the data is already in SQL and you just want a faster nearby read.
The Cache API + custom invalidation — for HTTP-response-shaped caching, the per-PoP Cache API is faster than KV and gives you precise control. Different use case from KV — keyed by Request, not by string.

The pros and cons cheat sheet

Pros

Sub-10ms reads at the edge. Once a key is hot in a PoP, reads are essentially free latency.
Tiny API surface. Five methods, all on env.CACHE (or whatever you named the binding).
TTL-driven GC. No cleanup job to write.
Metadata bag. Filter/list keys without round-tripping the value.
Free tier for real prototyping. 100k reads/day will carry a hobby project a long way.

Cons

Eventually consistent (~60 seconds). Not negotiable.
No atomic operations. Counters and increments must live elsewhere.
Writes are expensive vs D1. $5/M writes vs $1/M in D1.
Minimum TTL is 60 seconds. Sub-minute caching is a different tool.
1 write/sec/key throttle. Hot single-key writes will get throttled.

When to reach for KV

Use KV when all of the following are true:

The read rate is at least 10× the write rate, ideally 100×.
The value can be stale for ~60 seconds without harm.
The value is small (< 100 KB) and self-contained.
You don't need to mutate from multiple regions concurrently.
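If it helps to have the checklist in executable form, the four conditions can be written as a predicate. The thresholds come straight from the list; the function name and field names are made up for this sketch.

```javascript
// Returns whether a workload passes the four checks, and which ones it fails.
function kvFits({ readsPerWrite, toleratesStaleSeconds, valueBytes, concurrentMultiRegionWrites }) {
  const failures = [];
  if (readsPerWrite < 10) failures.push('read/write ratio below 10x');
  if (toleratesStaleSeconds < 60) failures.push('cannot tolerate ~60s of staleness');
  if (valueBytes >= 100 * 1024) failures.push('value is 100 KB or larger');
  if (concurrentMultiRegionWrites) failures.push('written concurrently from multiple regions');
  return { fits: failures.length === 0, failures };
}
```

A feature flag (thousands of reads per write, minutes of tolerated staleness, a few bytes, one writer) passes every check; a per-IP rate-limit counter fails at least three.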

If any of those is false, KV is the wrong primitive. The next chapter walks through Durable Objects — the tool you reach for when KV's consistency model isn't enough, when you need real transactions across calls, or when you want a single tiny stateful actor coordinating a chat room, a rate limit, or a Stripe webhook fan-out.


KV: The Edge Key-Value Store