Six chapters got a change from a keystroke on a MacBook Pro all the way to five production targets. This one is about the act everyone forgets until 2 a.m.: Operate — everything that happens after the deploy button. It's the fifth act of the Ch 1 play, and it's the difference between a pipeline that's fast-and-terrifying and one that's fast-and-calm.

The reframe that makes this chapter click: you will ship bad releases. Everyone does. The goal was never zero failures; it is that any single failure is small, visible, and quickly undone. By the end you'll have the three things that make this possible: environments and secrets that contain the blast radius, observability that catches a bad release before your users complain, and a recovery playbook that works on every one of the five targets. Then we'll zoom all the way back out and look at the whole machine you've built.

Operate: the Loop After the Loop

Deploying isn't the finish line; it's the start of a second, quieter loop that runs until the next deploy.


Figure 1 — The operate loop. A release isn't "done" at 100% — it's done when you've watched it stay healthy. Observe → detect → decide → recover is the cycle that makes the previous six chapters safe to run at speed.

Environments and Promotion, Done Right

Every chapter leaned on "promote through environments." Here's the rule that makes them worth having: each environment is a more-trusted copy of the last, with its own isolated secrets, and you promote the same artifact forward — never rebuild per environment.

Environment	Audience	Web (Cloudflare)	Apps
dev	You	`cf:dev` local	Simulator / debug build
preview / staging	Reviewers	Per-PR version URL	TestFlight / Play internal track
production	Real users	`simpleappshipper.com`	App Store / Play / R2

The two non-negotiables: environment parity (staging should resemble production closely enough that "passed in staging" means something — same bindings, same shape of data) and secret isolation (a preview build cannot read a production key, because that key isn't in its environment). Both are just the Ch 2 least-privilege idea applied to where code runs.
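One way to make "promote the same artifact forward" mechanical is to compare content digests before each promotion. The sketch below is illustrative only: the `Artifact` type, the `canPromote` function, and the three-step environment order are hypothetical names chosen for this example, not something this repo is known to contain.

```typescript
// Hypothetical sketch: an artifact may only move one step forward, and only
// if it is byte-for-byte the artifact already deployed one step upstream.
type EnvName = "dev" | "staging" | "production";

interface Artifact {
  commit: string; // git commit the artifact was built from
  sha256: string; // content digest of the built artifact
}

// Which environment must already hold the artifact before promotion.
const PREVIOUS_ENV: Record<EnvName, EnvName | null> = {
  dev: null,
  staging: "dev",
  production: "staging",
};

function canPromote(
  deployed: Partial<Record<EnvName, Artifact>>,
  candidate: Artifact,
  target: EnvName,
): { ok: boolean; reason: string } {
  const previous = PREVIOUS_ENV[target];
  if (previous === null) {
    return { ok: true, reason: "dev accepts any build" };
  }
  const upstream = deployed[previous];
  if (upstream === undefined) {
    return { ok: false, reason: `nothing deployed to ${previous} yet` };
  }
  if (upstream.sha256 !== candidate.sha256) {
    // Same commit is not enough: a rebuild can differ from what was tested.
    return { ok: false, reason: `digest differs from ${previous}` };
  }
  return { ok: true, reason: `same artifact as ${previous}` };
}
```

Comparing digests rather than commit hashes is the design choice that matters here: two builds of the same commit can still differ, and only the digest proves that what reaches production is what staging actually exercised.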

Secrets: Where Everything Lives, and Why It's Scoped

A cross-platform pipeline accumulates a lot of secrets. The discipline is knowing where each one lives and giving each consumer only the ones it needs — so a single leak is contained, not catastrophic.


Figure 2 — Secrets live where they're used, scoped to the smallest blast radius. Web runtime secrets in Cloudflare; deploy tokens in GitHub environments; signing identity on the hardened mini (Ch 4); store certs guarded closest to their use. No machine holds a secret it doesn't need.

Four rules keep this safe:

Never in git. Secrets go in the platform's encrypted store (`wrangler secret put`, GitHub Actions secrets, the mini's keychain), never in `wrangler.toml`, never in a committed file. This repo's `check:krea-credentials` guard exists to fail the build if a key is ever hardcoded.
Scope with GitHub Environments. Put deploy tokens behind a protected GitHub Environment with required reviewers, so a production deploy needs a human's approval even from CI — the Ch 2 human gate, enforced at the secret.
Prefer short-lived over long-lived. Where possible, use OIDC so a workflow mints a short-lived cloud credential per run instead of storing a long-lived token. A token that expires in minutes can still leak, but it is useless to an attacker almost immediately, and there is no stored credential to steal between runs.
Rotate, and know your blast radius. For each secret, know what it can do and have a rotation path. "If this leaked, what's the worst case, and how fast can I revoke it?" should have an answer for every one.
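The four rules can be turned into a small audit over a secret inventory. This is a minimal sketch under stated assumptions: the `SecretRecord` shape, the `Location` values, and the `auditSecrets` function are hypothetical, and the rule that a secret with more than one consumer deserves a look is an illustrative threshold rather than a standard.

```typescript
// Hypothetical inventory format; field names are illustrative.
type Location =
  | "cloudflare"
  | "github-environment"
  | "mini-keychain"
  | "git"
  | "laptop";

interface SecretRecord {
  name: string;
  location: Location;
  consumers: string[];   // every job or machine that can read the secret
  longLived: boolean;    // false if minted per run, for example via OIDC
  rotationPath?: string; // how to revoke and reissue it, if documented
}

function auditSecrets(inventory: SecretRecord[]): string[] {
  const findings: string[] = [];
  for (const secret of inventory) {
    if (secret.location === "git" || secret.location === "laptop") {
      findings.push(`${secret.name}: stored in ${secret.location}, move it to an encrypted store`);
    }
    if (secret.consumers.length > 1) {
      findings.push(`${secret.name}: readable by ${secret.consumers.length} consumers, consider splitting it`);
    }
    if (secret.longLived && secret.rotationPath === undefined) {
      findings.push(`${secret.name}: long-lived with no rotation path`);
    }
  }
  return findings;
}
```

An empty result does not prove the setup is safe; it only means every secret has a known home, a single consumer, and an answer to "how fast can I revoke it?".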

Observability: See It Before They Do

A ramp you're not watching is a slow leap; a deploy you can't observe is a prayer. You don't need an enterprise stack — you need to answer one question fast: "is this release healthy?" Three cheap signals do it.

Signal	Web (Cloudflare)	Apps	Answers
Errors	`wrangler tail`, Workers analytics	Crashlytics / Sentry, App Store + Play crash reports	Is the new version throwing / crashing?
Latency / health	p50/p99 in dashboard analytics	ANR / hang rate, launch time	Is it slow or hanging, even if not crashing?
Business signal	Sign-ups, checkout success	Crash-free users %, retention	Is it quietly breaking the thing that matters?

Two practices turn signals into a safety net:

Alert on the leading indicator, not the lagging one. "Error rate doubled in the last 5 minutes" reaches you while you can still halt the rollout. "Revenue dropped this week" reaches you after the damage. Watch error rate and crash-free percentage during the ramp.
Compare per-version. The question is never "is the error rate high?" — it's "is it higher on the new version than the old?" Cloudflare's per-version metrics and the stores' per-release crash data are exactly this. A spike that correlates with the new version is your halt signal.
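The per-version comparison above can be sketched as a single function that a ramp script calls on each check. Everything here is an assumption made for illustration: the `VersionStats` shape, the `rampVerdict` name, and the default thresholds (a minimum of 200 requests per version, halting when the new error rate is more than twice the old one) are hypothetical and would need tuning against real traffic.

```typescript
interface VersionStats {
  requests: number;
  errors: number;
}

type Verdict = "continue" | "halt" | "insufficient-data";

function rampVerdict(
  oldVersion: VersionStats,
  newVersion: VersionStats,
  minRequests = 200, // assumed minimum sample before trusting either rate
  maxRatio = 2,      // assumed threshold: halt above 2x the old error rate
  floor = 0.001,     // assumed baseline floor so a zero-error old version still gives a usable threshold
): Verdict {
  if (oldVersion.requests < minRequests || newVersion.requests < minRequests) {
    return "insufficient-data";
  }
  const oldRate = Math.max(oldVersion.errors / oldVersion.requests, floor);
  const newRate = newVersion.errors / newVersion.requests;
  // The question is relative: is the new version worse than the old one?
  return newRate > oldRate * maxRatio ? "halt" : "continue";
}
```

The `insufficient-data` verdict is deliberate: early in a ramp the new version has seen too few requests for its error rate to mean anything, and treating that as either healthy or broken would be a guess.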

The Universal Recovery Playbook

When the signal goes bad, recovery depends on which half broke — and Ch 5 and Ch 6 gave you the moves. Here's the decision in one place:


Figure 3 — One decision, platform-specific recovery. Web reverts totally in seconds; apps halt-and-fix-forward, with the server-side kill switch as the one move that gives web-speed recovery even on installed native apps. The feature flag is the universal undo.
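One reading of that decision can be written down as a lookup from target to recovery steps. This is a sketch, not the playbook itself: the `Target` names, the `recoveryPlan` function, the step wording, and the choice to list the kill switch first whenever one exists are all assumptions made for illustration.

```typescript
type Target = "web" | "ios" | "android" | "mac" | "windows";

function recoveryPlan(target: Target, behindKillSwitch: boolean): string[] {
  const steps: string[] = [];
  if (behindKillSwitch) {
    // Works at server speed on every target, including installed apps.
    steps.push("turn the feature off with the server-side kill switch");
  }
  if (target === "web") {
    steps.push("roll back to the previous version");
  } else {
    // Installed binaries cannot be recalled, so stop the spread and ship a fix.
    steps.push("halt the rollout");
    steps.push("fix forward with a new release");
  }
  return steps;
}
```

Writing the plan as data makes the gap visible: a native target with no kill switch has only the slow steps available, which is the argument for adding one before the feature ships.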

The principle under all of it, worth tattooing on the pipeline: optimise mean-time-to-recovery, not mean-time-between-failures. You can't prevent every bad release, but you can make every bad release a thirty-second non-event. A team that recovers in seconds ships boldly; a team that fears every deploy ships slowly and still breaks things.
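A little arithmetic shows why recovery time dominates. The numbers below are made up purely for illustration, and `badMinutes` is a hypothetical helper: it multiplies the number of bad releases in a period by the time each one takes to undo.

```typescript
// Total user-facing "bad minutes" in a period:
// number of bad releases multiplied by minutes to recover from each.
function badMinutes(failures: number, minutesToRecover: number): number {
  return failures * minutesToRecover;
}

// Illustrative figures only, not measurements.
const cautiousTeam = badMinutes(2, 240); // 2 bad releases, 4 hours each to recover: 480 minutes
const boldTeam = badMinutes(10, 0.5);    // 10 bad releases, 30 seconds each to recover: 5 minutes
```

Under these assumed figures, the team that fails five times as often still exposes users to roughly a hundredth of the broken time, because the recovery term is the one that can shrink by orders of magnitude.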

What This Project Actually Does

Final honest status of the series. This repo has the recoverable foundation and the guards; formal observability is the main gap.

Live: secrets are kept out of git, and a guard (`check:krea-credentials`) enforces it; the web half can be reverted today with `wrangler rollback` because Cloudflare retains versions; the Mac signing identity lives on the hardened mini, not in CI or git.
The gap to the target: wiring per-version alerting (error-rate spike during a ramp), a protected GitHub Environment with a required reviewer on production deploys, and a documented server-side kill-switch pattern for the app's riskier features.
The honest summary: the irreversible things are guarded and the web half is already instantly recoverable — what's left is making the watching automatic instead of manual, which is additive, not a redesign.

The Whole Machine, One Last Look

Step all the way back. Across seven chapters you built one coherent system:

Act (Ch 1)	What you built	Chapter
Author	Human + agent on a low-trust MacBook Pro, gated by `ci:local`	1
Propose / Verify	The agent-assisted PR + four-layer review funnel	2
Build	One fail-fast CI DAG feeding five targets	3
(Infra)	A hardened Mac mini + MacBook Pro build cluster	4
Ship — web	Preview deploys, gradual rollout, instant rollback	5
Ship — apps	iOS/Android/Mac/Windows release pipelines	6
Operate	Environments, secrets, observability, recovery	7

That's the whole thing: a keystroke becomes a reviewed, checked, signed, gradually-released, observed, instantly-recoverable change across five platforms — with agents doing the reversible work at full speed and humans owning every irreversible gate. It runs on hardware an indie can afford: a few laptops and one trusted Mac mini. None of it requires a big team or a big budget; it requires the shape being right.

Mental Model — Three Sentences

Operate is the fifth act, not an afterthought: a release is done when you've watched it stay healthy, and the whole point of environments, scoped secrets, and observability is to make any single bad release small, visible, and quickly undone.
Scope every secret to the smallest blast radius — web secrets in Cloudflare, deploy tokens in protected GitHub Environments, the signing identity on the hardened mini, short-lived OIDC tokens over long-lived ones — and guard the irreversible ones (signing certs) hardest.
Optimise mean-time-to-recovery over mean-time-between-failures: web rolls back in seconds, apps halt-and-fix-forward, and a server-side kill switch is the universal undo — let agents triage the incident, but keep the recovery decision with a human.

Try It Yourself (15 Minutes)

Write your "is it healthy?" check. For your last deploy, what one signal would have told you fastest that it was bad? If you don't have it, that's the observability to add first.
Inventory your secrets. List every secret in your pipeline and where it lives. Any secret that is in git, on a laptop, or scoped more broadly than it needs to be is a fence to build. Start with the signing identity.
Time your recovery. For each target you ship, how fast can you undo a bad release? Web should be seconds; apps should at least have a halt + a kill switch. The slowest one is your weakest link.
Add one kill switch. Take your riskiest feature and make it server-toggleable. You just bought yourself web-speed recovery on a native app.
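For the kill-switch exercise, the client side can be as small as two functions: one that parses whatever the server returns, and one that decides what to do when the answer is missing. This is a minimal sketch assuming the server returns a flat JSON object of booleans; the `FlagMap` type, the function names, and the `newCheckout` flag in the usage note are hypothetical.

```typescript
// Flags as fetched from a server endpoint; the flat boolean shape is assumed.
type FlagMap = Record<string, boolean>;

// Parse a response body defensively: anything unexpected yields null.
function parseFlags(body: string): FlagMap | null {
  try {
    const data: unknown = JSON.parse(body);
    if (typeof data !== "object" || data === null || Array.isArray(data)) {
      return null;
    }
    const source = data as Record<string, unknown>;
    const flags: FlagMap = {};
    for (const key of Object.keys(source)) {
      const value = source[key];
      if (typeof value === "boolean") flags[key] = value; // ignore non-boolean entries
    }
    return flags;
  } catch {
    return null; // malformed JSON
  }
}

// Decide whether a feature runs. `fallback` is used when the flags are
// unavailable or the feature is not listed.
function isEnabled(remote: FlagMap | null, feature: string, fallback: boolean): boolean {
  if (remote === null) return fallback;
  const value = remote[feature];
  return typeof value === "boolean" ? value : fallback;
}
```

In use, `isEnabled(parseFlags(responseBody), "newCheckout", false)` keeps a risky feature off whenever the flag service is unreachable or returns garbage. Choosing `false` as the fallback for risky features is the design decision to make on purpose: it trades availability of the feature for the guarantee that a broken flag fetch can never turn the switch back on.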

The End of the Series — and Where It Goes Next

That's the full reference architecture, from Ch 1's map to a production system you can actually run. You can take any change from a keystroke to five live targets safely, with agents accelerating the reversible work and humans guarding every gate that's hard to undo — on one Mac mini and a few laptops.

The natural next step isn't another chapter to read — it's to make this the default for your projects. Pick one repo, turn on branch protection, wire npm run check as the required gate, give every PR a preview URL, register the mini as a hardened runner, and put a kill switch behind your riskiest feature. The architecture isn't the hard part; adopting it as the standard is. Do that once and every future project inherits a fast, calm, recoverable pipeline instead of reinventing one — which is exactly the point.


After Deploy: Environments, Secrets, Observability, and the Rollback Safety Net
