You've been typing git add, git commit, git push for a while now. Maybe you've watched "Auto-sync" commits land on your repo, seen pull requests merge with a "#5" next to them, and nodded along without really knowing what any of it means. That's normal — almost nobody is taught git; they absorb three commands and hope.
This series fixes that, and it starts here, with the one idea that makes everything else click:
Git is not a backup tool or a file-syncer. It's a graph of snapshots. Every command you run is just adding nodes to that graph or moving labels around on it.
Grasp this chapter and the scary stuff that comes later (rebase, squash, reflog, "detached HEAD") turns into obvious consequences of a simple model. Let's build it.
A Commit Is a Snapshot, Not a Diff
The single most common misconception: people think git stores changes (diffs) — "this commit added 3 lines, that one deleted 5." It doesn't. Git stores complete snapshots.
Every time you commit, git takes a photograph of your entire project — every file, exactly as it is — and saves it. A commit is that snapshot plus a little metadata:
| Part of a commit | What it holds |
|---|---|
| Snapshot | The full state of every tracked file at that moment |
| Parent | A pointer to the commit that came before it |
| Author + date | Who made it and when |
| Message | The text you wrote (-m "...") |
| SHA | A unique 40-character ID (you usually see the first 7) |
(Git is smart about storage — identical files aren't duplicated between snapshots — but conceptually every commit is a full snapshot. Diffs are computed on demand by comparing two snapshots, not stored.)
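A minimal Python sketch of the snapshot idea, not git's actual storage format: each toy commit carries the full path-to-content mapping, and a diff is worked out only when two snapshots are compared. The `Commit` class, the `diff` helper, and the file names are hypothetical, invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: a full snapshot plus metadata (hypothetical model, not git's format)."""
    snapshot: dict              # path -> full file content at commit time
    parent: Optional["Commit"]  # None for the very first commit
    author: str
    message: str


def diff(old: Commit, new: Commit) -> dict:
    """Compute changes on demand by comparing two complete snapshots."""
    changes = {}
    for path in sorted(set(old.snapshot) | set(new.snapshot)):
        before = old.snapshot.get(path)
        after = new.snapshot.get(path)
        if before is None:
            changes[path] = "added"
        elif after is None:
            changes[path] = "deleted"
        elif before != after:
            changes[path] = "modified"
    return changes


first = Commit({"README.md": "hello\n", "app.py": "print(1)\n"}, None, "Ada", "Initial commit")
second = Commit(
    {"README.md": "hello\n", "app.py": "print(2)\n", "notes.txt": "todo\n"},
    first, "Ada", "Update app, add notes",
)

# Nothing about "what changed" was stored; it falls out of the comparison.
print(diff(first, second))
```

Note that `second` still contains `README.md` even though it did not change: the snapshot is complete, and the unchanged file simply never shows up in the computed diff.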
The SHA: Why Commits Can't Be Quietly Changed
An ID like 8367b9b9 next to a commit is an abbreviation of its SHA: a hash of the commit's entire contents (snapshot + parent + author + message). This has a profound consequence:
Change anything about a commit — one character of the message, one byte of a file, the parent it points to — and you get a completely different SHA. It's a different commit.
This is why git is trustworthy: a commit's ID is a fingerprint of its content. If two people have a commit with the same SHA, it is byte-for-byte the same commit. And it's why "rewriting history" (Ch 2) always produces new commits with new SHAs rather than editing existing ones — the old SHA could never match the new content.
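To make the fingerprint idea concrete, here is a rough Python sketch that hashes objects the way git does under its default SHA-1 object format: the hash covers a small header (`<type> <size>` and a NUL byte) followed by the content. The commit body below is deliberately simplified (real commits also carry a committer line and timestamps), and the tree and parent IDs are placeholders made up for the example.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object git-style: SHA-1 over '<kind> <size>' + NUL byte + body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, author: str, message: str) -> bytes:
    """Build a simplified commit body (assumption: committer and dates omitted)."""
    return f"tree {tree}\nparent {parent}\nauthor {author}\n\n{message}\n".encode()


# Placeholder IDs invented for this example.
tree_id = "a" * 40
parent_id = "b" * 40

original = git_object_id("commit", commit_body(tree_id, parent_id, "Ada", "Fix typo"))
reworded = git_object_id("commit", commit_body(tree_id, parent_id, "Ada", "Fix typos"))
reparented = git_object_id("commit", commit_body(tree_id, "c" * 40, "Ada", "Fix typo"))

# One extra character in the message, or a different parent, gives a new ID.
print(original)
print(reworded)
print(reparented)
```

Because the parent's ID is part of the hashed text, changing an old commit changes the ID of every commit built on top of it, which is why rewritten history always shows up as new SHAs.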
History Is a Graph (the DAG)
Each commit points to its parent — the commit before it. Follow those pointers backward and you walk the entire history. That structure has a name: a DAG (Directed Acyclic Graph) — "directed" because pointers go one way (child → parent), "acyclic" because you can never loop back to where you started.
Most of the time it looks like a simple line:
Figure 1 — Commits point backward at their parent. C's parent is B; B's parent is A. To see history, git just follows the arrows. (Arrows point at the parent — that's the direction git actually stores.)
But a straight line is only the common case, not the definition. The moment you branch or merge, that line forks and rejoins — which is the whole reason git needs a graph, not a list. Each of the three letters is a real guarantee, so it's worth unpacking them:
| Letter | Means | Why it holds in git |
|---|---|---|
| Directed | Every edge has a direction | A commit points at its parent, never forward. A commit can't even know its own children — they didn't exist when it was created and hashed. |
| Acyclic | You can never loop back to the start | Following parents only ever moves toward the past, ending at the first commit (the root, which has no parent). A cycle is impossible: a commit's SHA includes its parent's SHA, so it physically cannot point at a descendant whose hash doesn't exist yet. |
| Graph | Nodes joined by edges — richer than a line | Commits are nodes; parent links are edges. It's more than a list because a commit can have several children, and a merge can have several parents. |
Why a Graph, and Not Just a Line
A straight line would mean every commit has at most one parent and at most one child (the root has no parent, and the newest commit has no child yet). Git is richer in two specific ways, and those two are what bend the line into a graph:
- One commit, many children → the graph forks. Start two branches from the same commit and that commit now has two children. This is branching.
- One commit, many parents → the graph joins. A merge commit ties two lines back together by having two parents (occasionally more). This is merging.
Figure 1b — A real DAG. Commit B forks into two children (D and E); the merge commit M rejoins them by having two parents. A plain line can express neither move — which is exactly why git's history is a graph. (Ch 2 covers how you deliberately create and reshape this.)
A subtle point worth nailing: it's a graph, not a tree. A tree forbids two branches from rejoining — but git allows it, every time you merge. That's the difference between git's DAG and the simpler structures you may have met:
| Structure | Parents per node | Children per node | Can two paths rejoin? |
|---|---|---|---|
| Linked list | 1 | 1 | No |
| Tree | 1 | many | No |
| Git's DAG | 1 or more | many | Yes — at a merge |
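The fork-and-join shape can be sketched in a few lines of Python. The commit letters follow the caption of Figure 1b; treating A as the root is an assumption, and the `children_of` helper is a hypothetical name. Only parent links are stored, mirroring what git records; children are derived by inverting them.

```python
# Parent links for a graph like Figure 1b: child -> list of parents.
parents = {
    "A": [],          # root commit: no parent (assumed for this sketch)
    "B": ["A"],
    "D": ["B"],       # first line of work started from B
    "E": ["B"],       # second line of work started from B
    "M": ["D", "E"],  # merge commit: two parents
}


def children_of(parents: dict) -> dict:
    """Derive children by inverting parent links; commits never store their children."""
    children = {commit: [] for commit in parents}
    for child, commit_parents in parents.items():
        for parent in commit_parents:
            children[parent].append(child)
    return children


children = children_of(parents)
forks = [c for c, kids in children.items() if len(kids) > 1]
merges = [c for c, ps in parents.items() if len(ps) > 1]

print("forks:", forks)    # commits with more than one child
print("merges:", merges)  # commits with more than one parent
```

A linked list or a tree could be stored with a single parent field per node; the list-valued `parents` entry for M is the one thing that makes this a general DAG.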
Reachability: What git log Actually Walks
The DAG hands you one more idea for free, and it pays off all the way in Chapter 4: reachability. Starting from any label — a branch, a tag, or HEAD — the commits you can reach by walking parent pointers are that label's history. git log is literally that walk: start at HEAD, follow parents, print each node until you hit the root.
That single idea explains several everyday things at once:
- git log on a branch shows only that branch's ancestors: the part of the graph reachable from its label.
- Merging two branches makes both histories reachable from the merge commit, which is why a merged feature's commits suddenly appear in main's log.
- A commit no label can reach is "unreachable" (or "dangling"). It still physically exists in .git, but nothing points to it, which is exactly what a bad reset --hard or a botched rebase produces. The reflog (Ch 4) is how you find it again before git's garbage collector eventually sweeps it away.
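The walk described above is an ordinary graph traversal. A minimal Python sketch, assuming a made-up history in which main sits on a merge commit M and a commit X was orphaned by a reset; the `reachable` function and all commit names are hypothetical:

```python
def reachable(start: str, parents: dict) -> set:
    """Collect every commit reachable from `start` by following parent links."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue  # merges can lead to the same ancestor twice
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# Hypothetical history: M merges D and E; X was left behind by a reset.
parents = {"A": [], "B": ["A"], "D": ["B"], "E": ["B"], "M": ["D", "E"], "X": ["B"]}
labels = {"main": "M"}

in_history = set()
for tip in labels.values():
    in_history |= reachable(tip, parents)

unreachable = set(parents) - in_history

print(sorted(in_history))   # what a log walk starting at main would visit
print(sorted(unreachable))  # still stored, but no label leads to it
```

X still points at B, but nothing points at X, so a walk that starts from labels never finds it. Adding any label on X (a branch or tag) would make it reachable again.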
So hold the full picture now: history is not a flat list but a graph of snapshots — mostly linear, but forking at branches and rejoining at merges. Reshaping that graph on purpose is the whole subject of Ch 2; next, though, we meet the labels that point into the graph — because that's what a branch really is.
HEAD and Branches Are Just Labels
Here's the part that unlocks everything. A branch is not a copy of your code. A branch is just a sticky note with a commit's name on it. That's it. main is a label that points at one specific commit — the newest one on that line.
And HEAD is a special label meaning "you are here right now." Usually HEAD points at a branch (which points at a commit), so HEAD is really saying "the branch you're currently on."
Figure 2 — main is a label on commit C. HEAD points at main, meaning "you're on the main branch." When you commit, git creates a new snapshot and slides the main label forward to it.
This is why branching in git is instant and free: creating a branch just writes a new sticky note pointing at the current commit. No files are copied. On disk, a branch is little more than the 40-character SHA it points to. Once you internalise "branches are movable labels," git checkout, git reset, and "fast-forward" all become obvious: they're just moving labels around.
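As a rough model of the sticky-note idea, the sketch below keeps branches in a plain dictionary and lets HEAD name the current branch. It is an illustration only; `create_branch`, `commit`, and the single-letter commit IDs are hypothetical stand-ins for what git does with real SHAs.

```python
# Hypothetical repository state: branch names map to commit IDs, HEAD names a branch.
branches = {"main": "C"}
head = "main"
parents = {"A": None, "B": "A", "C": "B"}


def create_branch(name: str) -> None:
    """Creating a branch only records a new label at the current commit."""
    branches[name] = branches[head]


def commit(new_id: str) -> None:
    """The new commit points at the current one; only the checked-out label moves."""
    parents[new_id] = branches[head]
    branches[head] = new_id


create_branch("experiment")  # experiment -> C, no files copied
commit("D")                  # main slides forward to D; experiment stays on C

print(branches)
```

After these two calls, main and experiment share all history up to C and differ only in where their labels sit, which is all a branch ever is in this model.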
The Three Trees: Where add and commit Move Things
Now, the part that confuses everyone about git add. Why two steps to save — why add then commit? Because git has three places your files live, and the two commands move files between them:
| "Tree" | What it is | Plain English |
|---|---|---|
| Working directory | The actual files on disk you edit | "My messy desk right now" |
| Staging area (the "index") | A holding pen for the next commit | "The stuff I've decided goes in the next snapshot" |
| Repository (.git) | The committed history (the DAG) | "The permanent photo album" |
Figure 3 — git add moves changes from your working directory into the staging area; git commit writes everything staged into a new commit in the repository. The staging area exists so you can commit some of your changes and not others.
The staging area feels like bureaucracy until the first time you've changed five files but only want two of them in this commit. git add file1.js file2.js stages just those; git commit snapshots only what's staged. The other three stay in your working directory for a later commit. That selectivity is the entire reason add and commit are separate.
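The selective-staging scenario can be sketched with three dictionaries standing in for the three trees. This is a toy model under simplifying assumptions (three files instead of five, contents reduced to version strings), and the `add` and `commit` functions are hypothetical stand-ins for the real commands.

```python
# Three hypothetical "trees", each mapping path -> content.
working_dir = {"file1.js": "v2", "file2.js": "v2", "file3.js": "v2"}  # all edited
index = {"file1.js": "v1", "file2.js": "v1", "file3.js": "v1"}        # staging area
repository = [dict(index)]                                            # committed snapshots


def add(*paths: str) -> None:
    """Copy the chosen files from the working directory into the staging area."""
    for path in paths:
        index[path] = working_dir[path]


def commit() -> dict:
    """Snapshot the staging area, not the working directory, into the repository."""
    snapshot = dict(index)
    repository.append(snapshot)
    return snapshot


add("file1.js", "file2.js")  # file3.js is edited but deliberately left unstaged
latest = commit()

print(latest)
```

The new snapshot still contains file3.js, just at its previously staged version: a commit is always a full snapshot of the index, and staging decides which version of each file that snapshot holds.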
Push, Fetch, Pull: Syncing Two Graphs
Everything above is local — it all happens in the .git folder on your machine, no internet required. GitHub enters only when you want to share that graph or back it up.
A remote is just another copy of the repository living somewhere else. origin is the default name git gives the remote you cloned from; in this series, that's the GitHub copy. Three commands move commits between your local graph and the remote graph:
| Command | What it does |
|---|---|
| git push | Send your local commits up to the remote (GitHub). Moves the remote's branch label forward. |
| git fetch | Download commits from the remote into your local repo, but don't touch your working files yet. |
| git pull | fetch + merge in one step: download and integrate the remote's changes into your current branch. |
So when you see those Auto-sync 2026-05-28 commits appear on your GitHub repo, here's the full story in model terms: a background process ran git add (working dir → staging), git commit (staging → a new snapshot node, main label slides forward), and git push (that new node is copied up to origin, and GitHub's main label slides forward to match). Three label-and-snapshot operations. Nothing magic.
Figure 4 — Two copies of the same graph. push sends your new commits up; fetch/pull bring the remote's new commits down. They're in sync when both main labels point at the same SHA. One caveat: when git status says "up to date with origin/main," it is comparing your branch against your local record of the remote as of the last fetch or push; it does not contact GitHub.
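A small Python sketch of two graphs being synced, under heavy simplification: each toy repository is a dictionary of commits plus labels, and the local side keeps an origin/main label recording where the remote's branch was last seen. All function names are hypothetical, and real git adds safety checks this sketch skips (for example, refusing a push that would discard remote commits).

```python
def new_repo() -> dict:
    """A toy repository: stored commits (id -> parent) plus branch labels."""
    return {"commits": {}, "refs": {}}


def push(local: dict, remote: dict, branch: str) -> None:
    """Copy commits the remote lacks, then move the remote's branch label."""
    for commit_id, parent in local["commits"].items():
        remote["commits"].setdefault(commit_id, parent)
    remote["refs"][branch] = local["refs"][branch]
    local["refs"][f"origin/{branch}"] = local["refs"][branch]


def fetch(local: dict, remote: dict, branch: str) -> None:
    """Download missing commits and update origin/<branch>; the local branch stays put."""
    for commit_id, parent in remote["commits"].items():
        local["commits"].setdefault(commit_id, parent)
    local["refs"][f"origin/{branch}"] = remote["refs"][branch]


def in_sync(local: dict, branch: str) -> bool:
    """Compare the local branch with the last known position of the remote branch."""
    return local["refs"][branch] == local["refs"].get(f"origin/{branch}")


local, origin = new_repo(), new_repo()
local["commits"] = {"A": None, "B": "A"}
local["refs"]["main"] = "B"

push(local, origin, "main")
synced_after_push = in_sync(local, "main")

# Someone else adds commit C on the remote.
origin["commits"]["C"] = "B"
origin["refs"]["main"] = "C"
fetch(local, origin, "main")
synced_after_fetch = in_sync(local, "main")

print(synced_after_push, synced_after_fetch)
```

After the fetch, commit C exists locally and origin/main points at it, but the local main label is still on B. Moving it forward is the merge half of a pull.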
Mental Model — Three Sentences
- A commit is a full snapshot of your project plus a pointer to its parent, identified by a SHA that changes if anything in it changes — so history is an immutable graph of snapshots, not a list of diffs.
- Branches and HEAD are just movable labels pointing at commits; "switching branches" and "moving forward" are git sliding labels, not copying files.
- add moves changes into the staging area, commit snapshots the staging area into the repository, and push/pull sync your local graph with GitHub's copy.
Try It Yourself (10 Minutes)
Run these in any git repo and watch the model:
- git log --oneline --graph --all: see the DAG as ASCII art. Each line is a commit; the left rail shows the parent links.
- git cat-file -p HEAD: print the raw commit HEAD points at. You'll literally see parent <sha>, author, and the tree (snapshot) pointer. This is a commit with the lid off.
- git status: read it as "differences between the three trees": Changes not staged = working dir vs staging; Changes to be committed = staging vs last commit.
- Edit two files. git add only one. Run git status and see one staged, one not. Commit. Notice only the staged change made it into the snapshot.
- git branch experiment, then git log --oneline --graph --all again: see a second label appear on the same commit. You created a branch and copied zero files.
Where This Lands in the Series
You now have the model: snapshots, SHAs, a DAG, movable labels, three trees, two synced graphs. That's the foundation everything else stands on.
Next chapter uses it to demystify the question every team argues about: when you bring one branch into another, should you merge, rebase, or squash? Each is just a different way of rearranging the graph you now understand — including the squash-merge that produces those tidy "PR #N" commits on your repo.