How do you design a sorted text file that is resistant to conflicts?

In an early version of drift, our CLI tool for anchoring specs to code to prevent documentation drift, we introduced a drift.lock lockfile to keep all doc-code bindings in one place. It worked great and cleaned up our documentation, whose frontmatter had previously been polluted with weird properties and hashes. The catch: I noticed we kept running into merge conflicts.

Most of them turned out to be the spurious kind: same final state on both branches, just different bytes. See, in an effort to keep things simple the initial lockfile format was reminiscent of go.sum: newline-delimited, <spec> -> <target> <key>:<value> per line. As we quickly learned, for various reasons this was not the best decision.

docs/auth.md -> src/auth/login.ts origin:github sig:a1b2c3d4e5f6a7b8
docs/auth.md -> src/auth/provider.ts sig:1a2b3c4d5e6f7890
docs/billing.md -> src/billing/invoice.ts lang:ts origin:local sig:deadbeefcafebabe

An easy win

We are increasingly adopting property-based testing throughout our workflows at Fiberplane as a way to verify agent code. PBT formalizes invariants the agent might not maintain consciously, and exercises the input space at a scale humans can’t review one example at a time. It’s a way of pinning down behavior that’s hard to argue about by reading code.

The “we seem to be running into merge conflicts” vibe is a hard one to pin down, so it seemed like a perfect fit for a randomized state space exploration. I directed Claude to set up a minish harness (Zig’s property testing library) and quickly came upon a bug: drift’s lockfile serializer was inserting metadata fields in the order they happened to be appended in memory—effectively, randomly—which meant the same semantic state could end up reflected non-deterministically in the text file.

# Branch A:
docs/auth.md -> src/auth/login.ts sig:abc origin:github

# Branch B:
docs/auth.md -> src/auth/login.ts origin:github sig:abc

# Same content, different byte order, git flags a conflict.

The fix is six lines: sort metadata by key on write. This matters in practice for any binding with more than one metadata field—anchors that carry both a sig and an origin, for instance—because two branches re-linking such a binding could otherwise write the fields out in different orders depending on parse history.
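To make the fix concrete, here is a minimal sketch of the idea in Python rather than drift’s own code. The function name serialize_binding and the dict-based metadata are assumptions for illustration; the only point is that sorting by key on write makes the output independent of insertion order.

```python
def serialize_binding(spec: str, target: str, metadata: dict[str, str]) -> str:
    """Render one binding in the line-based format, metadata sorted by key."""
    fields = [f"{key}:{value}" for key, value in sorted(metadata.items())]
    return " ".join([f"{spec} -> {target}", *fields])


# Same semantic state, built up in a different insertion order on each branch.
branch_a = {"sig": "abc", "origin": "github"}
branch_b = {"origin": "github", "sig": "abc"}

line_a = serialize_binding("docs/auth.md", "src/auth/login.ts", branch_a)
line_b = serialize_binding("docs/auth.md", "src/auth/login.ts", branch_b)

print(line_a)            # docs/auth.md -> src/auth/login.ts origin:github sig:abc
print(line_a == line_b)  # True
```

With the sort in place, both branches write identical bytes for identical state, so git has nothing to flag.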

The harder question

That’s not the interesting part though.

The little experiment prompted me to think: could we use this proptest harness not as a way to test for invariants but to measure, in N random cases, how many end up in conflict? Could we use the generator to manufacture conditions where two disjoint edits, fed to git merge-file, result in conflict markers, and then use that conflict rate as a score for each format variant? Well yes, we could.

The setup is this. A minish generator produces a small base lockfile and two disjoint edit scripts, made of operations like adding a binding, removing one, or changing a metadata field. The two scripts are guaranteed to touch different bindings: the generator partitions them up front, so disjointness is by construction, not by post-hoc filter. For each generated state, we serialize the base, apply the left script to one copy, apply the right script to the other, and feed the three texts to git merge-file -p --no-diff3.

Then we ask two questions. Did git emit conflict markers? If it didn’t, does the merged output parse to the same semantic state as applying both scripts to the base? A clean merge that parses to a different state is silently wrong, and that is a hard fail. A merge with conflict markers is a “spurious” conflict: the edits were disjoint, so there was nothing to disagree about, but git fought anyway. Those get counted, not failed.
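The scoring step can be sketched roughly as follows. This is a Python approximation, not the actual minish harness: the names parse_lockfile, merge_three_way, and classify are hypothetical, the parser only handles the line-based format, and the generator is left out entirely. It assumes git is on the PATH and uses the same git merge-file flags quoted above.

```python
import subprocess
import tempfile
from pathlib import Path


def parse_lockfile(text: str) -> dict[tuple[str, str], dict[str, str]]:
    """Parse the line-based format into {(spec, target): metadata}."""
    state = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        spec, rest = line.split(" -> ", 1)
        target, *fields = rest.split()
        state[(spec, target)] = dict(f.split(":", 1) for f in fields)
    return state


def merge_three_way(base: str, left: str, right: str) -> str:
    """Run a three-way merge and return the merged text, markers included."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        # git merge-file expects <current> <base> <other>, in that order.
        for name, text in (("left", left), ("base", base), ("right", right)):
            path = Path(tmp) / name
            path.write_text(text)
            paths.append(str(path))
        result = subprocess.run(
            ["git", "merge-file", "-p", "--no-diff3", *paths],
            capture_output=True, text=True,
        )
    # The exit status is the number of conflicts (capped at 127);
    # anything outside that range signals an error.
    if not 0 <= result.returncode <= 127:
        raise RuntimeError(result.stderr)
    return result.stdout


def classify(merged: str, expected: dict) -> str:
    """Score one trial against the state both edit scripts should produce."""
    if any(line.startswith("<<<<<<<") for line in merged.splitlines()):
        return "spurious_conflict"  # counted, not failed
    if parse_lockfile(merged) == expected:
        return "clean"
    return "hard_fail"  # clean merge, wrong state
```

A trial would then be classify(merge_three_way(base, left, right), expected), where expected is the parsed result of applying both edit scripts to the base. The conflict rate is simply the share of trials that come back as spurious_conflict.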

The first run of 400 trials revealed that around 40% of them ended in conflict. For example: branch A adds a binding for docs/auth.md -> src/auth/refresh.ts while branch B updates the sig value on an unrelated docs/billing.md -> src/billing/invoice.ts binding. Two operations on completely separate parts of the lockfile, no semantic disagreement, but git flagged a conflict anyway.

Why that happens, structurally: git’s three-way merge compares the region each branch changed against the base, and when the two regions overlap or sit on directly adjacent lines, it flags a conflict, even if the lines each branch touched are completely disjoint. Sorted, one-line-per-binding files put unrelated edits right next to each other in the byte order, so they collide all the time. Git can’t tell that’s not a real fight.

One quietly important result alongside the 40%: zero hard-fails across all 400 trials. Every clean merge the harness produced was also semantically correct. The harness ended up doing two jobs – measuring how often we get fake conflicts, and confirming we don’t get silent ones. Drift never corrupts your lockfile on a clean merge. It just generates fake fights.

Why not just write a merge driver?

Two reasons. First, .gitattributes merge drivers get ignored in GitHub’s PR merge flow – custom drivers aren’t part of GitHub’s server-side merge path. Second, and more importantly, drift is intended to live in other people’s repos, and the fewer opinions we express about how other people should merge their code, the better. Format-level conflict reduction is the only lever we can actually pull.

What the data showed

Once we’d established the baseline, I had Claude write ten alternatives – variations on multi-line blocks, sectioned layouts, TOML, YAML, and INI, plus a couple of weird ones as controls. Each would go through the same “grinder”: same generator, same seeds, swap only the serializer.

All the multi-line and sectioned variants ended up clustering at around a 25–31% conflict rate. Each format is listed below with its size, its conflict rate, and a sample of what it looks like.

| Format | Bytes | Conflict % |
|---|---|---|
| baseline (line-based) | 211 B | 44% |
| multi-line blocks | 235 B | 28% |
| sectioned single-line | 246 B | 31% |
| sectioned multi-line | 268 B | 25% |
| TOML flat (shipped) | 324 B | 25% |
| YAML nested | 252 B | 26% |
| HR-separator control | 237 B | 28% |
| aligned columns control | 224 B | 54% |
| INI blocks | 231 B | 28% |
| TOML nested | 246 B | 28% |
| TOML grouped | 279 B | 25% |

baseline (line-based):

docs/auth.md -> src/auth/login.ts origin:github sig:a1b2c3d4e5f6a7b8
docs/auth.md -> src/auth/provider.ts sig:1a2b3c4d5e6f7890
docs/billing.md -> src/billing/invoice.ts lang:ts origin:local sig:deadbeefcafebabe

multi-line blocks:

docs/auth.md -> src/auth/login.ts
  origin: github
  sig: a1b2c3d4e5f6a7b8

docs/auth.md -> src/auth/provider.ts
  sig: 1a2b3c4d5e6f7890

docs/billing.md -> src/billing/invoice.ts
  lang: ts
  origin: local
  sig: deadbeefcafebabe

sectioned single-line:

# docs/auth.md
docs/auth.md -> src/auth/login.ts origin:github sig:a1b2c3d4e5f6a7b8
docs/auth.md -> src/auth/provider.ts sig:1a2b3c4d5e6f7890

# docs/billing.md
docs/billing.md -> src/billing/invoice.ts lang:ts origin:local sig:deadbeefcafebabe

sectioned multi-line:

# docs/auth.md
docs/auth.md -> src/auth/login.ts
  origin: github
  sig: a1b2c3d4e5f6a7b8

docs/auth.md -> src/auth/provider.ts
  sig: 1a2b3c4d5e6f7890

# docs/billing.md
docs/billing.md -> src/billing/invoice.ts
  lang: ts
  origin: local
  sig: deadbeefcafebabe

TOML flat (shipped):

[[bindings]]
doc = "docs/auth.md"
target = "src/auth/login.ts"
origin = "github"
sig = "a1b2c3d4e5f6a7b8"

[[bindings]]
doc = "docs/auth.md"
target = "src/auth/provider.ts"
sig = "1a2b3c4d5e6f7890"

[[bindings]]
doc = "docs/billing.md"
target = "src/billing/invoice.ts"
lang = "ts"
origin = "local"
sig = "deadbeefcafebabe"

YAML nested:

"docs/auth.md":
  "src/auth/login.ts":
    origin: "github"
    sig: "a1b2c3d4e5f6a7b8"
  "src/auth/provider.ts":
    sig: "1a2b3c4d5e6f7890"
"docs/billing.md":
  "src/billing/invoice.ts":
    lang: "ts"
    origin: "local"
    sig: "deadbeefcafebabe"

HR-separator control:

docs/auth.md -> src/auth/login.ts
  origin: github
  sig: a1b2c3d4e5f6a7b8
---
docs/auth.md -> src/auth/provider.ts
  sig: 1a2b3c4d5e6f7890
---
docs/billing.md -> src/billing/invoice.ts
  lang: ts
  origin: local
  sig: deadbeefcafebabe

aligned columns control:

docs/auth.md    -> src/auth/login.ts      origin:github sig:a1b2c3d4e5f6a7b8
docs/auth.md    -> src/auth/provider.ts   sig:1a2b3c4d5e6f7890
docs/billing.md -> src/billing/invoice.ts lang:ts origin:local sig:deadbeefcafebabe

INI blocks:

[docs/auth.md -> src/auth/login.ts]
origin = github
sig = a1b2c3d4e5f6a7b8

[docs/auth.md -> src/auth/provider.ts]
sig = 1a2b3c4d5e6f7890

[docs/billing.md -> src/billing/invoice.ts]
lang = ts
origin = local
sig = deadbeefcafebabe

TOML nested:

["docs/auth.md"."src/auth/login.ts"]
origin = "github"
sig = "a1b2c3d4e5f6a7b8"

["docs/auth.md"."src/auth/provider.ts"]
sig = "1a2b3c4d5e6f7890"

["docs/billing.md"."src/billing/invoice.ts"]
lang = "ts"
origin = "local"
sig = "deadbeefcafebabe"

TOML grouped:

[["docs/auth.md"]]
target = "src/auth/login.ts"
origin = "github"
sig = "a1b2c3d4e5f6a7b8"

[["docs/auth.md"]]
target = "src/auth/provider.ts"
sig = "1a2b3c4d5e6f7890"

[["docs/billing.md"]]
target = "src/billing/invoice.ts"
lang = "ts"
origin = "local"
sig = "deadbeefcafebabe"

That cuts the rate by roughly 40% relative—from ~44% down to ~25%—and seems to be the structural floor. None of the eleven variants we tried dipped below 25%. There’s an inherent feature of sorted text files: concurrent operations, especially add-add, have a certain rate of conflict that’s just hard to avoid. Both branches inserting a new binding at the same sort gap will always conflict, regardless of how distinctive the surrounding format is, because git’s hunk algorithm is purely positional.

Between the multi-line variants the differences are small but real. The TOML arrangements tie for the best measured rate, and parse with any off-the-shelf TOML library. As a side finding (the kind you only notice when you’ve built the serde benchmark anyway), TOML serializes about twice as fast as the previous format. The line-based implementation renders each binding to a scratch buffer, then lexically sorts the buffers; the TOML one sorts bindings once by key and streams them out. Free perf win, completely unrelated to merge behavior.
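The two serialization strategies differ roughly as in the sketch below. It is written in Python, so it says nothing about the measured speedup; the function names and the tuple shape for a binding are assumptions for illustration, and the TOML writer assumes values contain no quotes or backslashes that would need escaping.

```python
import io

# (doc, target, metadata): a hypothetical in-memory shape for one binding.
Binding = tuple[str, str, dict[str, str]]


def serialize_line_based(bindings: list[Binding]) -> str:
    """Render every binding to its own string first, then sort the strings."""
    rendered = []
    for doc, target, meta in bindings:
        fields = [f"{key}:{value}" for key, value in sorted(meta.items())]
        rendered.append(" ".join([f"{doc} -> {target}", *fields]))
    rendered.sort()  # lexical sort over whole rendered lines
    return "".join(line + "\n" for line in rendered)


def serialize_toml_flat(bindings: list[Binding]) -> str:
    """Sort bindings once by (doc, target), then stream straight to the output."""
    out = io.StringIO()
    ordered = sorted(bindings, key=lambda b: (b[0], b[1]))
    for index, (doc, target, meta) in enumerate(ordered):
        if index:
            out.write("\n")  # blank line between blocks
        out.write("[[bindings]]\n")
        out.write(f'doc = "{doc}"\n')
        out.write(f'target = "{target}"\n')
        for key, value in sorted(meta.items()):
            out.write(f'{key} = "{value}"\n')
    return out.getvalue()


bindings = [
    ("docs/billing.md", "src/billing/invoice.ts",
     {"sig": "deadbeefcafebabe", "origin": "local", "lang": "ts"}),
    ("docs/auth.md", "src/auth/login.ts",
     {"sig": "a1b2c3d4e5f6a7b8", "origin": "github"}),
]
print(serialize_line_based(bindings))
print(serialize_toml_flat(bindings))
```

The first version sorts fully rendered strings, while the second sorts small keys and never materializes per-binding buffers.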

TOML it is then

The data nominally favored a doc-grouped arrangement (~25.0% vs ~25.2% for flat), within measurement noise. We picked flat anyway – a plain list deserializes straight into a list of bindings, new fields slot in without nested-map gymnastics, and the version = 1 header lives at the top of the file. Here’s what shipped:

version = 1

[[bindings]]
doc = "docs/auth.md"
target = "src/auth/login.ts"
origin = "github"
sig = "a1b2c3d4e5f6a7b8"

[[bindings]]
doc = "docs/auth.md"
target = "src/auth/provider.ts"
sig = "1a2b3c4d5e6f7890"

[[bindings]]
doc = "docs/billing.md"
target = "src/billing/invoice.ts"
lang = "ts"
origin = "local"
sig = "deadbeefcafebabe"

Conflict rate goes from ~44% to ~25%, the parser comes off the shelf, and we get the serialize speedup for free.

The 25% floor is still there: adjacent-insert conflicts are structural, and you can’t get below the floor without a semantic merger. Still, a ~40% reduction with no caveats is worth shipping.

What this is about

The interesting thing here isn’t the specific finding about lockfile formats. It’s that this investigation happened at all.

A randomized oracle, eleven file formats (the baseline plus ten alternatives), a serde benchmark – to settle what’s basically a quality-of-life annoyance on a niche CLI tool. Under previous-era time budgeting, completely indefensible. Now, doable in a couple of evenings.

What changed is that the agent ate the scaffolding cost. Parsers, generators, harness, alternative formats, benchmarks – none of those required me to write the boilerplate. I directed the questions; Claude wrote the harness; the data informed the decision.

Every codebase has a backlog of these problems – annoying but not blocking, never investigated because they didn’t justify the engineering hours. You’ll find more of them than you think once you start looking.

The harness, the eleven serializers, and the merge-rate scoring all live in drift PR #31 if you want to see the code.