CodeNSM
The Problem · Part 27

What good looks like: five observable properties of a healthy codebase

2026-06-05 · 7 min read · by Think North

Twenty-six parts into a series called "The Problem," you have every right to ask: fine, what does NOT-the-problem look like?

Fair. And the answer is going to be stranger than you expect, because healthy codebases don't look the way engineers imagine when they close their eyes. The fantasy is aesthetic — elegant abstractions, uniform style, zero warnings, a repo you could frame. Real health is different. Some of the healthiest systems in the CodeNSM fleet telemetry contain genuinely ugly code. What they have instead of beauty is something better: they are known. Every important question about them has a current answer. Health, it turns out, is not a property of the code. It's a property of the relationship between the code and the people responsible for it.

Here are the five observable properties. Not virtues, not aspirations — observables. Each one you can check this week.

1. The traffic concentration is known

In every production codebase we've measured, call load is savagely concentrated: a small minority of functions carries the overwhelming majority of traffic (the concentration is strong enough across the CodeNSM fleet that we pre-registered it as a formal research hypothesis). Unhealthy teams don't know their own concentration — they treat 4,000 functions as 4,000 equal citizens, review them with equal care (i.e., equally little), and are perpetually surprised by which change caused the outage. Healthy teams can name their hot paths the way a hospital knows its trauma bay, and their attention follows the traffic: heavyweight review on the load-bearing spine, a wave-through for the decorative fringe. Same review budget. Completely different risk posture.
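Checking your own concentration takes very little code. The sketch below assumes you can export per-function call counts from whatever tracing or logging you already run; the `hot_path_share` helper, the sample function names, and the 20% cut-off are all hypothetical, chosen only to illustrate the measurement.

```python
def hot_path_share(call_counts: dict[str, int], top_fraction: float = 0.05):
    """Return (hottest function names, share of all calls they carry)."""
    total = sum(call_counts.values())
    if total == 0:
        return [], 0.0
    # Rank functions by observed call volume, busiest first.
    ranked = sorted(call_counts.items(), key=lambda kv: kv[1], reverse=True)
    # Always keep at least one function in the hot set.
    k = max(1, round(len(ranked) * top_fraction))
    hot = ranked[:k]
    share = sum(count for _, count in hot) / total
    return [name for name, _ in hot], share


# Hypothetical one-day call counts (illustrative numbers, not fleet data).
calls = {
    "checkout": 9000,
    "login": 700,
    "search": 250,
    "export_csv": 40,
    "admin_report": 10,
}

hot, share = hot_path_share(calls, top_fraction=0.2)
print(hot, f"{share:.0%}")  # ['checkout'] 90%
```

If the share printed for your top few percent surprises anyone on the team, that surprise is the finding.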

2. The error hum is audited

Every production system tolerates some error rate: the flaky vendor, the retry that usually saves it, the timeout that mostly doesn't matter. That's fine. What distinguishes health is that every tolerated error is a decision rather than a discovery: someone knows the checkout path fails 0.2% of the time, knows why, decided it's acceptable, and wrote the number down. In the unhealthy version, the same 0.2% exists but nobody has ever seen it collected in one place, and the day it becomes 1.4% there is no baseline to notice against. Tornhill and Borg's Code Red data showed low-quality code containing roughly 15 times more defects than high-quality code, but the operational killer in that study was unpredictability: issue resolution in low-quality code had maximum cycle times about nine times longer. An audited hum is predictable. An unaudited one is a slow surprise on a payment plan.
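One way to turn "wrote the number down" into something checkable is a small register of tolerated errors compared against observed rates. This is a minimal sketch under stated assumptions: the `ToleratedError` record, the `audit` function, and the 3x alert multiple are hypothetical, and the observed rates would come from your own monitoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToleratedError:
    path: str
    baseline_rate: float          # accepted failure rate, e.g. 0.002 for 0.2%
    reason: str                   # why the team decided this is acceptable
    alert_multiple: float = 3.0   # hypothetical drift threshold


def audit(register, observed):
    """Flag error rates that were never decided on, or that drifted past baseline."""
    known = {entry.path: entry for entry in register}
    findings = []
    for path, rate in observed.items():
        entry = known.get(path)
        if entry is None:
            # Errors exist, but nobody ever wrote down a decision about them.
            if rate > 0:
                findings.append((path, "unaudited"))
        elif rate > entry.baseline_rate * entry.alert_multiple:
            findings.append((path, "drifted"))
    return findings


register = [ToleratedError("checkout", 0.002, "flaky payment vendor, retried once")]
observed = {"checkout": 0.014, "search": 0.001}  # illustrative rates

findings = audit(register, observed)
print(findings)  # [('checkout', 'drifted'), ('search', 'unaudited')]
```

The design choice worth noticing is that an unregistered error is reported just as loudly as a drifted one: the point of the audit is that no tolerated error is a discovery.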

3. Dormancy is pruned

Lehman's laws of software evolution predicted it in 1980: left alone, systems only grow. Code that stopped earning stays on payroll — maintained, migrated, security-patched, and (per Part 24) lovingly ported into the next rewrite. In the fleet telemetry it is routine to find a quarter or more of a codebase's functions dormant at any given moment. The healthy team's distinguishing behavior is almost comically simple: they delete things. On a schedule. With a little ceremony, even — deletion is celebrated in the changelog like a feature, because it is one. Ask a team "what did you delete last quarter?" and you learn more about their debt trajectory than any static-analysis score will tell you. (The AI era raises the stakes here: when generation is nearly free, the dormant share compounds at generation speed too.)
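Counting dormancy is similarly small. The sketch below assumes you have a list of defined functions and a last-called timestamp for each one that has been observed in production; the `dormant_functions` helper, the 90-day window, and the sample names are hypothetical.

```python
from datetime import datetime, timedelta


def dormant_functions(defined, last_called, now, window=timedelta(days=90)):
    """Return functions never seen in production, or not seen within the window."""
    cutoff = now - window
    return sorted(
        name
        for name in defined
        if last_called.get(name) is None or last_called[name] < cutoff
    )


now = datetime(2026, 6, 5)
defined = {"checkout", "login", "legacy_export", "old_promo"}
# Hypothetical last-seen timestamps; "old_promo" was never observed at all.
last_called = {
    "checkout": now - timedelta(hours=1),
    "login": now - timedelta(days=2),
    "legacy_export": now - timedelta(days=200),
}

dormant = dormant_functions(defined, last_called, now)
dormant_share = len(dormant) / len(defined)
print(dormant, f"{dormant_share:.0%}")  # ['legacy_export', 'old_promo'] 50%
```

The output is a candidate list, not a deletion order: anything on the critical-rare register (property 4) will look dormant by this measure and must be excluded before pruning.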

4. The critical-rare functions are guarded

Here's the subtle one. Traffic-weighted attention (property 1) has a failure mode: it ignores the functions that run rarely but MUST work — password reset, refund issuance, data export, the disaster-recovery path. These are your fire extinguishers: called almost never, catastrophic when broken, and invisible to any purely traffic-based ranking. Healthy teams maintain an explicit register of them, and those functions get what the rest of the codebase doesn't: exhaustive tests, synthetic probes that exercise them weekly, and alarms tuned to their tiny baselines. Unhealthy teams discover their fire extinguishers are empty during the fire — usually in the same incident review where someone says "but that code hasn't changed in two years," as if that were a defense.
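The register itself can be as plain as a set of names plus a record of when each one last passed a synthetic probe. A minimal sketch, assuming a weekly probe cadence as described above; the `unguarded` helper, the register contents, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical register of functions that run rarely but must work.
CRITICAL_RARE = {"password_reset", "refund_issuance", "data_export", "disaster_recovery"}


def unguarded(register, last_probe_ok, now, max_age=timedelta(days=7)):
    """Return registered functions with no successful probe inside max_age."""
    return sorted(
        name
        for name in register
        if name not in last_probe_ok or now - last_probe_ok[name] > max_age
    )


now = datetime(2026, 6, 5)
# Illustrative timestamps of the last successful synthetic probe per function.
last_probe_ok = {
    "password_reset": now - timedelta(days=2),
    "refund_issuance": now - timedelta(days=10),
    "data_export": now - timedelta(days=6),
}

stale = unguarded(CRITICAL_RARE, last_probe_ok, now)
print(stale)  # ['disaster_recovery', 'refund_issuance']
```

A function that has never been probed is treated the same as one whose probe is stale, which is the whole argument of this property: "hasn't changed in two years" is not evidence that it works.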

5. Debt is weighted by load

The healthy team keeps a debt register — most teams manage that much — but theirs has an extra column, and the extra column changes everything: how hard does production lean on this? Their register is ranked by fragility-times-traffic, per Part 22, which means it routinely ranks a moderately-messy checkout function above a monstrous-but-dormant admin tool, and their refactoring dollars land where Cunningham's interest actually accrues. They pay down the 30% loan before the 0% one. This should not be a distinguishing property of elite teams. It's just accounting. And yet.
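The extra column is a one-line multiplication. The sketch below ranks a debt register by fragility times traffic; the `DebtItem` record, the 0-to-1 fragility scale, and the calls-per-day unit are assumptions made for illustration, not the scoring defined in Part 22.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    name: str
    fragility: float     # hypothetical 0..1 score: how messy or risky the code is
    calls_per_day: int   # how hard production leans on it

    @property
    def weight(self) -> float:
        # Load-weighted debt: fragility multiplied by traffic.
        return self.fragility * self.calls_per_day


def rank_debt(items):
    """Order the register so the most expensive debt comes first."""
    return sorted(items, key=lambda item: item.weight, reverse=True)


debt_register = [
    DebtItem("admin_bulk_tool", fragility=0.95, calls_per_day=3),
    DebtItem("checkout_total", fragility=0.40, calls_per_day=50_000),
]

ranked = [item.name for item in rank_debt(debt_register)]
print(ranked)  # ['checkout_total', 'admin_bulk_tool']
```

With these illustrative numbers the moderately messy checkout function outranks the monstrous admin tool by several orders of magnitude, which is the reordering the extra column exists to produce.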

Health isn't beautiful code. Health is a codebase with no unknown load-bearing walls — where every important question has a current answer and nothing critical is being held up by luck.

What it feels like from the inside

If the five properties sound abstract, here's how they cash out day to day, because health has a texture you can feel from inside the team. Estimates tighten — not because anyone got better at guessing, but because the variance was always coming from the unmapped fragile spots, and now they're mapped; Tornhill and Borg's change-time findings were really a study of estimation error wearing a lab coat. Incidents get boring: fewer, smaller, and diagnosed in minutes, because the baseline that makes an anomaly visible already exists. Review gets faster AND deeper at the same time, which sounds like a violation of physics until you remember property 1 — the deep attention goes where the traffic is, instead of being spread uniformly thin. The debt conversation stops being a seasonal fight about a rewrite (Part 24) and becomes a standing agenda item with a ranked list and a budget. The Sunday-night dread — the specific flavor that comes from knowing something load-bearing is unwatched — is the first thing to go, and the last thing anyone admits they had. Ask an engineer who has worked in both kinds of org: they can tell you within a week which one they've joined, without being shown a single metric. The texture gives it away in the standups.

None of this is utopia

The temptation is to file these five properties under "nice for Google." The DORA research program spent a decade dismantling that excuse: the practices that separate elite performers from the rest are not resource-bound, they're discipline-bound — and the payoff compounds, because visibility makes every subsequent engineering decision cheaper. Note also what the list doesn't require: no rewrite, no framework migration, no hiring spree, no code-beauty crusade. Every property is fundamentally observational — know your traffic, audit your errors, count your dormancy, register your critical-rares, price your debt. The gap between sick and healthy is mostly a gap in knowing. (Closing that gap per-function and keeping it closed is what CodeNSM is for — the five properties above map almost one-to-one onto its office view, states and debt tiers — but the properties are tool-agnostic; a disciplined team with a spreadsheet and a cron job can hold all five.)

Score yourself against the quick version below — six questions, one point per gap, no partial credit. Then Part 28, the big one, will strap you into the cockpit and do the full twelve-item flight-instrument check.

References

  1. Tornhill, A. & Borg, M. (2022). Code Red: The Business Impact of Code Quality.
  2. Lehman, M.M. (1980). Programs, Life Cycles, and Laws of Software Evolution. Proceedings of the IEEE.
  3. Google Cloud — DORA research program.
  4. Cunningham, W. (1992). The WyCash Portfolio Management System. OOPSLA '92.

