The case for code telemetry: what thirty parts add up to
Thirty parts ago, this series opened with a sentence designed to be slightly nauseating: nobody wrote this code. If you've been here the whole way — and if you have, genuinely, thank you, that's a book's worth of dread — you've earned the assembled version of the argument. Here it is, whole.
Three controls walked into the AI era. None walked out.
Software quality was never actually protected by process documents. It was protected by three informal human controls so old and so ambient that nobody thought of them as controls at all — until they broke in the same three-year window.
- Authorship broke. For the entire history of the field, every line had a human author who understood it at least once, at the moment of writing — and who felt the little wince when they borrowed against the future (Part 21; Cunningham's original metaphor assumed nothing less). Now an enormous and growing share of production code is drafted by models: fluent, confident, wince-free, at volumes GitClear has been tracking alongside rising churn and duplication. The knowledge that used to be a byproduct of writing simply stopped being produced. Peter Naur told us in 1985 what a program without its theory is — and we've spent three years mass-producing exactly that.
- Review broke. The backstop for authorship was always the second pair of eyes, and the eyes were never as good as advertised: Bacchelli and Bird found that review delivered knowledge transfer, team awareness, and code improvement far more reliably than it caught deep defects, and that was back when humans wrote the code at human speed. Ask that same instrument to validate machine-drafted code at machine-drafted volume and you get Part 4's review theater: approvals as social ritual, LGTM as liturgy. The 2024 DORA report supplied the system-level receipts: AI adoption was associated with decreased delivery throughput and stability, even as everyone involved felt faster.
- Apprenticeship broke. The pipeline that produced people who could catch all this (juniors earning scar tissue on real incidents, slowly becoming the tech leads of Part 26) quietly shut down, because the juniors are prompting now, and the scars don't form (Part 3: graduating without scars). You cannot rent replacement seniors forever. There won't be enough of them to rent.
Any one of these breaking would be an era. All three at once is a regime change. And here's the trap this series has circled thirty times: every instinctive response is nostalgia. Mandate deeper review? You've mandated more theater, at generation volume. Ban the assistants? Your competitors didn't, and your own developers are using them anyway, in a private window. Hire more seniors? See control three. You cannot process-document your way back to 1995. The controls that broke were all, at bottom, forms of knowing — and the only honest replacement is a different way of knowing.
Visibility is the replacement control
Look at what each broken control actually provided. Authorship provided knowledge of what the code was supposed to do. Review provided a check on what it appeared to do. Apprenticeship provided people who accumulated both. Every one of them was an epistemic layer — and every one of them keyed on the writing of code, which is exactly the stage the AI era industrialized beyond human participation.
But there's a second place knowledge can come from, and Parts 22 through 29 built the case brick by brick: behavior. Production doesn't care who wrote the function or how confidently. It records what runs and how hard (the concentration that re-prices your whole debt register, Part 22), where the repair-hours drain (Part 23's firefighting ratio), what a rewrite would incinerate (Part 24), what a CEO can be told without translation loss (Part 25), what the great tech lead's inner map contained all along (Part 26), and the five observable properties of health (Part 27). The instruments that read documents (code and commits) go dark exactly where these answers live (Part 29). The one that reads behavior doesn't. Authorship-blindness, the historic weakness of runtime measurement, is its entire qualification in an era of unknowable authorship.
The old controls knew the code by watching it being written. That stage is lost. The replacement control knows the code by watching it live — and that stage cannot be automated away, because it IS the product, running.
The code floor supervisor
So here is the job description the AI era created and almost nobody has filled. Every factory floor that ever ran machinery faster than a human could inspect it evolved the same role: a supervisor who walks the floor — not operating the machines, not replacing the engineers, just maintaining the living answer to "what's actually happening out there?" Your codebase is now a floor of ten thousand machines, retooled nightly by something that doesn't attend standup. The supervisor layer — human, instrument, or honestly both — has four duties, and this series has visited each one:
- See it. Every function reporting for duty: called or dormant, erring or clean, fast or drifting. Not the twelve endpoints someone hand-built dashboards for in 2024 — all of it, continuously, because the next incident is statistically not on a dashboard.
- Weigh it. Load is the multiplier on everything: debt (Part 22), review attention (Part 26), refactoring ROI (Part 23), rewrite scope (Part 24). An unweighted fact about a codebase is trivia. A traffic-weighted one is a priority.
- Judge it. Against fixed, reproducible rules — role-aware baselines, deterministic states — never against a model's mood (Part 29). The judgment layer is what turns telemetry from a firehose into a worry list short enough to act on.
- Coach it. Route each judged finding to whatever fixes things now: a human engineer, or a coding agent handed a precise, verifiable brief. In a pleasing irony, the same generation-speed machinery that broke the old controls is the thing that makes the findings cheap to act on — provided something deterministic is upstream, deciding what's true.
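To make the first three duties concrete, here is a minimal sketch of a census that sees every function, judges it with fixed rules, and weighs the findings by traffic. It is an illustration under stated assumptions, not CodeNSM's actual method: the `FunctionStats` fields, the state names beyond those the list above mentions, the function names in the sample data, and both thresholds are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
ERROR_RATE_LIMIT = 0.01   # assumed: more than 1% failing calls counts as "erring"
DRIFT_RATIO_LIMIT = 1.5   # assumed: p95 latency over 1.5x baseline counts as "drifting"


@dataclass(frozen=True)
class FunctionStats:
    name: str
    calls: int              # invocations observed in the window
    errors: int             # invocations that failed
    p95_ms: float           # observed 95th-percentile latency
    baseline_p95_ms: float  # reference latency for this function's role


def judge(fn: FunctionStats) -> str:
    """Judge it: map observed behavior to a state using fixed rules only."""
    if fn.calls == 0:
        return "dormant"
    if fn.errors / fn.calls > ERROR_RATE_LIMIT:
        return "erring"
    if fn.baseline_p95_ms > 0 and fn.p95_ms / fn.baseline_p95_ms > DRIFT_RATIO_LIMIT:
        return "drifting"
    return "clean"


def worry_list(stats: list[FunctionStats]) -> list[tuple[str, str, float]]:
    """Weigh it: rank unhealthy functions by their share of total traffic."""
    total_calls = sum(fn.calls for fn in stats)
    findings = []
    for fn in stats:
        state = judge(fn)
        if state in ("erring", "drifting"):
            # An unhealthy function has calls > 0, so total_calls > 0 here.
            findings.append((fn.name, state, fn.calls / total_calls))
    # Heaviest traffic first; name breaks ties so the order is reproducible.
    return sorted(findings, key=lambda f: (-f[2], f[0]))


# See it: every function reports, including the ones nobody built a dashboard for.
census = [
    FunctionStats("checkout.charge", calls=9000, errors=270, p95_ms=120.0, baseline_p95_ms=110.0),
    FunctionStats("search.rank", calls=900, errors=0, p95_ms=400.0, baseline_p95_ms=200.0),
    FunctionStats("legacy.export_csv", calls=0, errors=0, p95_ms=0.0, baseline_p95_ms=50.0),
    FunctionStats("auth.refresh", calls=100, errors=0, p95_ms=20.0, baseline_p95_ms=20.0),
]

for name, state, weight in worry_list(census):
    print(f"{name}: {state}, {weight:.0%} of traffic")
```

The same input always yields the same ranked list, which is the property the fourth duty depends on: a finding handed to a human or a coding agent can be re-checked against the same rules after the fix.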
That layer is what CodeNSM is — this series has been the argument, and the product is the conclusion. But hold the claim to its honest scope: the floor supervisor doesn't restore the theory Naur mourned, doesn't make review deep again, doesn't mint scarred seniors. Nothing does; the old world is gone. What visibility does is make the new world governable — it converts unknown unknowns into a ranked list, and a ranked list is something even a broken-controls era can act on.
End of the problem. Start of the method.
Thirty parts of problem earn you the right to ask how: how a runtime census actually classifies ten thousand functions into jobs and states, and how you'd know the instrument itself isn't just confident nonsense. Those questions belong to the methodology series. Start with the archetype taxonomy (meet your code employees), then read the 51 pre-registered hypotheses, where we put our own instrument on the falsifiable end of the microscope, in public, with p-values.
The problem arc closes where it opened. Nobody wrote this code. Nobody borrowed this debt. Nobody's inner ear works in these clouds. Fine — then stop navigating by nobody. The code is telling you everything about itself, forty thousand times a day, in the only language it has ever spoken: behavior. Thirty parts later, the case for code telemetry is one sentence long.
Start listening.
References
- Cunningham, W. (1992). The WyCash Portfolio Management System. OOPSLA '92 — origin of the technical-debt metaphor.
- Naur, P. (1985). Programming as Theory Building.
- Bacchelli, A. & Bird, C. (2013). Expectations, Outcomes, and Challenges of Modern Code Review. ICSE / Microsoft Research.
- Google Cloud (2024). DORA Accelerate State of DevOps Report.
- GitClear — Coding on Copilot: AI's downward pressure on code quality.
- Brooks, F.P. (1975/1995). The Mythical Man-Month: Essays on Software Engineering.