The tech-debt crisis nobody codes anymore
When Ward Cunningham coined the debt metaphor in his 1992 OOPSLA experience report, he was describing a deliberate act. Shipping "not-quite-right code," he wrote, was like taking on debt: useful to go faster, dangerous if unpaid, with every minute spent on the shaky code counting as interest. And there was a load-bearing assumption underneath it, so obvious nobody ever bothered to state it: the borrower knew they were borrowing. A developer wrote the shortcut, felt the wince, and carried the repayment plan in their head.
That assumption just quietly died, and you were probably in the room when it happened. In the LLM era, an enormous and growing share of production code is drafted by models — GitHub's Octoverse reporting has tracked the surge in AI-assisted development across the platform for several years running — and the human in the loop increasingly reviews code they did not write, would not have written that way, and may not fully hold in their head. The wince is gone. The debt remains.
The three new species in the debt zoo
1. Unfelt debt
Classic debt was at least experienced at creation time. LLM-drafted code arrives fluent, idiomatic-looking and confident, which suppresses exactly the friction that used to mark a shortcut as a shortcut. The empirical warnings here are real: Perry and colleagues' Stanford study found participants using an AI assistant wrote measurably less secure code while believing it was more secure. Read that again. The pattern — output degrades, confidence rises — is the worst possible combination for debt accounting, because the ledger's own bookkeeper has been sedated.
2. Prompt-dependent quality
The quality of generated code is a function of the prompt, the context window, and the model version: three inputs that are invisible in the diff. Two adjacent functions in the same file can now embody entirely different levels of care, depending on whether one came from a careful prompt and the other was generated at 5 p.m. from a lazy one. Quality used to correlate with authorship, which reviewers could learn; it now varies line-to-line with no visible signal. GitClear's longitudinal analyses of hundreds of millions of changed lines have reported rising churn and copy-paste-style duplication coinciding with AI-assistant adoption; as they frame it, downward pressure on code quality precisely where velocity went up.
3. Review at generation speed
GitHub's own research on Copilot reported dramatic speedups on scoped tasks: in their controlled experiment, developers completed a task 55% faster. Take the number at face value; the second-order effect is the problem. Review capacity did not get 55% faster. Microsoft Research's classic study of code review (Bacchelli & Bird) found that, even with full context, reviewers were better at spotting style issues than deep defects; reviewing machine-drafted code at machine-drafted volume erodes that context, the one advantage human review had. And the 2024 DORA report supplied the system-level echo: AI adoption was associated with decreases in delivery throughput and stability even as individual satisfaction rose. Local acceleration, global strain.
Tech debt used to be a loan you remembered taking. Now it accrues like background radiation — nobody signed for it, and the dosimeter hasn't been invented yet. That dosimeter is the product category we are building.
Why every debt detector you own is pointed at the wrong thing
Every established debt-detection approach assumes signals the LLM era removes. Static analyzers key on smells — but models generate smell-free mediocrity fluently. Git archaeology infers risk from authorship and churn patterns — but "author" is now a human-model blend, and churn is inflated by regeneration. Reviewer intuition keys on "this doesn't look like Dana's code" — a signal that no longer exists. Even self-report fails: METR's 2025 randomized trial found experienced open-source developers were slower with AI assistance on familiar code while estimating they were faster. If the authors can't feel the interest accruing, asking them where the debt is has stopped working.
The one witness that was never fooled
One observer never watched the authoring process at all, and therefore never got played by it: the runtime. Whoever or whatever wrote a function, production still records how often it runs, how often it fails, how its latency drifts, and whether anything calls it at all. Runtime behaviour is authorship-blind — which, in an era when authorship has become unknowable, converts from a limitation into the whole point.
This is the bet underneath CodeNSM: treat every function as an employee whose performance review comes from observed behaviour, not from its résumé. A generated function that runs hot, fails at the boundary and drags the tail is Injury-prone regardless of how confident its docstring sounds; a generated function that quietly does its job is Fit, and nobody needs to litigate its origins. Debt tiers computed from behaviour restore the ledger that generation-time sedation destroyed — and our fleet's own telemetry on how AI-native codebases differ structurally is one of the pre-registered hypothesis categories in our research programme, so the claims will arrive with p-values attached.
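To make "tiers computed from behaviour" concrete, here is a minimal sketch. It is not CodeNSM's actual scoring: the field names, the thresholds and the "Dormant" label are assumptions invented for illustration, and only the "Fit" and "Injury-prone" names come from the text above. The point it shows is that the classifier never asks who, or what, wrote the function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionStats:
    """Observed production behaviour for one function over one window."""
    name: str
    calls: int               # invocations in the window
    errors: int              # invocations that raised
    p99_ms: float            # tail latency in the window
    baseline_p99_ms: float   # the function's own historical tail latency


# Hypothetical thresholds, chosen only for illustration.
MAX_ERROR_RATE = 0.01   # more than 1% failing calls counts as unhealthy
MAX_TAIL_DRIFT = 1.5    # current p99 more than 1.5x its own baseline


def tier(stats: FunctionStats) -> str:
    """Assign a tier from behaviour alone; authorship is never consulted."""
    if stats.calls == 0:
        return "Dormant"  # hypothetical label for code nothing calls
    error_rate = stats.errors / stats.calls
    # With no baseline recorded yet, treat the function as not drifting.
    if stats.baseline_p99_ms > 0:
        tail_drift = stats.p99_ms / stats.baseline_p99_ms
    else:
        tail_drift = 1.0
    if error_rate > MAX_ERROR_RATE or tail_drift > MAX_TAIL_DRIFT:
        return "Injury-prone"
    return "Fit"
```

Comparing each function against its own baseline, rather than against a global latency budget, is the design choice that matters here: a slow-by-nature batch function is not penalised, while a fast function that has tripled its tail latency is.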
What to do on Monday
If you lead a team shipping AI-assisted code (which is to say, if you lead a team), three moves follow directly from the diagnosis. First, stop treating review as the debt gate; it was a weak gate before generation and it is a broken one now. Second, instrument behaviour: every function's call volume, error economics and latency drift, tracked against its own baseline, because baselines don't care who typed the code. Third, make dormancy visible: generated code is cheap to create and therefore accumulates faster than ever, and every uncalled function is maintenance payroll with no output. None of this requires slowing the assistants down. It requires watching what they shipped with instruments built for the volume.
Cunningham's metaphor doesn't need retiring; it needs modern accounting. The borrowing is now automatic, anonymous and constant. The only honest ledger left is the one production writes.
References
- Cunningham, W. (1992). The WyCash Portfolio Management System. OOPSLA '92 — origin of the technical-debt metaphor.
- Perry, N. et al. (2023). Do Users Write More Insecure Code with AI Assistants? (Stanford; ACM CCS).
- GitClear (2024). Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality.
- GitHub (2022). Research: Quantifying GitHub Copilot's impact on developer productivity and happiness.
- Bacchelli, A. & Bird, C. (2013). Expectations, Outcomes, and Challenges of Modern Code Review. ICSE / Microsoft Research.
- Google Cloud (2024). DORA Accelerate State of DevOps Report.
- METR (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.