The Problem · Part 15

Nobody made it slow. It's slow.

2026-06-15 · 7 min read · by Think North

There's a conversation that happens at every software company that's more than a couple of years old. It goes like this:

"Is it just me, or has the app gotten... slow?"
"When?"
"I don't know. It used to feel instant. Now it doesn't."
"What changed?"
"...Nothing? Everything?"

Then everyone shrugs, someone says "we should look into that," and the meeting moves on — because the question "what changed?" has no answer, and questions with no answer don't get calendar time.

Here's the uncomfortable truth behind the shrug: nothing made it slow. No single commit. No single decision. If you could replay your entire git history and inspect every diff, you would not find the guilty one, because there isn't one. There are two thousand commits that each added somewhere between zero and a handful of milliseconds, none of which was worth blocking a release over, and whose sum is the reason your product now feels like it's moving through soup.
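To make the "no guilty commit" claim concrete, here is a minimal sketch in Python. The history is synthetic and every number in it is an assumption: a 300 ms baseline, 2,000 commits, one in ten touching the hot path and adding 0 to 4 ms. The search is the same binary search that git bisect performs, pointed at a latency budget instead of a failing test.

```python
from itertools import accumulate

BASELINE_MS = 300   # assumed starting latency
N_COMMITS = 2000

# Hypothetical history: one commit in ten touches the hot path and adds 0-4 ms.
deltas = [(i // 10) % 5 if i % 10 == 0 else 0 for i in range(N_COMMITS)]
# latency[i] is the latency once commit i has landed.
latency = [BASELINE_MS + added for added in accumulate(deltas)]


def first_bad_commit(latency, threshold_ms):
    """Binary search for the first commit whose latency exceeds the threshold.

    Assumes latency never decreases and that the last commit is over budget.
    """
    lo, hi = 0, len(latency) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if latency[mid] > threshold_ms:
            hi = mid          # mid is already bad, so the answer is at or before it
        else:
            lo = mid + 1      # mid is still good, so the answer is after it
    return lo


culprit = first_bad_commit(latency, threshold_ms=500)
total_regression = latency[-1] - BASELINE_MS
print(culprit, deltas[culprit], total_regression)  # 1010 1 400
```

Under these assumptions the search dutifully returns a commit, and that commit added 1 ms of a 400 ms regression. Reverting it fixes nothing; it was simply standing closest to the line when the line was crossed.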

Fast systems don't break. They boil.

You know the boiled frog parable — drop a frog in hot water and it jumps out; warm the water slowly and it never notices. (Biologists insist real frogs do notice. Your latency does not. The parable was apparently waiting for software to be invented.)

Slowness is a boiling problem, and this matters because your entire engineering toolchain is built for jumping problems. Think about what your tools are good at: a deploy that breaks something — alerts fire, dashboards go red, git bisect finds the culprit by dinner. The whole apparatus assumes a discontinuity: a before, an after, and a commit-shaped suspect between them.

Now hand that apparatus a continuous, two-year, 2%-per-month drift. Alerts? Thresholds were set relative to what was normal LAST year, and "normal" has been quietly renegotiated upward the whole time — the water and the thermometer are warming together. Bisect? There is no bad commit to find; bisect needs a cliff and you have a slope. Incident review? There's no incident. Nothing is wrong, in the same way nothing is wrong with each individual cigarette.
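The "water and thermometer warming together" effect is easy to reproduce. The sketch below, in Python, feeds a synthetic 2%-per-month drift to two alert rules. The starting value of 100 ms, the 20% alert ratio, and the three-month trailing window are all assumptions chosen for illustration, not a description of any particular monitoring tool.

```python
from statistics import mean

MONTHS = 24
GROWTH = 1.02       # 2% per month, the drift described in the text
ALERT_RATIO = 1.2   # hypothetical rule: fire when p95 is 20% above "normal"

# Synthetic monthly p95 values in ms, starting from an assumed 100 ms.
p95 = [100 * GROWTH ** m for m in range(MONTHS + 1)]


def rolling_alerts(series, window=3):
    """Months where the value exceeds ALERT_RATIO x the trailing-window mean."""
    return [m for m in range(window, len(series))
            if series[m] > ALERT_RATIO * mean(series[m - window:m])]


def frozen_alerts(series):
    """Months where the value exceeds ALERT_RATIO x the very first measurement."""
    return [m for m in range(1, len(series))
            if series[m] > ALERT_RATIO * series[0]]


rolling = rolling_alerts(p95)
frozen = frozen_alerts(p95)
print(rolling)                      # []  (the baseline drifts up with the signal)
print(frozen[0])                    # 10  (first month the frozen baseline notices)
print(round(p95[-1] / p95[0], 2))   # 1.61 (total drift after two years)
```

The rolling rule never fires, because each month is only about 4% above the average of the previous three. The frozen rule fires in month ten. Same data, same threshold; the only difference is whether "normal" is allowed to move.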

John Ousterhout's A Philosophy of Software Design describes exactly this dynamic for complexity: it's not one big mistake, it's accumulation — dozens of small increments, each reasonable, none worth fighting, collectively fatal. Latency is complexity's shadow. Every abstraction layer, every extra query, every "let's just add a quick check here" casts a few milliseconds of it. Meir Lehman got there first, in his laws of software evolution: as a used system changes, its quality declines unless it is actively, deliberately maintained. Not "may decline." Declines. Drift is the default trajectory; flat is the achievement.

A ledger of two thousand innocent commits

To feel how blameless the process is, look at what the individual entries in the ledger actually look like. Every one of these is a GOOD commit — reviewed, approved, correct:

"Add audit logging to order creation" — +3ms. Compliance asked. Reasonable.
"Validate address before checkout" — +8ms, one extra API call. Fewer failed deliveries. Reasonable.
"Upgrade ORM to v4" — +2ms per query from a safer default. Security patch included. Extremely reasonable.
"Add feature flag check" — +1ms. Times forty flags over two years. Each one reasonable.
"Fetch user preferences for new banner" — +6ms on every page, for a banner that was removed in 2025. The fetch stayed. Nobody remembers it's there.
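Adding up just the entries quoted above shows the shape of the problem. This is a rough sketch in Python: the millisecond figures come from the list, while QUERIES_PER_REQUEST = 5 is an assumption (the list only gives +2ms per query), and the forty flag checks are modelled as forty separate 1 ms commits.

```python
# Assumption: the text gives +2 ms per query but not how many queries a request makes.
QUERIES_PER_REQUEST = 5

# (commit message, ms added per request)
ledger = [
    ("Add audit logging to order creation", 3),
    ("Validate address before checkout", 8),
    ("Upgrade ORM to v4", 2 * QUERIES_PER_REQUEST),
    ("Fetch user preferences for new banner", 6),
] + [(f"Add feature flag check #{i + 1}", 1) for i in range(40)]

total_ms = sum(ms for _, ms in ledger)
worst = max(ledger, key=lambda entry: entry[1])

print(f"{total_ms} ms total across {len(ledger)} commits; "
      f"largest single commit: {worst[1]} ms")
# 67 ms total across 44 commits; largest single commit: 10 ms
```

Forty-four commits, 67 ms, and the heaviest single one is a security upgrade nobody would have blocked. Scale that to two thousand and the total stops being a rounding error.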

No villain. No bad day. Two thousand entries like these, and your checkout flow is 2.4x slower than the version your earliest customers loved — and the slowdown has no author to summon to a retro. You can't even be angry, which is somehow the most annoying part.

And notice what fights AGAINST the drift: nothing, by default. A latency regression that trips no alert gets merged; a latency improvement takes deliberate effort someone must justify on a roadmap. The asymmetry is total. Milliseconds walk in the door for free, wearing feature costumes, and can only leave if someone buys them a ticket.

The average is lying to you, specifically

It gets worse, because on the rare occasion someone DOES check the latency dashboard, they usually look at the wrong number. The average. And the average response time is arguably the most misleading number in operations, for a reason Gil Tene has spent years hammering in his "How NOT to Measure Latency" talks: latency isn't experienced as an average, it's experienced as a distribution — and the pain lives in the tail. A system can hold a lovely 90ms average while its 99th percentile quietly triples, and the average will smile through all of it.
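Here is a small Python sketch of an average smiling through a tripled tail. Both samples are invented for illustration: in the "after" sample the common path got slightly faster (say a cache landed) while the slowest requests got three times worse. The percentile helper uses the nearest-rank method, which is one of several conventions.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100   # integer form of ceil(p * n / 100)
    return ordered[rank - 1]


def average(samples):
    return sum(samples) / len(samples)


# Hypothetical latency samples in ms, 1,000 requests each.
before = [80] * 980 + [580] * 20
after = [65] * 985 + [1740] * 15

for label, samples in (("before", before), ("after", after)):
    print(label, round(average(samples), 1), percentile(samples, 99))
# before 90.0 580
# after 90.1 1740
```

The mean moved by a tenth of a millisecond. The p99 went from 580 ms to 1,740 ms. A dashboard that headlines the first number would report a quiet quarter.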

Quick gut-check on why tails dominate experience: if one page load touches dozens of backend calls (it does), the odds that a user's request avoids ALL your 99th-percentile events collapse fast — tail latency isn't the rare experience, it's the common one, a point Tene makes with actual arithmetic. Your users live in your p99. Your dashboard headline is the mean. You are, in a very literal sense, monitoring a customer who doesn't exist.
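The arithmetic fits in a few lines of Python. This sketch assumes the backend calls are independent of one another, which real systems only approximate, so treat the output as a rough guide to the trend and not as a measurement.

```python
def chance_of_hitting_tail(calls, quantile=0.99):
    """Chance that at least one of `calls` independent backend calls
    is slower than the given quantile."""
    return 1 - quantile ** calls


for n in (1, 10, 40, 100):
    print(n, f"{chance_of_hitting_tail(n):.0%}")
# 1 1%
# 10 10%
# 40 33%
# 100 63%
```

At forty calls per page load, roughly one request in three includes a p99-slow call somewhere. At a hundred, it is closer to two in three.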

So the full picture of latency decay: the drift is too slow for alerts, too continuous for bisect, invisible to incident processes, and disguised by the summary statistic everyone defaults to. It's not that nobody's watching. It's that everything watching is structurally blind to exactly this shape of problem.

What "witnessed" latency would look like

Flip it around. Imagine each function in your codebase had the equivalent of a pediatrician's growth chart: its own latency history, percentiles included, tracked against ITS OWN baseline over months. Decay of this kind becomes trivially visible: price_quote was p95 = 80ms last June, it's 210ms now, and here is the slope. There is no commit to blame, but there is a trend to arrest, a number to put in the tech-debt conversation, and (crucially) a way to notice at month three instead of year two. This is boring, unglamorous instrumentation, and it beats every genius-level retroactive investigation, because the genius arrives after the frog is soup. (Per-function baselines with drift detection are exactly the "Growing fat" state in CodeNSM's telemetry: the function that's still working, still correct, and 2.6x heavier than it was a year ago.)
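A growth chart of this kind needs very little machinery. The Python sketch below fits a straight line to monthly p95 readings and compares the latest value to the function's own first reading. Only the endpoints (80 ms and 210 ms for price_quote) come from the text; the months in between and the 25% drift threshold are invented for illustration, and this is not a description of how CodeNSM or any other product computes drift.

```python
def slope_per_month(values):
    """Ordinary least-squares slope of values against month index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical monthly p95 readings (ms) for price_quote, last June to now.
history = [80, 88, 97, 105, 118, 126, 139, 150, 161, 175, 186, 198, 210]

DRIFT_THRESHOLD = 1.25   # assumed: flag anything 25% above its own baseline

ratio = history[-1] / history[0]
slope = slope_per_month(history)
drifting = ratio > DRIFT_THRESHOLD

print(f"price_quote p95: {history[0]} -> {history[-1]} ms "
      f"({ratio:.1f}x), about {slope:.1f} ms/month, drifting={drifting}")
```

With these made-up readings the fitted slope comes out near 11 ms per month. The useful property is that the same check run at month three would already report a ratio above the threshold, long before anyone starts the "is it just me?" conversation.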

And to head off the obvious objection: yes, the 2024 DORA report and its predecessors show elite teams shipping faster AND more stably, which proves decay isn't destiny. But notice what those elite teams have in common — they measure relentlessly. Nobody outruns Lehman's law on vibes.

Three questions to ask this week

What was your checkout flow's p95 two years ago? Not today — two years ago. If the answer is "we don't retain that," you have no way to know whether you've decayed, which — given that decay is the default — means you probably have.
Which number does your team quote when asked "how fast is the app"? If it's an average, gently introduce them to the tail, where your actual customers live.
Whose job is it to notice a 40% slowdown that took eighteen months? Say the answer out loud. If it's nobody — and it's nobody — then the current speed of your product is not a decision anyone made. It's just what happens when two thousand reasonable commits go unwitnessed.

Your app didn't get slow on any particular day. That's exactly why it got slow.
