Think North Learning
thinknorth.consulting
THE 95% PROBLEM · Contradiction · 7 min

Where AI ROI Hides

01 · THE SETUP

Two findings from the same year, 2025:

One: MIT's Project NANDA analysed some 300 public AI deployments, alongside interviews and surveys, and concluded that about 95% of generative-AI pilots produced no measurable P&L impact. The stat rattled markets and launched a thousand LinkedIn posts.

Two: in the same organisations, employees privately reported the opposite — the same MIT work found a thriving “shadow AI economy,” with workers at the majority of companies using personal chatbot accounts for real work, and swearing by the hours saved.

Everyone's more productive. The company sees nothing. Both findings survive scrutiny. Reconcile them.

Sit in the contradiction. Real hours saved at desks; zero showing up in the P&L. Where, physically, between a person's saved hour and a company's income statement, could the value be leaking out?

02 · YOUR CALL

What's the main reason the pilots showed nothing, per the research?

If you pick A

The comfortable explanation — blame the tech, wait for GPT-next. But the research pointed the other way: the same employees were getting real value from consumer versions of the same models, unofficially. The capability was in the building. The pilots still failed.

If you pick B — the mechanism

That's the finding. Failed pilots were standalone tools dropped next to real work rather than into it — nothing learned from feedback, no named owner, no baseline measured, so no delta could appear on the P&L. The 5% that succeeded picked one process, integrated deeply, and measured relentlessly. (Notably: bought tools succeeded ~twice as often as internal builds.)

If you pick C

Half right. Official tools did sit unused, but “resistance” misreads it: the same staff were enthusiastically using AI unofficially. They weren't rejecting AI; they were rejecting tools that didn't fit the work. That's an integration failure wearing an adoption costume.

If you pick D

Sensible bookkeeping instinct, but the arithmetic points elsewhere: inference costs collapsed (Stanford's AI Index measured ~280× cheaper for constant capability in two years), and failed pilots mostly died before spending much. What they lacked wasn't budget headroom. It was a place for the saved value to accumulate.

Pick one — committing first is what makes the answer stick.

03 · NOT SO FAST

The headline reading — “AI doesn't deliver ROI” — is the one the market briefly panicked over. It's reasonable: 95% is a damning number.

But it can't survive the second finding: value was visibly being created at the desks of the same companies. A number that damning, next to behaviour that enthusiastic, isn't a verdict on the technology. It's a map of where value leaks between a saved hour and a financial statement — and the leak has three specific holes.

04 · THE MECHANISM
[Figure: hours saved at individual desks flow toward the P&L, but only ~5% arrive. Leak 1: wrong tasks, no baseline measured. Leak 2: verification and rework (“workslop”). Leak 3: saved time never reaches a bottleneck. To bank the value: right task (clear × recurring × costly) → embedded in the workflow → verification cheaper than the saving → freed capacity redeployed at a constraint → delta measured vs baseline. Net value = time saved − time verifying − rework when verification was skipped.]
The leak between saved hours and the P&L: unmeasured tasks, verification cost, and gains that never reach a bottleneck.

Hole one: the wrong tasks. Failed pilots clustered in flashy, client-facing, loosely specified work, exactly the quadrants the CoReCo map says to avoid. The MIT data adds a twist most boards get backwards: over half of AI budgets chased sales and marketing, while the clearest measured ROI came from unglamorous back-office automation (the clear-steps, high-recurrence bubbles). ROI lives where work repeats and success is checkable, not where demos impress.

Hole two: verification cost. The honest equation is net value = time saved − time verifying − rework when verification was skipped. Fluent output makes the first term look huge and hides the second — until unchecked AI work lands on a colleague's desk. Researchers coined a word for that landing: “workslop” — polished-looking output that transfers the real effort downstream to whoever must decode or redo it. A task with cheap verification (code that runs, a reconciliation that balances) keeps its ROI; a task where checking costs as much as doing never had any.
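
The net-value equation above can be turned into a quick back-of-envelope check. The sketch below is illustrative only: the field names, the skip and error rates, and every number in the usage lines are assumptions invented for the example, not figures from the MIT or HBR research. It also assumes verification time is only spent on the runs that actually get checked.

```python
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    """Per-run minutes for one task. All fields are hypothetical inputs."""
    minutes_saved: float      # drafting time the AI removes
    minutes_verifying: float  # time to check one output properly
    rework_minutes: float     # downstream cost when an unchecked error lands
    skip_rate: float          # share of runs where verification is skipped
    error_rate: float         # share of skipped runs that end up needing rework


def net_minutes(t: TaskEstimate) -> float:
    """net value = time saved - time verifying - rework when verification was skipped."""
    verifying = (1 - t.skip_rate) * t.minutes_verifying
    rework = t.skip_rate * t.error_rate * t.rework_minutes
    return t.minutes_saved - verifying - rework


# Cheap verification (e.g. a reconciliation that balances): most of the saving survives.
reconciliation = TaskEstimate(30, 2, 60, skip_rate=0.0, error_rate=0.25)
# Expensive verification, often skipped (e.g. a loosely specified memo).
memo = TaskEstimate(30, 24, 80, skip_rate=0.5, error_rate=0.25)

print(net_minutes(reconciliation))  # 28.0
print(net_minutes(memo))            # 8.0
```

Both tasks "save" the same 30 minutes on paper; under these assumed inputs, only the one with cheap verification keeps most of it.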

Hole three: gains that never reach a bottleneck. An hour saved only becomes P&L when it turns into more throughput, fewer errors, or lower cost at a constraint. Scattered individual time-savings — the shadow economy's kind — evaporate into slightly longer coffee breaks unless the workflow around them is redesigned to bank the gain. That's why individuals feel rich while the company measures nothing: the value is real, unbanked, and invisible to systems that never set a baseline.
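
A toy model makes the bottleneck point concrete. It assumes a hypothetical three-stage workflow whose weekly output is capped by its slowest stage; the stage names and capacities are invented for illustration and do not come from the research.

```python
def weekly_throughput(capacity: dict[str, float]) -> float:
    """Units finished per week: the slowest stage caps the whole workflow."""
    return min(capacity.values())


# Hypothetical capacities, in units each stage can handle per week.
baseline = {"draft": 40, "review": 25, "approve": 60}

# AI speeds up drafting, which was never the constraint.
faster_draft = {**baseline, "draft": 60}

# The same effort spent relieving the constraint instead.
faster_review = {**baseline, "review": 35}

print(weekly_throughput(baseline))       # 25
print(weekly_throughput(faster_draft))   # 25
print(weekly_throughput(faster_review))  # 35
```

In this sketch the drafting gain is real at the desk and invisible in the output, which is the shadow-economy pattern: the saved hours only register once they are redeployed at the stage that limits throughput.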

05 · BACK TO THE OPENING

So the contradiction was never a contradiction. The 95% and the shadow economy are the same fact seen from two floors of the building: individuals capture AI value task by task, informally, instantly; organisations only capture it through task selection, integration and measurement — the boring machinery most pilots skipped. The stat doesn't say AI lacks ROI. It says ROI hides in the back office, inside recurring tasks, behind a baseline someone bothered to measure — exactly where the CoReCo map has been pointing all along.

06 · TAKE THIS WITH YOU

Your rule — the demo test: before greenlighting any AI initiative, require three sentences: the metric it moves, the owner who answers for it, and the workflow step it replaces. Can't produce all three? It's a demo, not a pilot — run it as one, cheaply, and don't book its ROI. Then pick your real pilot off the CoReCo map, where the three sentences write themselves.
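
The demo test is simple enough to write down as a checklist. A minimal sketch, assuming a hypothetical `Initiative` record with one free-text field per required sentence; the class, the function name and the sample entries are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """A proposed AI initiative; blank fields mean the sentence was not supplied."""
    name: str
    metric: str = ""         # the metric it moves
    owner: str = ""          # the owner who answers for it
    workflow_step: str = ""  # the workflow step it replaces


def demo_test(initiative: Initiative) -> str:
    """Return "pilot" only when all three sentences are present, else "demo"."""
    answers = (initiative.metric, initiative.owner, initiative.workflow_step)
    return "pilot" if all(a.strip() for a in answers) else "demo"


# Hypothetical examples.
chatbot = Initiative("Sales chatbot", owner="Head of Marketing")
matching = Initiative(
    "Invoice matching",
    metric="Hours per month spent on manual matching",
    owner="Finance operations lead",
    workflow_step="Manual three-way match before payment",
)

print(demo_test(chatbot))   # demo
print(demo_test(matching))  # pilot
```

The check is deliberately crude: it only confirms the three sentences exist, not that they are any good. Judging their quality stays with whoever greenlights the work.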

REFERENCES
  1. Fortune — MIT report: 95% of generative AI pilots at companies are failing (2025)
  2. Harvard Business Review — AI-Generated “Workslop” Is Destroying Productivity (2025)
  3. Stanford HAI — AI Index Report 2025 (inference cost and adoption data)