ADAPT, DON'T TRAIN · Dilemma & Decision · 7 min

Borrowed Brains

01 · THE SETUP

A mid-size legal-services firm wants “our own AI, trained on our 40,000 contracts.”

Vendor A quotes a custom-trained model: six months, a serious invoice, “your data becomes your moat.”

Vendor B says something that sounds like a shrug: “We won't train anything. We'll use the same frontier model everyone can rent, and just… hand it the right contracts at the right moment.”

In the bake-off, Vendor B's untrained system answers better, cites its sources, and ships in three weeks. The managing partner is confused, and a little offended: how can no training beat training?

Put yourself in the room. You have a real budget and 90 days. What are you actually paying for when you “train an AI on your data”?

02 · YOUR CALL

Your call — $50k, 90 days, 40,000 contracts. Where do you put the money?

If you pick A

The instinct for ownership is sound, but the arithmetic isn't close: frontier-scale pretraining runs to hundreds of millions of dollars in compute, data and talent. Even most Fortune 500s don't pretrain. You'd own a model — a much worse one than you can rent for pennies per query.

If you pick B

This feels right — it's our data, make the model absorb it. But fine-tuning mostly changes a model's behaviour (tone, format, skill at a narrow task), not its knowledge. Facts baked into weights go stale the day a contract is amended, can't be permissioned per client, and can't cite a source. There's a better home for knowledge.

If you pick C — the mechanism

That's Vendor B's play, and in 2026 it's the right default. The model brings the reasoning and the language; retrieval brings your facts, fresh and citable; prompting steers the behaviour. You ship in weeks, and when a better model launches you swap it in and inherit the upgrade free.

If you pick D

Half-wise: model-level advantages do evaporate fast. But the assets that compound — your cleaned document base, your evaluation set, your workflow integration — are model-agnostic and only grow in value. Waiting forfeits the learning without avoiding the cost.

Pick one — committing first is what makes the answer stick.


03 · NOT SO FAST

The obvious answer is B: fine-tune on our data. It made sense in 2023, when context windows were tiny, frontier APIs were pricey, and “domain AI” meant baking your domain into the weights.

But it misses what fine-tuning actually changes — and what happened to the economics since. Three numbers moved, and together they flipped the default.

04 · THE MECHANISM

Start with the deep idea, which the original DemystifAI book taught with a guitarist: a guitar player picks up the ukulele far faster than a total beginner, because most of the skill transfers. That's transfer learning, and it's the founding bargain of modern AI: models arrive pre-trained on trillions of words at someone else's expense, and your job is only ever the last mile — adapting a borrowed brain to your task.

The adaptation ladder: climb only as high as the problem forces you.

The ladder, bottom to top:
Prompting: instructions and examples in the request. Free, instant and surprisingly far-reaching; always exhaust this first.
Retrieval (RAG): search your documents and hand the model the relevant passages for each question. This is where knowledge belongs, because it stays fresh, respects permissions and can cite sources.
Fine-tuning: further training on your examples. This is where behaviour belongs: house style, strict output formats, a narrow task done millions of times, or distilling a big model's skill into a small, cheap one.
Pretraining: building the brain itself. Only a handful of labs on Earth do it.
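The retrieval rung can be sketched in a few lines. This is a toy illustration, not a production design: the `Passage` type, the three sample contracts, the client names and the word-overlap scoring are all assumptions made up for the example (a real system would use a proper search index or embeddings). It shows the three properties named above: passages are filtered by permission, fetched fresh per question, and tagged with an id the model can cite.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str   # hypothetical contract identifier, used for citation
    client: str   # hypothetical permission tag: who may see this passage
    text: str


# Toy corpus standing in for the firm's indexed contracts (invented data).
CORPUS = [
    Passage("MSA-001", "acme", "Either party may terminate with 30 days written notice."),
    Passage("MSA-002", "globex", "Termination requires 90 days written notice."),
    Passage("NDA-007", "acme", "Confidential information must be returned on request."),
]


def tokenize(text):
    """Lower-case words with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(question, client, k=2):
    """Return up to k passages this client may see, ranked by word overlap."""
    q = tokenize(question)
    allowed = [p for p in CORPUS if p.client == client]      # permission filter
    scored = [(len(q & tokenize(p.text)), p) for p in allowed]
    scored = [(s, p) for s, p in scored if s > 0]             # drop irrelevant passages
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(question, passages):
    """Hand the model the passages, each labelled with a citable source id."""
    sources = "\n".join(f"[{p.doc_id}] {p.text}" for p in passages)
    return (
        "Answer using only the sources below and cite the source id.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


question = "How many days notice to terminate?"
hits = retrieve(question, client="acme")
prompt = build_prompt(question, hits)
print(prompt)
```

Amending a contract here means editing one entry in the corpus; nothing is retrained, and the other client's contract never reaches the prompt.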

THE PRINCIPLE

Knowledge in retrieval, behaviour in weights, steering in prompts

It means: Match the adaptation to the gap: facts that change belong in retrieval; consistent behaviours belong in fine-tuning; everything else is a prompt.
It works through: Facts via RAG stay current and auditable → behaviours via fine-tuning become reliable and cheap at scale → prompts steer both without engineering. Mixing these up is the classic expensive mistake: fine-tuning facts (stale, uncitable) or prompting behaviours at scale (inconsistent, token-hungry).
Spot it when: someone proposes training whenever the AI “needs to know our stuff”. They're solving a retrieval problem with a weights budget.

Why 2023 advice aged badly, in three moves. First, context windows grew from ~4,000 tokens to hundreds of thousands or more, so you can now hand a model entire contracts, not snippets. Second, inference prices collapsed: Stanford's 2025 AI Index measured a ~280× fall in the cost of GPT-3.5-level performance in about two years (late 2022 to late 2024). Third, frontier models got so capable that a rented brain plus your documents beats a custom-trained lesser brain on most business tasks. Fine-tuning didn't die; it moved to a specialist role: form, not facts.
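To get a feel for the price collapse, the ~280× figure can be turned into a yearly rate. A minimal sketch, assuming the fall compounded evenly over the two years (a simplification) and using a made-up $10,000 workload purely for illustration:

```python
# Figures from the text: roughly 280x cheaper over two years.
total_fall = 280
years = 2

# Assumption: the fall compounded evenly, so the yearly factor is the square root.
yearly_factor = total_fall ** (1 / years)

# Hypothetical workload cost at the start of the period (illustrative only).
cost_before = 10_000.0
cost_after = cost_before / total_fall

print(f"Yearly price fall: about {yearly_factor:.1f}x")
print(f"A ${cost_before:,.0f} workload would cost about ${cost_after:,.2f}")
```

On those assumptions the price fell by roughly 17× per year, which is why a cost argument made in 2023 rarely survives to the present.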

05 · BACK TO THE OPENING

So Vendor B wasn't cutting corners, and the managing partner's offence was aimed at the wrong target. “No training” actually meant: borrow a brain that cost someone else nine figures, and spend your budget on the one thing they can't sell anyone else — getting your documents to the model at the right moment. The training was already done. The moat was never the model; it's the data, the evals, and the workflow.

06 · TAKE THIS WITH YOU

Your rule: name the gap before you spend. Ask “is the model missing knowledge, missing behaviour, or just missing instructions?” Knowledge → retrieval. Behaviour, needed identically at scale → fine-tune. Instructions → prompt, and prototype it this afternoon. Climb the ladder only when the rung below demonstrably fails.
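The rule can be written down as a tiny decision function. This is one possible reading of it, not a definitive policy: the `Gap` enum, the `choose_rung` name and its two flags are hypothetical names introduced for the sketch.

```python
from enum import Enum


class Gap(Enum):
    KNOWLEDGE = "knowledge"        # the model is missing facts
    BEHAVIOUR = "behaviour"        # the model is missing a consistent way of acting
    INSTRUCTIONS = "instructions"  # the model was simply not told what to do


def choose_rung(gap, identical_at_scale=False, prompting_failed=False):
    """Map a named gap to the lowest rung of the ladder that addresses it."""
    if gap is Gap.KNOWLEDGE:
        return "retrieval"
    # Fine-tune only when the behaviour is needed identically at scale
    # and the rung below (prompting) has demonstrably failed.
    if gap is Gap.BEHAVIOUR and identical_at_scale and prompting_failed:
        return "fine-tune"
    return "prompt"


print(choose_rung(Gap.KNOWLEDGE))
print(choose_rung(Gap.BEHAVIOUR))
print(choose_rung(Gap.BEHAVIOUR, identical_at_scale=True, prompting_failed=True))
print(choose_rung(Gap.INSTRUCTIONS))
```

Pretraining has no branch on purpose: under the lesson's argument, no gap a firm like this one faces leads there.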
