The Lawyer Who Double-Checked
New York, 2023. A lawyer files a brief in Mata v. Avianca citing six precedents — Varghese v. China Southern Airlines among them. Proper citations, plausible judges, quotable passages.
Opposing counsel can't find the cases. Neither can the judge. Pressed, the lawyer goes back to his research tool — ChatGPT — and asks: “Is Varghese a real case?” It replies that yes, it's real, and can be found on Westlaw and LexisNexis.
None of the six cases had ever existed. The court fined the lawyers $5,000. The mystery isn't that a computer made an error — it's the shape of the error: why perfect, confident, formatted fiction instead of “I couldn't find that”?
Pause on that shape. A database returns “no results”. A search engine returns bad links. What kind of machine returns a beautifully formatted case that doesn't exist, and does it twice, the second time under direct questioning?
Why did the model invent case law instead of admitting ignorance?
The comfortable diagnosis is “the AI lied.” It fits the courtroom drama, and it suggests an easy fix — make it stop lying.
But lying requires knowing the truth and choosing against it. This machine did something stranger: it did exactly what it always does — continue text plausibly — in a situation where plausible and true had come apart. The failure wasn't moral. It was architectural. And the lawyer's second question made it worse in a way that's worth understanding precisely.
Why not just say “I don't know”? A 2025 OpenAI research paper gave the sharpest answer yet: we trained models not to. Benchmarks, and human raters, score answers as right or wrong, with no credit for abstaining, so a model that guesses confidently outscores one that hedges honestly. For a student sitting an exam with no negative marking, the optimal strategy is always to write something. Hallucination is partly a reward-design failure, not just a knowledge gap.
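The exam analogy can be put in numbers. The sketch below is illustrative only: the scoring scheme, the names `expected_score` and `best_strategy`, and the probabilities are all assumptions made up for this example, not figures from the paper. It compares guessing with abstaining under right-or-wrong grading and under grading that subtracts a point for a wrong answer.

```python
ABSTAIN_SCORE = 0.0  # assumed: "I don't know" earns nothing under either scheme


def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of guessing: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def best_strategy(p_correct: float, wrong_penalty: float = 0.0) -> str:
    """Return 'guess' when guessing beats abstaining in expectation."""
    if expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE:
        return "guess"
    return "abstain"


if __name__ == "__main__":
    # Hypothetical confidence levels: a near-blind guess, a weak hunch, a decent bet.
    for p in (0.05, 0.25, 0.60):
        binary = best_strategy(p, wrong_penalty=0.0)    # right-or-wrong grading
        penalized = best_strategy(p, wrong_penalty=1.0)  # a wrong answer costs a point
        print(f"p={p:.2f}  binary: {binary:8s} penalized: {penalized}")
```

With no penalty, guessing wins at every confidence level above zero. Once a wrong answer costs a point, guessing only pays when the model is more likely right than wrong.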
The 2026 state of play: things improved — materially. Models now abstain more, browse and cite real sources, and grounded (retrieval-backed) systems can be held to their documents. But the mechanism hasn't changed, so the residual risk concentrates precisely where fluency reads as authority. The fix that works isn't a smarter model — it's changing the question: ask the system to retrieve and cite, then open the citation. Generation for thinking, retrieval for facts.
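One way to picture “held to their documents” is a check that a quoted passage really appears in the source it is attributed to. This is a minimal sketch under stated assumptions: `normalize` and `quote_is_supported` are hypothetical helpers, the sample text is a placeholder, and the source text is assumed to come from a document a person opened outside the model. It is not how any particular product works.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_supported(quote: str, source_text: str) -> bool:
    """True only if the quoted passage appears in the text of the cited source."""
    needle = normalize(quote)
    # An empty quote proves nothing, so it never counts as supported.
    return bool(needle) and needle in normalize(source_text)


if __name__ == "__main__":
    # Placeholder text standing in for a document someone actually opened.
    opened_document = "The carrier is liable\nfor damage sustained   during carriage."
    print(quote_is_supported("liable for damage sustained", opened_document))
    print(quote_is_supported("liable for emotional distress", opened_document))
```

A passing check shows only that the words are in the document; whether the document is the right authority still takes a reader.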
So the real error in Mata v. Avianca wasn't using AI for research — it was the double-check. The lawyer asked the plausibility machine to audit its own plausibility, and it did: “yes, it's real” was the most plausible next sentence. The mystery's answer: he never asked a second source — he asked the same generator a second question. Verification has to leave the machine.
Your rule — the load-bearing fact test: before relying on any specific fact from a model, ask “what does it cost me if this is invented?” If the cost is real, demand a source you can open — or check outside the model entirely. And never accept a model's confirmation of its own claim as verification; that's one witness testifying twice.
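The rule is simple enough to write down as a decision function. The sketch below is only an illustration: the `Claim` record, its field names, and `may_rely_on` are hypothetical, and the yes-or-no inputs are judgments a person has to supply.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    costly_if_invented: bool         # would it hurt me if this were made up?
    independent_source_opened: bool  # did I open a source outside the model?
    model_reconfirmed: bool = False  # the model said "yes, it's real" when asked


def may_rely_on(claim: Claim) -> bool:
    """Load-bearing fact test: costly claims need a source checked outside the model."""
    if not claim.costly_if_invented:
        return True
    # model_reconfirmed is deliberately ignored: one witness testifying twice.
    return claim.independent_source_opened


if __name__ == "__main__":
    citation = Claim(
        text="a case citation headed for a court filing",
        costly_if_invented=True,
        independent_source_opened=False,
        model_reconfirmed=True,
    )
    print(may_rely_on(citation))
```

The design choice worth noticing is what the function leaves out: the model's own confirmation is recorded but never consulted.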