Think North Learning
thinknorth.consulting
RAG & EMBEDDINGS · Real-Life Connection

The Open-Book Exam

01 · THE SETUP

A company rolls out an AI assistant for employee questions. Week one, someone asks: “How many weeks of parental leave do I get?”

The bot answers instantly, warmly, and specifically: “Twelve weeks at full pay.” The actual policy — updated last quarter — is eighteen. The bot has read, in effect, the whole public internet. It has never read the one document that mattered: the company handbook sitting in a folder fifty metres from the server room.

If you've ever asked a chatbot about your order, your contract, your policy — you've met this exact gap.

You own this rollout. The model demonstrably 'knows' a thousand times more than any employee — and none of the two hundred facts that make it useful here. How do you get your handbook into it?

02 · YOUR CALL

What's the right fix?

If you pick A

The intuitive fix: put the knowledge where the knowledge lives. But training bakes facts into weights, which is expensive, slow, and frozen. Next quarter's policy update means training again, and the model still can't tell you where an answer came from or keep salary bands hidden from interns. Facts that change don't belong in weights.

If you pick B

Less silly than it sounds in 2026: context windows now swallow whole handbooks, and for small, stable document sets this genuinely works. It's the right instinct: give the model the text at answer time. It just doesn't scale to thousands of documents or per-user permissions, and you're paying to re-read the whole library on every question.
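To put rough numbers on "re-reading the whole library", here is a back-of-envelope sketch. Every figure in it (2,000 documents of 3,000 tokens each, five retrieved chunks of 500 tokens) is a hypothetical assumption, not data from any real deployment, and it ignores prompt caching, which some providers offer to discount a repeated prefix.

```python
# HYPOTHETICAL corpus and retrieval settings, chosen only for illustration.
NUM_DOCUMENTS = 2_000
TOKENS_PER_DOCUMENT = 3_000
CHUNK_TOKENS = 500   # size of one retrieved chunk
TOP_K = 5            # chunks retrieved per question


def tokens_read_full_context(num_docs: int, tokens_per_doc: int) -> int:
    """Option B: the whole library goes into the prompt on every question."""
    return num_docs * tokens_per_doc


def tokens_read_with_retrieval(chunk_tokens: int, top_k: int) -> int:
    """Retrieval: only the top-k chunks go into the prompt."""
    return chunk_tokens * top_k


full = tokens_read_full_context(NUM_DOCUMENTS, TOKENS_PER_DOCUMENT)
rag = tokens_read_with_retrieval(CHUNK_TOKENS, TOP_K)
print(f"full context: {full:,} tokens per question")  # full context: 6,000,000 tokens per question
print(f"retrieval:    {rag:,} tokens per question")   # retrieval:    2,500 tokens per question
print(f"ratio: {full // rag:,}x")                     # ratio: 2,400x
```

The point is the shape, not the constants: full-context cost grows with the size of the library, while retrieval cost grows only with how much you choose to retrieve.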

If you pick C — the mechanism

That's the production answer — retrieval-augmented generation. The model stops being a closed-book examinee and becomes an open-book one: find the right page, read it, answer from it, cite it. Fresh the moment a document changes, permission-aware, and auditable.

If you pick D

An honourable fallback, and correct if the answer machinery can't be grounded. But the failure you saw wasn't 'AI isn't ready' — it was a specific, fixable architecture gap: a brilliant reader with no access to the right book. Fix the access before you fire the reader.

03 · NOT SO FAST

“Train it on our data” is the phrase that launches a thousand overpriced projects. It makes sense — school taught us that knowing things means having studied them.

But it confuses two different kinds of memory. What the model learned in training is like fluency and judgment — slow to acquire, general, durable. Your parental-leave policy is a fact on a page — specific, changeable, and needing a citation. Fluency belongs in the brain. Facts belong in a library the brain can search.

04 · THE MECHANISM
Figure: the question “how long can I stay home with my baby?” is embedded into meaning-space, where its nearest neighbour is the “parental leave” section; the retrieved passages go to the model, which answers “Eighteen weeks at full pay, per the March update.” and cites its source (Handbook §4.2) for verification. Update the doc, and the next answer updates too.
RAG: the question finds its pages by meaning, and the answer is grounded in what was found.

The pipeline is called retrieval-augmented generation (RAG), and its magic ingredient is how the search works. Documents are split into chunks, and each chunk is converted into an embedding: a point in a space with hundreds or thousands of dimensions, positioned so that texts with similar meanings land close together. Your question is embedded into the same space, and the system grabs the chunks sitting nearest to it. That's why asking “how long can I stay home with my baby?” finds the parental leave section despite sharing no keywords: closeness in meaning, not shared words, is the search key. The retrieved passages are then placed into the model's context with an instruction that matters: answer from these, and cite them.
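A minimal sketch of the chunk, embed, and nearest-neighbour steps. The big assumption: `toy_embed` is a hypothetical, hand-built stand-in for a real embedding model, with four labelled "concept" dimensions scored from small word lists, so the example runs offline. The handbook text and section numbers are invented for the demo. Similarity is measured with cosine similarity, a common choice for comparing embeddings.

```python
import math
import re

# HYPOTHETICAL stand-in for an embedding model: each dimension is a hand-picked
# "concept", scored by counting words from a small list. Real embedding models
# learn hundreds or thousands of dimensions from data; this toy has four.
CONCEPTS = {
    "family": {"baby", "child", "newborn", "parent", "parental", "adoption"},
    "time_off": {"leave", "home", "stay", "weeks", "vacation", "holiday"},
    "money": {"pay", "salary", "bonus", "expenses", "receipts"},
    "equipment": {"laptop", "monitor", "vpn", "password"},
}


def words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def toy_embed(text: str) -> list[float]:
    """Map text to a unit-length vector with one dimension per concept."""
    toks = words(text)
    vec = [float(sum(t in vocab for t in toks)) for vocab in CONCEPTS.values()]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: both vectors are already unit length (or all zeros).
    return sum(x * y for x, y in zip(a, b))


def chunk(document: str) -> list[str]:
    # Simplest possible chunking: one chunk per blank-line-separated paragraph.
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks nearest to the question in the (toy) meaning-space."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: similarity(q, toy_embed(c)), reverse=True)
    return ranked[:k]


# HYPOTHETICAL handbook text for the demo.
HANDBOOK = """§4.2 Parental leave: employees receive eighteen weeks at full pay.

§5.1 Equipment: request a replacement laptop through the IT portal.

§6.3 Expenses: submit receipts within thirty days."""

question = "How long can I stay home with my baby?"
chunks = chunk(HANDBOOK)
print(retrieve(question, chunks))
# ['§4.2 Parental leave: employees receive eighteen weeks at full pay.']
print(set(words(question)) & set(words(chunks[0])))  # set()
```

The question and the parental-leave chunk share no words, yet they land close together because both score on the same concept dimensions. A real embedding model learns those dimensions rather than having them typed in, and a production system would store chunk vectors in an index instead of re-embedding every chunk per question.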

Grounding also quietly attacks the hallucination problem from the Limits shelf: a model answering from a provided passage has far less room to invent than one answering from vibes, and the citation gives you a one-click audit. The honest caveats: retrieval can fetch the wrong pages (garbage retrieved, garbage generated), chunking and permissions take real engineering, and for small stable corpora, modern long-context models make option B a legitimate rival. The architecture question is always the same trade: what does the model need to know forever, versus look up right now?
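The "answer from these, and cite them" instruction, the permission check, and the one-click audit can be sketched as plain functions. Everything here is hypothetical: the `Chunk` type, the role names, the `[source]` citation format, and the prompt wording. Ranking is left out (assume the passages were already retrieved), and the prompt string would be sent to whichever model you use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str                    # citation label, e.g. "Handbook §4.2"
    text: str
    allowed_roles: frozenset[str]  # HYPOTHETICAL access-control scheme


def visible_to(chunks: list[Chunk], role: str) -> list[Chunk]:
    """Filter by permission BEFORE anything reaches the model's context."""
    return [c for c in chunks if role in c.allowed_roles]


def build_prompt(question: str, passages: list[Chunk]) -> str:
    """Place passages in context with the 'answer from these, cite them' rule."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in passages)
    return (
        "Answer ONLY from the passages below and cite each claim as [source]. "
        "If the passages don't contain the answer, say you don't know.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )


def citations_grounded(answer: str, passages: list[Chunk]) -> bool:
    """Audit: the answer cites at least one source, and only sources it was given."""
    cited = set(re.findall(r"\[([^\]]+)\]", answer))
    return bool(cited) and cited <= {c.source for c in passages}


# HYPOTHETICAL handbook chunks with role-based visibility.
HANDBOOK = [
    Chunk("Handbook §4.2", "Parental leave: eighteen weeks at full pay.",
          frozenset({"employee", "intern", "hr"})),
    Chunk("Handbook §9.1", "Salary bands by grade (confidential).",
          frozenset({"hr"})),
]

passages = visible_to(HANDBOOK, "intern")
print([c.source for c in passages])  # ['Handbook §4.2']
prompt = build_prompt("How long can I stay home with my baby?", passages)
print(citations_grounded("Eighteen weeks at full pay [Handbook §4.2].", passages))  # True
print(citations_grounded("Twelve weeks at full pay.", passages))  # False
```

Filtering before the prompt is built is the design choice that matters: text a user may not see never enters the model's context, so no prompt wording can leak it. The citation check only confirms that cited sources were actually supplied; whether a passage really supports the claim still needs a human or a separate check.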

05 · BACK TO THE OPENING

So the bot that botched parental leave was never under-trained — it was under-libraried. The gap between “knows everything” and “knows your business” isn't closed by more knowing at all; it's closed by reading the right page at the right moment. You didn't need a smarter brain. You needed to turn a closed-book exam into an open-book one.

06 · TAKE THIS WITH YOU

Your rule: ask of any fact an AI system must handle — “if this changed tomorrow, would the system answer correctly tomorrow?” If no, that fact belongs in retrieval, not in weights. And when you evaluate any 'trained on your data' pitch, ask to see the citations; grounded systems can show their pages, and the ones that can't are asking for faith.
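The "tomorrow" question can be turned into an automated check. In this sketch the class names are hypothetical, retrieval is reduced to a dictionary lookup by topic, and a snapshot copied once stands in for facts baked into weights.

```python
class DocumentStore:
    """The live 'library': the single place where policy text is maintained."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def update(self, topic: str, text: str) -> None:
        self.docs[topic] = text


class BakedInSystem:
    """HYPOTHETICAL stand-in for facts trained into weights: a one-time snapshot."""

    def __init__(self, store: DocumentStore) -> None:
        self.snapshot = dict(store.docs)  # copied once, at "training" time

    def answer(self, topic: str) -> str:
        return self.snapshot.get(topic, "I don't know.")


class RetrievalSystem:
    """HYPOTHETICAL stand-in for RAG: reads the store at question time."""

    def __init__(self, store: DocumentStore) -> None:
        self.store = store

    def answer(self, topic: str) -> str:
        return self.store.docs.get(topic, "I don't know.")


def passes_tomorrow_test(system, store: DocumentStore, topic: str, new_text: str) -> bool:
    """'If this changed tomorrow, would the system answer correctly tomorrow?'"""
    store.update(topic, new_text)
    return system.answer(topic) == new_text


def fresh_store() -> DocumentStore:
    store = DocumentStore()
    store.update("parental leave", "Twelve weeks at full pay.")
    return store


store_a = fresh_store()
print(passes_tomorrow_test(BakedInSystem(store_a), store_a, "parental leave",
                           "Eighteen weeks at full pay."))  # False
store_b = fresh_store()
print(passes_tomorrow_test(RetrievalSystem(store_b), store_b, "parental leave",
                           "Eighteen weeks at full pay."))  # True
```

The baked-in system keeps answering "Twelve weeks" after the handbook changes; the retrieval system answers from whatever the store holds when the question arrives.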
