Hands for the Brain
2023. You ask a chatbot: “What's on my calendar Thursday? Book us a table somewhere near the office after my last meeting.” You get a paragraph of graceful apology: it has no access to your calendar, cannot make reservations, and is, after all, just a language model.
2026. You ask an assistant the same thing. It checks the calendar, sees the 4pm running long, finds a restaurant, books 6:30, and drops the confirmation in your inbox.
Before reading on, predict: what changed? Did the models get smart enough to act — or did something else happen?
Careful — the intuitive answer ('the AI got smarter') hides an assumption. A brain in a jar that doubles in intelligence is still in a jar. What would it take to get out?
What actually let the model act on the world?
“The models got smarter” is true, and it's the wrong explanation. A smarter model with no way to act just produces better apologies.
What that answer misses is that acting requires an exchange: the model must be able to reach out, and the world must be able to answer back, over and over, until the job is done. That exchange has a shape, and once you see it, you'll understand both why agents suddenly work and why they fail in the specific ways they do.
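The exchange can be sketched as a short loop. This is a minimal illustration, not how any particular assistant is built: the tools (`check_calendar`, `book_table`), their return strings, and the `scripted_model` stand-in are all hypothetical, and a real system would put a language model where the script is.

```python
from typing import Callable

# Hypothetical tools: plain functions standing in for real calendar and booking services.
def check_calendar(day: str) -> str:
    return f"{day}: last meeting ends 17:30"

def book_table(time: str) -> str:
    return f"table booked for {time}"

TOOLS: dict[str, Callable[[str], str]] = {
    "check_calendar": check_calendar,
    "book_table": book_table,
}

def scripted_model(observations: list[str]) -> tuple[str, str]:
    """Stand-in for a language model: picks the next (action, argument)
    from what it has seen so far. A real model would decide this itself."""
    if len(observations) == 0:
        return ("check_calendar", "Thursday")
    if len(observations) == 1:
        return ("book_table", "18:30")
    return ("done", "Table booked for 18:30")

def run_agent(model, tools, max_steps: int = 10) -> str:
    """The loop: the model reaches out, the world answers back, repeat until done."""
    observations: list[str] = []
    for _ in range(max_steps):
        action, argument = model(observations)
        if action == "done":
            return argument
        # The tool result is fed back so the next decision can use it.
        observations.append(tools[action](argument))
    raise RuntimeError("step budget exhausted before the task finished")

print(run_agent(scripted_model, TOOLS))
```

The `max_steps` budget is a design choice in this sketch: a loop that can call tools needs some bound so that a confused model stops rather than acting forever.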
Then came a plumbing problem with big consequences. Every assistant needed a custom connector to every tool (calendars, CRMs, databases): an N×M explosion of one-off integrations. The fix was a standard: the Model Context Protocol (MCP), opened by Anthropic in November 2024, a universal socket ('USB-C for AI') that lets any assistant use any tool that speaks it. Adoption tells the story: OpenAI and Google signed on in 2025, the protocol was donated to the Linux Foundation's new Agentic AI Foundation in December 2025, and by early 2026 it counted tens of thousands of tool servers and SDK downloads in the tens of millions per month. Build a tool once; every agent can pick it up.
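The size of the N×M problem is easy to put in numbers. The sketch below compares one-off connectors with a shared standard, assuming each assistant and each tool needs exactly one adapter to speak the standard; that N+M count is the usual illustration, not a figure from MCP itself, and the function names are made up for this example.

```python
def one_off_connectors(assistants: int, tools: int) -> int:
    """Every assistant needs its own connector to every tool."""
    return assistants * tools

def shared_protocol_adapters(assistants: int, tools: int) -> int:
    """Assumption: each assistant and each tool implements the standard once."""
    return assistants + tools

for n, m in [(3, 5), (10, 50), (20, 1000)]:
    print(
        f"{n} assistants x {m} tools: "
        f"{one_off_connectors(n, m)} custom connectors "
        f"vs {shared_protocol_adapters(n, m)} protocol adapters"
    )
```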
One piece of arithmetic separates agent hype from agent engineering: errors compound around the loop. A step that's 95% reliable sounds excellent, but run twenty such steps in a row and, if each can fail independently, the chance that everything went right is 0.95^20, or about 36%. That's why agent quality in 2026 is less about brilliance and more about reliability: checkable intermediate results, recoverable errors, and a human gate on anything expensive to undo. And every tool an agent holds raises the stakes of the injection lesson from the Limits shelf: hands make the gullibility consequential.
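The compounding is worth running once. The sketch below assumes every step succeeds or fails independently with the same probability, which is a simplification. The second function illustrates why recoverable errors matter, under the further assumption that a failed step is always detected and can be retried once with the same odds.

```python
def chain_success(step_reliability: float, steps: int) -> float:
    """Chance that every step in the chain succeeds (independent steps assumed)."""
    return step_reliability ** steps

def with_one_retry(step_reliability: float) -> float:
    """Per-step reliability if a detected failure gets one retry.
    Assumes failures are always caught and the retry is independent."""
    failure = 1.0 - step_reliability
    return 1.0 - failure ** 2

for steps in (1, 5, 10, 20, 50):
    plain = chain_success(0.95, steps)
    retried = chain_success(with_one_retry(0.95), steps)
    print(f"{steps:>2} steps: {plain:.2f} without retries, {retried:.2f} with one retry per step")
```

At twenty steps the plain chain lands near 0.36, matching the figure above, while the chain with one checked retry per step stays near 0.95. Under these assumptions, a checkable intermediate result is worth more than a modest gain in raw per-step accuracy.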
So the leap from apology to booked table wasn't the brain crossing an intelligence threshold. It was the brain in the jar getting hands, and then the hands getting a standard plug. Your prediction question resolves precisely: something else happened, namely a loop and a protocol. That reframes every “can AI do this?” you'll hear at work. It's really two questions: can the model reason about it, and does the loop have the right, safe tools?
Your rule, the good-delegation test: a task suits an agent when three things are true. Success is checkable (you can verify the outcome cheaply), steps are reversible (or gated before the irreversible one), and the blast radius is bounded (the worst case is an annoyance, not a crisis). Booking dinner passes. Emailing your top 100 clients does not, at least not yet.
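The test can be written down as a three-part checklist. This is a sketch only: the `Task` fields and the true/false values given to the two example tasks are illustrative assumptions, since the rule says which task passes but not which criteria the client email fails.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    success_checkable: bool     # can the outcome be verified cheaply?
    reversible_or_gated: bool   # can steps be undone, or is there a gate first?
    blast_radius_bounded: bool  # is the worst case an annoyance, not a crisis?

def failed_checks(task: Task) -> list[str]:
    """Return the criteria of the good-delegation test that the task fails."""
    checks = {
        "success is checkable": task.success_checkable,
        "steps are reversible or gated": task.reversible_or_gated,
        "blast radius is bounded": task.blast_radius_bounded,
    }
    return [label for label, ok in checks.items() if not ok]

def suits_agent(task: Task) -> bool:
    """A task suits an agent only when all three criteria hold."""
    return not failed_checks(task)

# Assumed ratings, for illustration only.
dinner = Task("book dinner", True, True, True)
client_email = Task(
    "email top 100 clients",
    success_checkable=False,
    reversible_or_gated=False,   # a sent email cannot be unsent
    blast_radius_bounded=False,
)

for task in (dinner, client_email):
    verdict = "delegate" if suits_agent(task) else "keep a human in charge"
    print(f"{task.name}: {verdict} {failed_checks(task)}")
```

Returning the list of failed criteria, rather than a bare yes or no, shows what would have to change (for example, adding an approval gate before sending) for a task to pass later.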