Think North Learning
thinknorth.consulting
CAPABILITY MAP Prediction 6 min

The Jagged Frontier

01 · THE SETUP

2023: Harvard researchers ran one of the largest field experiments on AI at work — 758 consultants at Boston Consulting Group, elite knowledge workers, randomly given GPT-4 or not, across 18 realistic consulting tasks.

On most tasks, the AI group was dramatically better. But the researchers had hidden a twist: one task was deliberately designed to look like the others while sitting just beyond what the model could actually do.

Before you see the numbers — commit to a prediction.

Seriously predict, don't skim. Elite consultants, a tool that just helped them shine, and one task quietly outside its reach. What happens to the AI group's accuracy on that task, compared to colleagues with no AI at all?

02 · YOUR CALL

On the outside-the-frontier task, consultants using AI were…

If you pick A

That's the intuition the study demolished — and it's most people's pick. 'Expert + tool ≥ expert' feels like arithmetic. But it assumes the expert can tell when the tool is wrong, and that turned out to be exactly what fluent AI output erodes.

If you pick B

A sensible hedge. But it assumes consultants weighed the AI's contribution neutrally. They didn't — polished, confident prose got adopted, not audited. The effect had a direction, and it wasn't zero.

If you pick C — the mechanism

Correct, and worth feeling the size of that. Top-tier professionals ended up 19 percentage points less likely to reach a correct answer than colleagues with no AI, using a tool that had just boosted them everywhere else. The researchers called the cause 'falling asleep at the wheel': fluent output switched off their own judgment.

If you pick D

On the 18 tasks inside the frontier you'd have been right: 12.2% more tasks completed, about 25% faster, and more than 40% higher rated quality. That's what makes the twist task the important one: the same tool, the same people, and the sign of the effect flipped.

Pick one — committing first is what makes the answer stick.


03 · NOT SO FAST

The natural model — the one your prediction probably used — is that AI capability is a dial: some overall level of smartness that helps more or less everywhere.

It's a reasonable model. It's also the exact model that cost the AI-equipped consultants 19 percentage points of accuracy. The tool's competence isn't a level. It's a shape, and the shape is strange.

04 · THE MECHANISM
[Figure: the jagged frontier. Axis: how hard the task feels to a human →. Inside the frontier the AI is strong, sometimes superhuman; outside it is fluent, confident, and wrong. Paired examples of similar human difficulty: gold-medal proof ✓ vs. a scheduling constraint ✕; draft the memo ✓ vs. catch its subtle error ✕. The boundary is invisible from inside a fluent conversation.]
The frontier is jagged: two tasks of equal human difficulty can sit on opposite sides.

The study's sharpest finding wasn't the frontier. It was the humans at it. Consultants showed miscalibrated trust: they leaned on the AI hardest exactly where it was weakest, because polish reads as competence. The best performers worked differently. The authors called them centaurs (divide the work: AI inside the frontier, human outside) and cyborgs (interleave constantly, checking at every step). Both styles share one habit: they treat the boundary as something to probe, never assume.

A 2026 caveat in both directions: the frontier has moved dramatically outward since that study — tasks that failed in 2023 are routine now. But every new model ships with a new jagged edge, in new places. The lesson was never “AI fails at task X.” It's that the boundary is invisible, moving, and unfelt from inside a fluent conversation — permanently.

05 · BACK TO THE OPENING

So the study you just predicted wasn't a verdict on whether AI helps. It produced a 40% quality gain and a 19-percentage-point accuracy drop in the same experiment, with the same people and tool. The opening question was really this lesson's answer in disguise: the difference between those two numbers was never the model. It was whether the human knew which side of the frontier they were standing on.

06 · TAKE THIS WITH YOU

Your rule: before delegating any recurring task type to AI, map your local frontier — run five trials where you already know the right answer, and count. Above your quality bar → delegate with spot-checks. Below it → AI drafts, human decides. And re-test when models update: the frontier moves, and it doesn't send a notification.
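If you want to keep score rather than eyeball it, the rule above can be sketched in a few lines of Python. This is a minimal illustration, not anything from the study: the names `Trial`, `map_local_frontier`, and `needs_retest` are hypothetical, the 0.8 quality bar is a placeholder for whatever bar you actually hold, and treating "exactly at the bar" as passing is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One test run on a task whose right answer you already know."""
    task: str
    ai_correct: bool  # True if the AI output matched the known answer


def map_local_frontier(trials, quality_bar=0.8, min_trials=5):
    """Return (hit_rate, working_mode) for one recurring task type.

    quality_bar and min_trials are illustrative defaults, not values
    from the study. Meeting the bar exactly is treated as passing.
    """
    if len(trials) < min_trials:
        raise ValueError(f"need at least {min_trials} trials, got {len(trials)}")
    hit_rate = sum(t.ai_correct for t in trials) / len(trials)
    if hit_rate >= quality_bar:
        return hit_rate, "delegate with spot-checks"
    return hit_rate, "AI drafts, human decides"


def needs_retest(tested_model, current_model):
    """A frontier map only holds for the model version it was measured on."""
    return tested_model != current_model


# Hypothetical trial logs for two recurring task types.
memo_trials = [Trial("draft the memo", ok) for ok in (True, True, True, True, True)]
audit_trials = [Trial("catch a subtle error", ok) for ok in (True, False, False, True, False)]

print(map_local_frontier(memo_trials))
print(map_local_frontier(audit_trials))
print(needs_retest("model-2023", "model-2026"))
```

Five trials is a coarse sample, so read the hit rate as a rough signal and keep the spot-checks even on the tasks that pass.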

REFERENCES
  1. Dell'Acqua et al. (2023). Navigating the Jagged Technological Frontier (Harvard/BCG working paper).
  2. Mollick, E. (2023). Centaurs and Cyborgs on the Jagged Frontier.