Sooner or later, someone in the room draws the two-by-two. Impact on one axis, effort on the other. Four quadrants: quick wins top-left, big bets top-right, and so on. It's on every whiteboard-tool template, it's in every facilitation deck, and — this is the important part — the instinct is completely correct. Ranking work by value against cost is exactly what your AI backlog needs. The matrix isn't the problem. How teams run it is.
Because here's the thing about the impact-effort matrix: it looks so simple that everyone assumes they can't get it wrong. And then they get it wrong in the same three ways, every time.
Failure one: no shared units
Watch a typical session and you'll see people placing sticky notes at wildly different scales without noticing. One person's "high impact" means "saves me half a day a week." Another's means "feels strategic." A third is quietly rating impact by how much they would enjoy the project. The axes have labels but no units, so the same word means something different to each person in the room, and the resulting map is a collage of incompatible judgments dressed up as a shared picture.
A good session forces a common currency before anything gets placed. For an AI backlog the natural units are concrete. Impact is time saved: hours saved per person per week, multiplied by the number of people affected. Effort is time-to-build: is this a prompt written this afternoon, or a multi-week integration? When everyone estimates in the same units, the map stops being an opinion collage and starts being comparable.
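To make the units concrete, here is a minimal sketch of the arithmetic, assuming impact is recorded as hours saved per person per week and effort as build days. The `Idea` class, its field names, and the sample numbers are all hypothetical, chosen only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """One backlog item, estimated in shared units (hypothetical structure)."""
    name: str
    hours_saved_per_person_per_week: float
    people_affected: int
    build_days: float  # effort: working days to build

    @property
    def impact_hours_per_week(self) -> float:
        # Impact = hours saved per person per week x number of people affected.
        return self.hours_saved_per_person_per_week * self.people_affected


# Illustrative numbers only, not data from a real session.
ideas = [
    Idea("Draft first-response emails", 2.0, 5, 1.0),
    Idea("Automate Monday reconciliation", 4.0, 2, 15.0),
]

# Because both axes are in real units, items can be compared directly,
# for example by hours saved per week for each day of build effort.
ranked = sorted(
    ideas,
    key=lambda idea: idea.impact_hours_per_week / idea.build_days,
    reverse=True,
)

for idea in ranked:
    print(f"{idea.name}: {idea.impact_hours_per_week} h/week for {idea.build_days} build days")
```

The ratio used for sorting is one possible way to compare items, not a required one. The point is that any comparison only works once every note carries the same two numbers.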
Failure two: solo estimates masquerading as consensus
The second failure is subtler and more damaging. In many sessions, one person — the facilitator, the loudest voice, the sponsor — places most of the notes while everyone watches. Or each idea is scored by whoever proposed it, which is like letting students grade their own exams. Either way you've smuggled the popularity contest back in through the side door. The map looks collaborative; the estimates are individual.
A matrix filled in by one confident person isn't a prioritization. It's that person's opinion, drawn on a grid so it looks like math.
The fix is structural, and product-discovery practitioners have argued it for years: get the estimates from the people who actually do each workflow, and get them independently before the group converges. Discovery coaches like Teresa Torres have built entire practices on the principle that the team closest to the work should be continuously involved in weighing options, not handed a finished decision. The person who lives the Monday-reconciliation problem is the only reliable estimator of what automating it would save — and they're often not the person holding the marker.
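One way to picture "independently, before the group converges" is to collect each doer's number privately and only then combine them. The sketch below assumes the median as the combining rule and a simple spread check to flag items worth discussing. The function name, the threshold, and the sample estimates are hypothetical.

```python
from statistics import median


def combine_estimates(estimates: list[float], max_ratio: float = 3.0) -> tuple[float, bool]:
    """Combine independent, non-negative estimates for one idea.

    Returns (median_estimate, needs_discussion). An idea is flagged when the
    highest estimate is more than `max_ratio` times the lowest, which suggests
    people are picturing different things. The threshold is an assumption.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    low, high = min(estimates), max(estimates)
    needs_discussion = high > max_ratio * low
    return median(estimates), needs_discussion


# Hours saved per week, written down privately by the people who do each workflow.
reconciliation = [3.0, 4.0, 3.5]
email_drafts = [1.0, 6.0, 2.0]

print(combine_estimates(reconciliation))  # estimates agree closely
print(combine_estimates(email_drafts))    # wide spread, worth talking through
```

The median keeps one loud outlier from setting the number, and the flag turns disagreement into an agenda item instead of hiding it.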
Failure three: a scatter of dots and no story
The third failure comes at the end. You've placed thirty notes, the board is a cloud of dots, everyone nods, and then… nothing. A scatter of individual ideas is not a plan. It doesn't tell you where to start, and it hides the fact that half your dots are secretly the same bet wearing different labels — five people independently described versions of "let AI draft our first-response emails," and on an unclustered board they read as five separate items competing for five separate slots.
What turns a scatter into a decision is clustering: grouping related ideas so the portfolio structure becomes visible. High-value, low-effort clusters are where you start. High-value, high-effort clusters are the deliberate bets you resource on purpose. The near-worthless corner you name and drop. Without that grouping step, the exercise produces a photograph of the discussion instead of a direction.
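The grouping step can be sketched in a few lines. This assumes the room has already tagged each note with a theme by hand, that duplicate notes within a theme are summarized by their median rather than summed (they describe the same bet, so adding them would overcount), and that the quadrant cutoffs are picked by the team. The themes, numbers, cutoffs, and the "low priority" label for the low-impact, low-effort corner are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# (theme, impact in hours saved per week, effort in build days)
notes = [
    ("first-response emails", 10.0, 1.0),
    ("first-response emails", 8.0, 2.0),
    ("first-response emails", 12.0, 1.0),
    ("monday reconciliation", 8.0, 15.0),
    ("meeting-notes summaries", 1.0, 10.0),
]

IMPACT_CUTOFF = 5.0  # hours/week at or above which a cluster counts as high impact
EFFORT_CUTOFF = 5.0  # build days at or above which a cluster counts as high effort


def quadrant(impact: float, effort: float) -> str:
    """Name the corner of the matrix a cluster falls into."""
    high_impact = impact >= IMPACT_CUTOFF
    high_effort = effort >= EFFORT_CUTOFF
    if high_impact and not high_effort:
        return "start here"
    if high_impact and high_effort:
        return "deliberate bet"
    if high_effort:
        return "name and drop"
    return "low priority"


def build_portfolio(notes):
    """Group notes by theme and summarize each cluster by its median estimates."""
    grouped = defaultdict(list)
    for theme, impact, effort in notes:
        grouped[theme].append((impact, effort))

    portfolio = {}
    for theme, pairs in grouped.items():
        impact = median(p[0] for p in pairs)
        effort = median(p[1] for p in pairs)
        portfolio[theme] = (len(pairs), impact, effort, quadrant(impact, effort))
    return portfolio


for theme, (count, impact, effort, corner) in build_portfolio(notes).items():
    print(f"{theme}: {count} note(s), {impact} h/week, {effort} days -> {corner}")
```

Five notes collapse into three clusters, and each cluster lands in a named corner, which is the difference between a cloud of dots and a place to start.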
What a good session actually requires
Put the three fixes together and you can write the spec for a session that works:
- Shared units, enforced up front — everyone estimates impact and effort in the same concrete terms, so the placements are comparable rather than poetic.
- Independent estimates from the doers — the people who own each workflow weigh it, before the group anchors on one loud voice.
- Clustering into a portfolio — related ideas grouped so the board yields a where-to-start, not a where-we-argued.
Notice what this rules out. It rules out the solo doc from post one. It rules out the free-for-all meeting from post two. And it rules out the pretty-but-inert matrix that most templates leave you with. The impact-effort instinct is right; it just needs a session designed so the units are shared, the estimates are honest, and the output is a portfolio instead of a cloud. In the next post we'll put the actual options side by side — the whiteboard templates, the scoring frameworks, the discovery consultants — and be honest about where each one fits and what it really costs.