If you started a literature review in the last few years, you probably reached for a graph. Connected Papers takes a seed paper and lays out its neighbourhood by co-citation and bibliographic coupling, so related-but-not-directly-linked work surfaces as a visual constellation. Research Rabbit lets you grow a collection outward along citation edges and alerts you as it expands. Elicit reads across many papers at once and extracts structured answers to a question. scite classifies each citation by whether the citing text supports, contrasts with, or merely mentions the cited claim. These are real advances, and if you are not using something in this family you are working harder than you need to.
But it is worth being precise about what each of them maps, because the category quietly shapes what you can and cannot conclude from it.
What a relatedness map is, and what it is for
Connected Papers and Research Rabbit are, at their core, maps of papers. Their unit is the document; their edges are relationships between documents: citation, co-citation, textual similarity. This is exactly right for the job they are built for, which is discovery. Point them at a seed and they answer "what else lives near this?" with the foundational paper you missed, the parallel line of work in an adjacent field, the recent extension you hadn't seen. For orienting in an unfamiliar area, they are hard to beat.
What the map does not, and cannot, tell you is whether the neighbourhood is sound. Two papers sitting close together on a co-citation graph are close because the literature treats them as related — which, as we saw two parts ago, is the same crowd behaviour that produces citation amplification and unfounded authority. A dense, well-connected cluster on the map can be a mature, well-tested consensus or a tightly-wound ball of papers citing each other into a fact that no experiment established. The graph draws both the same way, because the graph is measuring proximity, not truth.
A map of papers shows you where the literature has clustered its attention. It draws a rumour and a result with the same confident line, because relatedness is blind to whether the thing everyone is standing near is actually there.
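To make the point concrete, here is a minimal sketch of co-citation counting on two toy clusters. It is not how Connected Papers or Research Rabbit compute their layouts; the paper ids, the `cocitation_counts` helper, and the `has_primary_data` flags are all invented for illustration. The only thing it shows is that a relatedness score is computed from citation edges alone, so a tested pair and an untested pair can come out identical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count, for each unordered pair of papers, how many papers cite both.

    `citations` maps a citing paper id to the set of paper ids it cites.
    """
    counts = Counter()
    for cited in citations.values():
        # Sort so each unordered pair always appears under the same key.
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical cluster 1: T1 and T2 each report their own experiment.
tested = {"T3": {"T1", "T2"}, "T4": {"T1", "T2"}, "T5": {"T1", "T2"}}

# Hypothetical cluster 2: R1 and R2 assert a claim without any data.
rumour = {"R3": {"R1", "R2"}, "R4": {"R1", "R2"}, "R5": {"R1", "R2"}}

# This is the information that matters for soundness. Note that
# cocitation_counts never receives it: the graph has no input for it.
has_primary_data = {"T1": True, "T2": True, "R1": False, "R2": False}

tested_score = cocitation_counts(tested)[("T1", "T2")]
rumour_score = cocitation_counts(rumour)[("R1", "R2")]
print(tested_score, rumour_score)
```

Both pairs get the same co-citation score, because the function only ever sees who cites whom. Whether the cited papers contain evidence lives in a separate table that the graph has no way to consult.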
The question the document unit can't reach
Here is the limit, stated plainly. The unit of these tools is the paper. But the thing you actually need to evaluate is smaller than a paper and lives inside it: the claim, and the argument that is supposed to carry it. A paper is not one assertion; it is a stack of them — a background claim, a mechanism claim, a headline result, a boundary condition, a limitation half-admitted in the discussion. Some of those claims are rock-solid and some are the weak joint the whole thing hangs from. A document-level map cannot resolve below the document, so it averages the strong and the weak into a single dot and a set of edges.
What changes when you make the argument the unit instead? You stop asking "what papers are near this paper?" and start asking the questions that actually decide whether you can build on it:
- What are the premises, and what is the conclusion? Decompose the paper into its argumentative structure rather than treating it as an opaque block. This is the move Stephen Toulmin formalised decades ago, and argument-mapping research has shown that it measurably improves how well people evaluate reasoning.
- Is each premise supported by data, by a citation, or by rhetoric? Rate the claims individually, so the weak joint is visible instead of hidden in an average.
- Where a premise leans on a citation, does the chain reach evidence? Audit the cited support — grounded, dead-ended, or circular — rather than counting it.
- Does the conclusion actually follow, once the weak premises are discounted? This is the question a relatedness map structurally cannot pose, because it never opened the paper.
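The four questions above can be sketched as a small data structure. This is a toy model under stated assumptions, not a description of any existing tool: the `Claim`, `Support`, and `Audit` names, the `audit_chain` and `surviving_premises` helpers, and the example claims are all hypothetical. Each claim records how it is supported, and a citation-backed claim points at the claim it relies on, so the chain can be followed to data, to nothing, or back to itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Support(Enum):
    DATA = "data"
    CITATION = "citation"
    RHETORIC = "rhetoric"


class Audit(Enum):
    GROUNDED = "grounded"
    DEAD_END = "dead-ended"
    CIRCULAR = "circular"


@dataclass
class Claim:
    text: str
    support: Support
    cites: Optional[str] = None  # id of the claim this one relies on, if any


def audit_chain(start: str, claims: Dict[str, Claim]) -> Audit:
    """Follow a claim's citation chain until it reaches data, stops, or loops."""
    seen = set()
    current = start
    while True:
        if current in seen:
            return Audit.CIRCULAR  # we have come back to a claim already visited
        seen.add(current)
        claim = claims.get(current)
        if claim is None:
            return Audit.DEAD_END  # the cited source is not in our records
        if claim.support is Support.DATA:
            return Audit.GROUNDED
        if claim.support is Support.RHETORIC or claim.cites is None:
            return Audit.DEAD_END  # nothing further to follow
        current = claim.cites


def surviving_premises(premises: List[str], claims: Dict[str, Claim]) -> List[str]:
    """Return the premises whose support chain actually reaches data."""
    return [p for p in premises if audit_chain(p, claims) is Audit.GROUNDED]


# Hypothetical paper with three premises; a1 and a2 are two reviews citing each other.
claims = {
    "p1": Claim("Headline effect, measured in this paper", Support.DATA),
    "p2": Claim("Mechanism established by prior work", Support.CITATION, cites="a1"),
    "p3": Claim("The effect is widely accepted", Support.RHETORIC),
    "a1": Claim("Mechanism, as stated in review A", Support.CITATION, cites="a2"),
    "a2": Claim("Mechanism, as stated in review B", Support.CITATION, cites="a1"),
}

audits = {p: audit_chain(p, claims) for p in ("p1", "p2", "p3")}
survivors = surviving_premises(["p1", "p2", "p3"], claims)
print(audits, survivors)
```

In this toy paper only `p1` survives: `p2` runs into a citation loop and `p3` rests on rhetoric. The code deliberately stops there. Whether the conclusion still follows from the surviving premises is a judgement the reader has to make; the sketch only makes visible which premises are left to make it with.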
Why "read across many papers" isn't the same move either
Tools like Elicit go further than the graph tools — they read into papers and extract structured fields, which is genuinely closer to argument than to relatedness. But extraction summarises what a paper says; it is not the same as auditing whether what it says holds. A summary of a flawed argument is a tidy flawed argument. The critical step — decomposing the reasoning, rating each claim's support, following each citation to ground or to nothing, and asking whether the conclusion survives — is an evaluative act, not an extractive one. It is the difference between a very good research assistant who tells you what forty papers claim and a sceptical senior colleague who tells you which two of the forty would survive a hostile seminar.
This matters most exactly where it is hardest to see. Ioannidis's argument that most published findings may be false was not an argument about which papers exist or how they cluster; it was an argument about the internal statistical and structural properties of the claims themselves. No map of papers surfaces that. Only a reading of the argument does.
The comparison, honestly stated
None of this makes the relatedness maps wrong. They are excellent at discovery, and discovery is a real and necessary phase. The point is narrower and more useful: they are answering "what is near this?" when your next experiment depends on "is this sound?" — and no amount of better mapping of the first question turns it into the second. To evaluate an argument you have to represent the argument, which means going below the document, decomposing premises and conclusions, and rating and auditing what you find. The final part is about the payoff of doing exactly that: what becomes possible when you can see a literature's arguments clearly enough to see where they stop — because that edge, the gap, is where your next hypothesis is already waiting.