Bayesian vs classic A/B testing when you don't have Amazon's traffic

Somewhere along the way, A/B testing got handed down to the rest of us with a rulebook written for companies that don't look anything like ours. The classic frequentist approach — pick a significance threshold, calculate a sample size, run the test untouched until you hit it — was forged inside organisations with rivers of traffic and the discipline of a monastery. Amazon can run a test on a button colour and have statistical certainty before lunch. You cannot, and pretending you can is where most testing programmes quietly go to die.

This isn't a piece about which statistical framework is philosophically superior. It's about which one survives contact with your actual traffic, your actual patience, and the actual thing you're trying to grow — leads, not clicks.

The peeking problem, and why you're already guilty

The classic method has one rule that everyone breaks: don't look at the results until the test is done. Break it, and you break the math. Evan Miller's much-cited explanation of how not to run an A/B test lays it out plainly: if you keep peeking at a running experiment and stop the moment it crosses the significance line, you dramatically inflate your false-positive rate. A test designed to raise a false alarm only five percent of the time when there is no real difference will raise one far more often once you let impatience call it early.
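If you'd rather see the inflation than take it on faith, here is a minimal simulation sketch. It runs A/A tests (both arms convert at the same rate, so every "winner" is a false positive) and compares stopping at the first significant peek against reading the result once at the end. The function names, the 3% rate, the 2,000 visitors per arm and the peek every 100 visitors are all illustrative assumptions, not figures from Miller's article.

```python
import math
import random


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions at all yet, so nothing to detect
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(rate=0.03, n_per_arm=2000, peek_every=100,
                         n_tests=1000, alpha=0.05, seed=1):
    """Simulate A/A tests; return (rate with peeking, rate without)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        called_early = False
        for i in range(1, n_per_arm + 1):
            # Both arms share the same true rate: there is no real winner.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i % peek_every == 0 and not called_early:
                if z_test_p_value(conv_a, i, conv_b, i) < alpha:
                    called_early = True  # the impatient team ships here
        peeking_hits += called_early
        # The disciplined team looks exactly once, at the planned sample size.
        fixed_hits += z_test_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha
    return peeking_hits / n_tests, fixed_hits / n_tests


if __name__ == "__main__":
    peeking, fixed = false_positive_rates()
    print(f"false positives with peeking: {peeking:.1%}")
    print(f"false positives, single look: {fixed:.1%}")
```

The single-look rate should land close to the five percent the test was designed for, while the peeking rate comes out well above it. How far above depends on how often you look.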

And be honest: you peek. Everyone peeks. The dashboard is right there, the variant is winning, the pressure to ship is real, and the discipline to ignore three days of green while you wait for a sample size you calculated weeks ago is superhuman. The classic method doesn't fail because it's wrong. It fails because it demands a monkish restraint that no growth team under quota actually has.

The traffic you don't have

Then there's the arithmetic of sample size. To detect a modest lift with classic significance, you need a lot of conversions (not visitors, conversions) in each variant. Miller's own sample-size reasoning makes the scale vivid: chasing a small improvement at a low base conversion rate can demand tens of thousands of visitors per variation before the test can speak with confidence. If you're a lead-gen business converting three percent of a few thousand monthly visitors, a "properly powered" classic test could run for months, and for a small lift, years. By the time it concludes, your offer has changed, your season has turned, and the answer is stale.
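To put rough numbers on that, here is a sketch of the standard normal-approximation sample-size formula for comparing two conversion rates. The function names and the defaults (5% two-sided significance, 80% power) are conventional assumptions chosen for illustration; Miller's calculator may use a slightly different variant of the formula.

```python
import math
from statistics import NormalDist


def visitors_per_variant(base_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variant to detect the lift."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def months_to_finish(n_per_variant, monthly_visitors, variants=2):
    """How long the test runs if all traffic is split evenly across variants."""
    return n_per_variant * variants / monthly_visitors


if __name__ == "__main__":
    # Assumed scenario: 3% baseline, hoping to detect a 10% relative lift,
    # with 3,000 visitors a month split across two variants.
    n = visitors_per_variant(base_rate=0.03, relative_lift=0.10)
    print(f"visitors per variant: {n}")
    print(f"months to finish: {months_to_finish(n, 3000):.1f}")
```

Under those assumed inputs the formula asks for a little over 50,000 visitors per variant, which at 3,000 visitors a month is roughly three years of waiting. Only a very large lift brings the wait down to a couple of months.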

Classic A/B testing asks you to be both a firehose and a monk. Most businesses are a garden hose with a deadline — and that's fine, if you use the right math.

Where the Bayesian approach earns its keep

The Bayesian alternative isn't magic and it isn't a loophole around statistics — it's a different, and for most of us more honest, way of asking the question. Instead of "have I crossed a fixed line that lets me reject the null hypothesis," it asks "given the data so far, what's the probability that variant B is actually better, and by how much?" As VWO explains in their write-up of Bayesian A/B testing, this framing gives you a directly useful answer at any point in the test — the probability that a variant is best, updated continuously — rather than a binary verdict you're only allowed to read once.
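Here is what that question looks like in code: a minimal sketch using a Beta-Binomial model with a flat Beta(1, 1) prior and Monte Carlo draws from each variant's posterior. This is one common textbook way to do it, not a description of VWO's engine; the function name, the prior and the example counts are assumptions made for illustration.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=7):
    """Return (P(B's true rate > A's), expected difference in rates)."""
    rng = random.Random(seed)
    wins = 0
    diff_sum = 0.0
    for _ in range(draws):
        # Posterior for each rate: Beta(1 + conversions, 1 + non-conversions).
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
        diff_sum += b - a
    return wins / draws, diff_sum / draws


if __name__ == "__main__":
    # Assumed data: 1,000 visitors per variant, 30 vs 42 conversions.
    p_better, expected_diff = prob_b_beats_a(30, 1000, 42, 1000)
    print(f"P(B better than A): {p_better:.1%}")
    print(f"expected lift in conversion rate: {expected_diff:.2%} points")
```

You can rerun this after every batch of visitors and read the answer directly. What you do with it, and at what probability you act, is a business decision you should settle before the test starts.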

That reframing takes most of the sting out of the peeking problem. When your method is designed to update as evidence arrives, looking at it isn't cheating; it's the intended use, provided you settle your decision threshold before the test starts rather than after you've seen a result you like. Modern testing engines are built for the same habit: Optimizely's Stats Engine, for instance, takes a frequentist rather than Bayesian route, but it was built around sequential testing precisely so that teams could monitor results continuously without inflating their error rates the way naive peeking does. The tool stops punishing you for the human urge to check.

The move that matters most: promote on leads, not clicks

Here's the piece that ties this back to everything else we've been talking about. A testing framework is only as good as the thing it optimises toward, and most A/B setups are quietly optimising toward the click — the button press, the on-page event — because that's what's easy to count. But you already know clicks lie. A variant that wins more clicks and fewer qualified leads isn't a winner; it's a more attractive trap.

The right operating principle is to let the experiment judge itself on the outcome you can bank: the lead. Promote the variant that produces more of the leads your sales team would actually take, not the one that produces more twitches. Combine that with a Bayesian engine that can call a winner sooner on realistic traffic, and you get something the classic method never offered a small business: continuous, honest optimisation toward the metric that pays you, without a months-long wait for a significance threshold your traffic can't reach.
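As a sketch of that principle, the snippet below scores the same two variants on clicks and on qualified leads, then makes the promotion decision on leads alone. Everything here is hypothetical: the function names, the 95% threshold, the flat Beta(1, 1) prior and the traffic numbers are assumptions for illustration, not a description of any particular tool's rule.

```python
import random


def prob_b_better(succ_a, n_a, succ_b, n_b, rng, draws=20000):
    """Monte Carlo estimate of P(B's true rate > A's) under Beta(1, 1) priors."""
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + succ_a, 1 + n_a - succ_a)
        b = rng.betavariate(1 + succ_b, 1 + n_b - succ_b)
        wins += b > a
    return wins / draws


def judge(variant_a, variant_b, threshold=0.95, seed=11):
    """Report P(B better) per metric, but decide on leads only."""
    rng = random.Random(seed)
    report = {}
    for metric in ("clicks", "leads"):
        report[metric] = prob_b_better(
            variant_a[metric], variant_a["visitors"],
            variant_b[metric], variant_b["visitors"], rng)
    p_leads = report["leads"]
    if p_leads >= threshold:
        decision = "promote B"
    elif p_leads <= 1 - threshold:
        decision = "promote A"
    else:
        decision = "keep testing"
    return decision, report


# Hypothetical data: B wins far more clicks but produces fewer qualified leads.
A = {"visitors": 2000, "clicks": 200, "leads": 60}
B = {"visitors": 2000, "clicks": 260, "leads": 45}

if __name__ == "__main__":
    decision, report = judge(A, B)
    print(f"P(B better on clicks): {report['clicks']:.1%}")
    print(f"P(B better on leads):  {report['leads']:.1%}")
    print(f"decision: {decision}")
```

Judged on clicks, B looks like a near-certain winner. Judged on leads, it is far more likely to be the loser, so a rule keyed to leads never promotes it.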

Where Ad-Apt sits in this

This is the logic baked into how Ad-Apt runs its experiments. It uses Bayesian A/B testing so it can promote winners on realistic, low-traffic volumes instead of demanding Amazon-scale samples — and, critically, it promotes on leads rather than clicks, so the variant that gets rewarded is the one that fills your pipeline, not the one that merely gets pressed. You don't have to choose between statistical honesty and shipping this quarter.

If you take one thing from this: the classic A/B rulebook wasn't written for your traffic, and following it religiously has probably cost you more decisions than it's saved. Match your method to your reality — a Bayesian engine, judged on leads — and testing stops being a monastic ritual you feel guilty about breaking, and becomes something you can actually run.
