In the last edition, I introduced NAV: my framework for defining and diagnosing high-impact experimentation.
Since then, many of you have already taken the self-serve NAV audit (if you haven’t, it’s here). One theme keeps emerging: Accuracy is perhaps the most common, and most misunderstood, failure point.
And it’s not just a failure of omission.
Teams fall short when Accuracy is ignored AND when it’s overcorrected.
In this edition, we’ll walk through what those two extremes look like, how they impact the rest of your experimentation program, and what high-impact teams do to strike the right balance.
When Accuracy is off: The two extremes
1. Accuracy is underrated
You’ve seen this version before. It’s the team shipping lots of experiments with great intentions but shaky foundations:
Using low-signal behavioural metrics because they’re easy to measure
Skipping power analysis entirely (or not knowing what it is)
Peeking at results, changing variants mid-test, slicing post-hoc to find "wins" (see the simulation sketch after this list)
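That last point is worth quantifying. Here’s a minimal Python sketch (numpy and scipy assumed; all parameters are illustrative) that simulates A/A tests, where no real difference exists by construction, and compares analysing once at the planned end against peeking ten times along the way:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_per_arm, n_peeks = 2_000, 2_000, 10
checkpoints = np.linspace(n_per_arm // n_peeks, n_per_arm, n_peeks, dtype=int)

fp_once, fp_peeking = 0, 0
for _ in range(n_sims):
    # A/A test: both arms draw from the same distribution, so any
    # "significant" result is a false positive by construction
    a = rng.normal(0, 1, n_per_arm)
    b = rng.normal(0, 1, n_per_arm)
    pvals = [stats.ttest_ind(a[:n], b[:n]).pvalue for n in checkpoints]
    fp_once += pvals[-1] < 0.05      # analyse once, at the planned end
    fp_peeking += min(pvals) < 0.05  # declare a "win" at the first peek under 0.05

print(f"Analyse once at the end: {fp_once / n_sims:.1%} false positives")
print(f"Peek {n_peeks} times along the way: {fp_peeking / n_sims:.1%} false positives")
```

The single-look rate lands near the nominal 5%; the peeking rate climbs far above it. Every extra look is another chance for noise to masquerade as a win.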
On the surface, these teams look productive. Lots of motion, lots of charts, lots of dashboards.
But the learning? Weak. Direction? Absent.
These teams sacrifice direction for motion.
My former colleague and Head of Experimentation, Lukas Vermeer, explains it perfectly in his short blog post, “Speed is not velocity.”
It’s usually not the result of laziness. It’s a mix of inexperience, unclear expectations, and external pressure to deliver something. But the result is the same: erosion of trust. Teams can’t tell which results to believe, stakeholders stop listening, and soon experimentation is seen as "nice to have" instead of mission-critical.
2. Accuracy is overrated
At the other extreme are teams with gold-plated pipelines and rigorous scientific processes. The kind that won't test a copy change without a perfectly powered sample size and a six-week revenue holdout.
Sound familiar?
Waiting a month (or more) to measure long-term metrics like cancellations
Demanding overly precise revenue attribution when a directional signal is what’s needed
Blocking tests because power calculations aren’t precise enough
These teams are technically right but, in my opinion, strategically stuck.
They sacrifice speed for the illusion of precision.
It often comes from good intentions. Maybe the org overcorrected after past mistakes. Maybe leadership demanded absolute certainty. Maybe the team just loves doing science. But at some point, the pursuit of perfect undermines the ability to ship, learn, and adapt.
Instead of unblocking roadmaps or accelerating decisions, the experimentation program becomes a bottleneck. Valuable, but slow. Accurate, but irrelevant.
How Accuracy trade-offs affect NAV
When we put the above sections together, I hope it becomes clear that one of the key reasons Accuracy matters is how it affects the other NAV levers:
⚙️ Number
Underrated Accuracy can inflate test velocity while hiding the fact that many experiments aren’t worth running or learning from.
Overrated Accuracy reduces throughput by adding unnecessary friction and slowing test cycles.
📈 Value
Underrated Accuracy undermines trust, meaning results don’t shape decisions, and value is never realised.
Overrated Accuracy delays or discourages experimentation in key areas, missing out on potential wins and compounding insight.
If you want to scale experimentation, you can’t just optimise one lever. You need to manage the interplay.
That’s why high-impact teams don’t treat Accuracy as an absolute. They treat it as a strategic trade-off.
How to strike the right balance
The goal isn’t perfection. It’s trusted, decision-ready direction, achieved quickly enough to matter, and carefully enough to guide.
Here’s what high-impact teams tend to do:
✅ Choose smart proxies
You don’t always need to wait for long-term revenue or retention. Instead, find early indicators that strongly correlate with business outcomes (e.g., activation, engagement, funnel completion).
These signals help you move faster and stay directionally aligned.
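As a deliberately over-simplified sketch of how you might vet a proxy: if you’ve run enough experiments to observe both the early indicator and the long-term outcome, check how well the two move together across past tests. The data below is entirely hypothetical, and pandas is assumed:

```python
import pandas as pd

# Hypothetical history: one row per past experiment, with the measured lift
# in a candidate proxy (activation) and the eventual long-term outcome
# (90-day retention). All values are made up for illustration.
history = pd.DataFrame({
    "experiment":         ["exp_01", "exp_02", "exp_03", "exp_04", "exp_05"],
    "activation_lift":    [0.021, -0.004, 0.015, 0.002, 0.030],
    "retention_90d_lift": [0.012, -0.002, 0.009, 0.000, 0.018],
})

# A proxy earns its place only if its movement tracks the outcome you care about
corr = history["activation_lift"].corr(history["retention_90d_lift"])
print(f"Proxy-to-outcome correlation across past experiments: {corr:.2f}")
```

With only a handful of experiments this correlation is noisy, so treat it as a sanity check, not proof. But it keeps the choice of proxy an evidence-based one.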
🔧 Use tools, not rules
Power calculators (like my Experimenter’s Calculator) are a great way to make trade-offs explicit. But they’re not commandments. Use them to understand risk, not to block experimentation altogether.
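If you’re curious what happens under the hood of such a calculator, here’s a minimal sketch using statsmodels (the baseline rate, target lift, and power below are illustrative assumptions, not recommendations):

```python
# A classic sample-size calculation for a two-proportion test.
# All inputs here are illustrative, not recommendations.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline, target = 0.10, 0.11  # 10% conversion today, hoping for +1pp
effect_size = proportion_effectsize(target, baseline)  # Cohen's h

n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,        # tolerated false positive rate
    power=0.80,        # chance of detecting the lift if it's real
    alternative="two-sided",
)
print(f"~{n_per_variant:,.0f} users per variant")
```

The output isn’t a verdict; it’s a negotiation tool. Relax the power or widen the detectable effect, and the required sample shrinks. That’s a risk trade-off to discuss openly, not a gate that blocks the test.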
💬 Build a culture of clarity
Not every test needs to be flawless, but every test does need to be understood.
Flag when tests are exploratory or underpowered. Be explicit about uncertainty and risks. Sacrificing precision for speed when the context demands is fine, but be open about it. Focus on decisions, not just results.
Back to my definition in NAV:
Accuracy = How trustworthy, interpretable, and aligned experiment results are.
Not statistically perfect, and not philosophically pure. While both may be important inputs, they’re rarely achievable in practice. It’s about being solid, transparent, and decision-ready.
Because when Accuracy becomes a blocker, you lose momentum. But when it's neglected, you lose trust. And either way, you lose impact.
What’s next
This is the second edition of my freshly rebooted newsletter, and part of a mini-series unpacking the NAV framework.
In upcoming posts, I’ll dive into:
Why Value is the ultimate test of your experimentation program
How to use NAV results to prioritise and unlock change
Real-world case studies from teams scaling experimentation the right way
In the meantime, if you haven’t taken the NAV self-assessment yet, you can do that here. It only takes 3–5 minutes.
And if this sparked a thought, or describes something you’ve seen in your own team, hit reply. I always love hearing from fellow experimenters!
Until next time 🙌
— Simon
Helping businesses scale high-impact experimentation
drsimonj.com
📢 New: I’m opening a live cohort on Measuring Business Impact with Experimentation. If you’ve ever been asked “but what’s the revenue impact?”, this is for you! Limited founding seats, join the waitlist 👉 drsimonj.com/impact-cohort.
Other useful resources by DrSimonJ 🎉
🧮 The Experimenter’s Power Calculator to quickly plan high-quality experiments
📚 The Experimenter’s Dictionary to find practical definitions and resources
🗞️ The Experimenter’s Advantage Newsletter for more of these in your inbox
✅ The NAV Benchmark survey to evaluate the impact of your program