Why Most Companies Only Use A/B tests (And Why That’s a Problem)

Deepti Jain

Deepti is a writer and content marketer at Invesp, with over six years of experience creating data-driven content. When she’s not editing drafts, she’s probably reading about Roman history or planning her next wildlife escape.

How A/B Testing Became the Default

A/B testing became the default method for digital teams because it’s fast, simple, and gives a clean yes/no answer. Anyone on a marketing or product team can launch a test without needing a statistician, a researcher, or a complex analysis pipeline.

Experimentation platforms like FigPii reinforced this behavior. Their workflows make it effortless to spin up a test: pick a goal, create a variant, and hit launch. That convenience shaped an industry culture in which “experimentation” means “run an A/B test,” even when other methods might yield deeper insights.

Industry surveys show this clearly: 77% of all experiments are simple A/B tests with two variants, not multivariate or multi-treatment designs. That figure shows how strongly teams default to the simplest possible approach, regardless of whether it is the most informative one.

Big tech helped normalize this mindset. Microsoft’s Bing team famously ran an experiment in which merging two ad title lines into a single longer headline increased click-through rate enough to generate over $100M in additional annual revenue. Successes like these made A/B testing a cultural norm. 

Today, Microsoft runs 20,000+ controlled experiments a year, with Bing using tests to validate everything from minor UI tweaks to major ranking updates.

The Core Problems With Over-Reliance on A/B Tests

A/B Tests Answer Narrow, Small Questions

A/B tests are great for micro-changes, like headline tweaks, button styles, and minor layout shifts. But that’s also precisely why they limit teams. They only work well for small, isolated decisions, not meaningful product, pricing, or experience shifts.

For example, A/B testing works well for small questions like:

  • “Does this headline perform better than that one?”
  • “Will placing reviews higher on the page increase add-to-cart?”
  • “Does a shorter checkout form reduce drop-off on that step?”
  • “Does a different product image improve clicks?”
  • “Which CTA wording gets more taps?”

These are small questions because they focus on one element on one page, and the potential outcome is usually a tiny lift (1–2% at best).

The problem is that teams try to use A/B tests to answer big questions, the kind that decide whether the business actually grows:

  • “Is our value proposition clear enough for first-time visitors?”
  • “Is our navigation structured the way customers think?”
  • “Are we pricing and discounting in a way that improves profit, not just conversion?”
  • “Does our PDP tell a convincing story about why the product is worth the price?”
  • “Should we redesign the checkout flow entirely?”


These questions involve multiple parts of the experience interacting with one another, like pricing, messaging, navigation, and product mix. You cannot answer them by changing a single UI element.

Most Companies Lack Traffic for Statistical Power

Detecting a small lift requires far more traffic than most sites have. Underpowered tests end inconclusive or, worse, produce false positives and false negatives, and development cycles are wasted shipping changes that never had a real effect. The rough sample-size sketch below shows why the math is so demanding.
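
To make the traffic problem concrete, here is a minimal sketch of the standard two-proportion sample-size calculation. The baseline conversion rate and minimum detectable effect in the example are illustrative assumptions, not figures from the article.

```python
# Rough per-variant sample size for a two-proportion test.
# Baseline rate and minimum detectable effect (MDE) below are illustrative.
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_relative, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2) + 1

# A 3% baseline conversion rate and a 5% relative lift (3.00% -> 3.15%)
print(sample_size_per_variant(0.03, 0.05))  # ~208,000 visitors per variant
```

At those assumed numbers, a single test needs well over 400,000 visitors across both variants before the result can be trusted, which is more monthly traffic than many sites ever see on the page being tested.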

They Tell You What Happened, Not Why

A test result tells you which variant won, but nothing about user motivation, UX friction, or where the messaging failed. Without qualitative research alongside the numbers, teams learn that behavior changed without understanding why.

Teams Run Tests on Fundamentally Broken Experiences

Optimizing a page with the wrong information architecture, poor performance, or an unclear value proposition only polishes a broken experience. Teams climb toward a local maximum, squeezing small gains out of a design that needed to be rethought rather than tuned.

They Optimize Short-Term Uplifts, Not Long-Term Metrics

Most tests are judged on the immediate conversion lift. Conversion rate can go up while lifetime value, retention, and profitability quietly go down.

What High-Maturity Teams Do Instead

Use a Broader Experimentation Toolkit

They reach beyond the classic two-variant split and pick the design that fits the question: sequential tests, holdout groups, quasi-experiments, and switchback designs, among others. (A simple switchback schedule is sketched below.)
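
As one illustration of a design outside the classic split test, here is a minimal sketch of a switchback schedule, where the whole system alternates between control and treatment in randomized time blocks. The block length, window, and seed are illustrative assumptions.

```python
# Minimal switchback schedule: randomize control/treatment per time block.
import random
from datetime import datetime, timedelta

def switchback_schedule(start, hours, block_hours=1, seed=42):
    """Assign the whole system to an arm for each consecutive time block."""
    rng = random.Random(seed)
    schedule, t = [], start
    for _ in range(0, hours, block_hours):
        arm = rng.choice(["control", "treatment"])  # re-randomize every block
        schedule.append((t, t + timedelta(hours=block_hours), arm))
        t += timedelta(hours=block_hours)
    return schedule

for block_start, block_end, arm in switchback_schedule(datetime(2024, 6, 1), 6):
    print(block_start.strftime("%H:%M"), "-", block_end.strftime("%H:%M"), arm)
```

Designs like this are useful when users cannot be randomized independently, for example when pricing or inventory changes affect everyone at once.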

Improve Hypothesis Quality Through Research

Their experiments are grounded in user insights rather than backlog guesses, so each test starts from evidence about what is actually blocking customers.

Test Bigger Levers

They test messaging, page templates, information architecture, bundling, and personalisation, not just buttons and copy tweaks.

Tie Experiments to Business Outcomes

Success is judged against lifetime value, contribution margin, repeat purchase rate, and acquisition efficiency, not just the immediate conversion lift. (A simple per-variant readout along these lines is sketched below.)
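
Here is a minimal sketch of what a variant-level readout built around business outcomes could look like. The field names (variant, revenue, cogs, is_repeat) are illustrative assumptions about an order-level export, not a real platform schema.

```python
# Per-variant readout beyond conversion rate: contribution margin and repeat rate.
from collections import defaultdict

def variant_readout(orders):
    """Aggregate contribution margin and repeat rate per experiment variant."""
    totals = defaultdict(lambda: {"orders": 0, "margin": 0.0, "repeat": 0})
    for order in orders:
        bucket = totals[order["variant"]]
        bucket["orders"] += 1
        bucket["margin"] += order["revenue"] - order["cogs"]
        bucket["repeat"] += 1 if order["is_repeat"] else 0
    return {
        variant: {
            "orders": b["orders"],
            "contribution_margin": round(b["margin"], 2),
            "repeat_rate": round(b["repeat"] / b["orders"], 3),
        }
        for variant, b in totals.items()
    }

orders = [
    {"variant": "A", "revenue": 60.0, "cogs": 35.0, "is_repeat": False},
    {"variant": "B", "revenue": 45.0, "cogs": 30.0, "is_repeat": True},
    {"variant": "B", "revenue": 80.0, "cogs": 50.0, "is_repeat": False},
]
print(variant_readout(orders))
```

Reading results this way can flip a decision: a variant that "wins" on conversion rate can still lose once margin and repeat behavior are counted.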

The Takeaway

A/B testing is a useful tool, but it is not an experimentation program on its own. It answers small, isolated questions well; the decisions that actually grow a business need research-backed hypotheses, a broader set of experiment designs, and success metrics tied to long-term outcomes.

FAQs about building a complete experimentation program

A/B testing vs. multi-armed bandits: which should I use and when?

How do I prove long-term impact beyond the initial lift?

What should I do when I can’t randomize users or markets?

How many experiments should we run per month at our traffic level?

Do we need multivariate testing or is A/B/n enough?
