Is A/B Testing Overrated? From A/B Testing to a Hypothesis Validation Playbook
Hypothesis validation is much more than A/B tests. Learn when to use which method.
More and more teams rely on A/B tests to validate and quantify hypotheses. The pursuit of a data-driven culture is a great thing, except when teams start overdoing it.
The truth is, A/B tests are overused and abused by some teams, becoming a substitute for common sense and clear thinking. A Google search for “Is A/B testing overrated” shows how chronic the problem is.
In this post, we’ll first briefly tackle why A/B tests are not a panacea. Then we’ll look at the other hypothesis validation methodologies Growth PMs must draw on for robust and speedy decision-making.
Why we can’t rely on A/B tests alone
- Low sample size: This limits not only startups but also bigger organizations. For a startup that has not hit the elusive 100K MAU (Monthly Active Users), reaching statistical significance is a real challenge. In a big organization, teams working on product features with low ‘coverage’ run into the same limitation.
- Short-term focus and local optimization: We can’t run an A/B test for years, so we can’t measure the true long-term impact of a feature. We need other heuristics to give us confidence that the impact we see in a month will hold over the long term. Moreover, some features require user education and may show no short-term benefit despite huge long-term potential. The A/B test therefore steers us toward local maxima, and we may never move toward the global maximum (our potential) when mindless A/B testing alone guides product development.
- Decision challenges and speed: Statistically insignificant numbers are not insignificant for the business. A 1% increase in transactions may not be stat-sig, yet it could be worth hundreds of millions of dollars to a large company. Well-designed experiments help; still, we occasionally face KPIs that are not stat-sig but carry significant business impact. Moreover, business constraints sometimes demand swift decisions: our feature could be a precursor to a bigger product push, or we may simply not want to get caught in a holiday code freeze. The back-of-envelope sketch after this list shows how quickly small lifts become impractical to detect.
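To see why sample size bites, here is a minimal back-of-envelope sketch in Python. It uses the standard two-proportion z-test approximation; the 5% baseline conversion rate and the 1% relative lift are illustrative assumptions, not figures from this post.

```python
from scipy.stats import norm

def users_per_arm(baseline, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2

# Illustrative numbers: 5% baseline conversion, +1% relative lift.
n = users_per_arm(0.05, 0.01)
print(f"~{n:,.0f} users per arm")  # roughly 3 million users per arm
```

At roughly three million users per arm, a product with 100K MAU would need years of traffic for a single test, while a large company may get there in days; this asymmetry is what the rest of this post builds on.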
A Word on the Hypothesis Validation Stack
Typically, folks look at the hypothesis validation stack along three dimensions:
- Qualitative to quantitative: Qualitative methodologies like focus groups, interviews, and usability lab studies produce insights on the ‘why’ and ‘how’. The A/B test, a quantitative method, gives us numbers on ‘how many’ and ‘how much’.
- Attitudinal to behavioral: More often than not, what we say and what we do are not the same. Methods like surveys and interviews get ‘words’ out of users. To understand ‘actions’, we need to lean on methods like the A/B test and pre-post analysis.
- Natural to scripted product use: Whether the product is used in a natural setting or a scripted setting significantly influences user response.

A/B tests are the most scientific of these methodologies. We learn the real preference, not the stated preference. We get hard data, not mushy words; moreover, we observe actual behavior in the natural context of product use, not in a scripted setting.
With so much going for the A/B test, it is not hard to see why inexperienced teams rely on it more and more. They do it to avoid debate, accountability, and tiring critical thinking. Little do they realize that by choosing A/B testing by default, they slow down the organization, and they may still get caught in decision quagmires.
We don’t draw a sword to kill a fly; similarly, we shouldn’t reach for A/B testing for every idea validation need.
The Overlooked 4th Dimension
Teams make mistakes because the above three-dimensional framework is misleading. It covers qualitative vs. quantitative, attitudinal vs. behavioral, and natural vs. scripted product use, but it misses the crucial 4th dimension.
The 4th dimension is our context: company size, product size, product maturity, experimentation maturity, and business need. It gets overlooked because the product management and user research literature does not talk about it.
Ironically, teams don’t account for their own context. They forget themselves, resulting in wrong choices.
High-Velocity yet High-Quality Hypothesis Validation
For a mature organization with a well-developed tech stack, analytics system, and program management, erring toward A/B testing is not a blunder, as quality matters more than speed there. Moreover, a large organization has the resources to build an experimentation platform and processes that recoup some of the lost speed.
For smaller companies, the focus is growth. You cannot sew your pants with a sword, so don’t lean on A/B tests. You need to talk to users more often; product sense, competitor product teardowns, and critical thinking are your tools, not the A/B test.
A/B testing trades velocity for precision. With an A/B test, we are more confident in our numbers, but it slows us down. Startups should prioritize velocity over precision and should not run A/B tests for months just to reach statistical significance. However, as a company grows, every 1% change matters, so its hypothesis validation practices must change as well.
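To put the velocity-precision trade-off in numbers, here is a rough sketch; the baseline rate and daily traffic levels are hypothetical. Required sample size, and therefore test duration, grows roughly with the square of one over the detectable lift.

```python
from scipy.stats import norm

def users_per_arm(p, lift, alpha=0.05, power=0.8):
    # Same two-proportion z-test approximation as the earlier sketch.
    p2 = p * (1 + lift)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 * (p * (1 - p) + p2 * (1 - p2)) / (p2 - p) ** 2

baseline = 0.05                                            # assumed 5% conversion rate
traffic = {"startup": 3_000, "large company": 1_000_000}   # hypothetical daily visitors

for lift in (0.10, 0.05, 0.01):                            # minimum detectable relative lift
    n = users_per_arm(baseline, lift)
    for name, daily in traffic.items():
        days = 2 * n / daily                               # both arms share the daily traffic
        print(f"{lift:.0%} lift, {name}: ~{days:,.1f} days")
```

Under these assumptions, a 10% lift is detectable in a few weeks even at startup traffic, but a 1% lift takes years; that is why the best practice shifts as the company, and the value of each 1%, grows.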
A/B testing best practices by company size:
- Small companies: Do not A/B test. Talk to users. Do product teardowns of successful products.
- Midsize companies: Conduct A/B tests for changes that are expected to move metrics. Testing everything is not a smart idea.
- Large companies: A/B test almost everything before the final rollout. For new product development, behaving like a small company is sometimes a good idea. Yet, before rollout, A/B testing is good practice because (a) accidentally dropping a metric by 1% is a big deal, and (b) complicated technology solutions and unknown dependencies are side effects of being a large organization. Conservatism to minimize execution risk is a good thing.
Here is a hypothesis validation stack by company size, keeping the 4th dimension in mind.

Before we wrap up this section, we must recognize that idea validation is an exercise in decision-making. Business guru Ram Charan mined this nugget of wisdom from Amazon’s way of doing things. According to Charan, Bezos says there are two types of decisions: Type 1 and Type 2.
Type 1 decisions are consequential and irreversible. Once the decision is made and the door has closed behind you, there is no going back. Type 2 decisions are changeable and reversible; they are two-way doors. Suboptimal choices can be course-corrected.
It is important to bucket ideas as Type 1 or Type 2. For Type 1, make sure the team undertakes due diligence, collects qualitative and quantitative data, and tests the natural use of the product. But for Type 2 features, dogmatically trying out multiple validation methodologies results in analysis paralysis.
Closing thoughts
- Know thyself: Be aware of your business needs and constraints. Choose a method that is optimal for your organization and the problem at hand. Product management gurus don’t know our context, so don’t follow their advice literally.
- Take a many-models approach: Try validating ideas with multiple models. Talk to customers, run an A/B test, and conduct a survey; if the results are consistent, chances are the feature is good for users, even when the metrics are not statistically significant. Be a fox, not a hedgehog.
- Type 1 and Type 2 decision-making: For consequential and irreversible decisions, maintain skepticism and thorough due diligence. But for Type 2, reversible decisions, don’t get caught in analysis paralysis or the pursuit of super-scientific accuracy.
Thanks to Ricky Q for reading the draft and providing valuable feedback.
References and Further Reading:
Idea Validation: Much More Than Just A/B Tests