How Clean are Your A/B Tests? A Testing Hygiene Checkup

A/B Testing Clean
The Most Interesting Man In The World, tests clean.
I am going to use some advice animal memes (basically a bunch of awesome animal images with funny captions) to illustrate the rules you should follow when creating and running your A/B tests. If your tests are clean, the only question you have to answer is ‘was your hypothesis right?’

There MUST be a hypothesis for your test. If you don’t know what you are testing, then collecting data is pretty pointless. To call it a test, something specific has to be tested; otherwise you are just throwing spaghetti against the wall. Sometimes a radical redesign is the right call, but most of the time you should be testing one specific element, such as a hero shot, a call to action, or a headline.

For an A/B test to be meaningful and insightful you should follow these rules to keep your insights and data clean:

  1. The original cannot have content edits during the test
  2. All variations should have one hypothesis to test against the Original
  3. Traffic distributions should stay proportional
  4. Don’t end tests on less than 50% confidence
  5. Define a minimum sample size

The original CANNOT have content edits during the test

A/B Testing Variations
Paranoid Parrot
The original is your control group. It is the thing that you are testing against.

If you change your original in a way that affects your hypothesis then you have to start your data over from scratch.

If you are fixing something like a broken link or a typo, obviously don’t worry, but if you are changing the headline you should be making a variation, not changing the original.

If you change the original your data will no longer be representative of your landing page.


All variations should have ONE hypothesis to test against the Original

A/B Testing Variations
All the things!
Not all variations need to have the same hypothesis, but each variation should be easily, and clearly, differentiated.

You can have one variation that tests headline, one variation that tests CTA, and one variation that tests form length–but you shouldn’t have one variation that tests all three at once.

Testing Tip: Remember to name your variations based on what they are testing.
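
One way to keep yourself honest here is to write down which elements each variation changes and flag any variation that touches more than one. The following is only a minimal sketch: the variation names, the element labels, and the `muddled_variations` helper are all hypothetical, not part of any testing tool.

```python
# Hypothetical test plan: each variation is named for what it tests and
# mapped to the set of page elements it changes relative to the original.
variations = {
    "Headline - benefit-led": {"headline"},
    "CTA - green button": {"cta"},
    "Everything at once": {"headline", "cta", "form_length"},
}


def muddled_variations(plan):
    """Return the names of variations that do not change exactly one element."""
    return [name for name, changed in plan.items() if len(changed) != 1]


print(muddled_variations(variations))  # ['Everything at once']
```

Anything this check flags is a candidate for splitting into separate variations, or for saving until the next test.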

Traffic distributions should stay proportional

AB Testing Traffic
OCD Otter
Different variations can receive different amounts of traffic, but try to minimize any change in the mix.

Changing the mix is not horrible, but it does complicate any insights you may have.

If you can, introduce the new hypothesis in the next generation instead of adding new variations to an ongoing test.

Examples of Bad Changes
Control (80%), Test A (20%) »» Control (50%), Test A (50%)
Control (80%), Test A (20%) »» Control (50%), Test A (25%), Test B (25%)

These are the two most common bad changes: drastically changing the mix for an actual A/B test, or adding a new variant in a way that changes the proportional mix.

Better Changes
Control (80%), Test A (10%), Test B (10%) »» Control (60%), Test A (20%), Test B (20%)
Control (80%), Test A (20%) »» Control (60%), Test A (20%), Test B (20%)

It is better to split the tested traffic across the tested variations evenly.

Notice that I say better change, not good change. Changing your traffic mix always adds some noise to your data.

If you can save the new hypothesis for the next test you should.
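
To see why a changed mix muddies the data, here is a rough sketch with made-up numbers. It assumes two pages that convert identically, a baseline conversion rate that rises from 10% to 20% partway through the test (say, a promotion starts), and 1,000 visitors per period. The `pooled_rate` helper and all of the figures are hypothetical.

```python
def pooled_rate(periods):
    """Overall conversion rate for one variation.

    periods is a list of (visitors, conversion_rate) tuples, one per period.
    """
    visitors = sum(v for v, _ in periods)
    conversions = sum(v * rate for v, rate in periods)
    return conversions / visitors


# Mix changes mid-test: 80/20 in period 1, then 50/50 in period 2.
control = [(800, 0.10), (500, 0.20)]
test_a = [(200, 0.10), (500, 0.20)]

# Mix stays at 80/20 for both periods.
steady_control = [(800, 0.10), (800, 0.20)]
steady_test_a = [(200, 0.10), (200, 0.20)]

print(f"Changed mix: Control {pooled_rate(control):.1%}, Test A {pooled_rate(test_a):.1%}")
print(f"Steady mix:  Control {pooled_rate(steady_control):.1%}, Test A {pooled_rate(steady_test_a):.1%}")
```

With the changed mix, Test A appears to convert at 17.1% against 13.8% for the control, even though the two pages performed the same in every period. With the steady mix both come out at 15.0%. The gap in the first case comes entirely from Test A receiving a larger share of its traffic during the better period.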

Don’t end tests on less than 50% confidence

A/B Testing Confidence
What can I say, I love OCD Otter
If you are going to let a coin flip decide, you should save time by just flipping a coin.

I know that many people are having an aneurysm even thinking about a test that isn’t being taken to 95% confidence.

Here is what I have to say: I’m not a scientist, I’m a marketer.

I don’t calculate standard deviation; I calculate return on investment, and that means moving on to the next test that improves my results and income instead of getting into the mathematical weeds on any given scenario. #Amen (Editor’s note).
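
If you are curious where a confidence number comes from, here is a minimal sketch of one common approach: a one-sided two-proportion z-test using the normal approximation. This is an assumption for illustration only. Your testing tool may calculate confidence differently, and the function name and sample figures below are made up.

```python
import math


def confidence_variant_beats_control(control_visitors, control_conversions,
                                     variant_visitors, variant_conversions):
    """Approximate confidence (0 to 1) that the variant converts better.

    Uses an unpooled two-proportion z-test with a normal approximation,
    which is rough when samples are small.
    """
    p_control = control_conversions / control_visitors
    p_variant = variant_conversions / variant_visitors
    std_error = math.sqrt(
        p_control * (1 - p_control) / control_visitors
        + p_variant * (1 - p_variant) / variant_visitors
    )
    if std_error == 0:
        # No variation in either group (all or nothing converted).
        if p_variant == p_control:
            return 0.5
        return 1.0 if p_variant > p_control else 0.0
    z = (p_variant - p_control) / std_error
    # Standard normal CDF evaluated at z.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Identical results give exactly the coin flip described above.
print(confidence_variant_beats_control(100, 10, 100, 10))  # 0.5
# A hypothetical 10% vs. 13% result on 1,000 visitors each.
print(round(confidence_variant_beats_control(1000, 100, 1000, 130), 2))
```

A result at or below 0.5 means the data gives you no reason to prefer the variant, which is the coin flip this rule warns against.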

Define a minimum sample size

A/B Testing Sample Size
Confidence Wolf
Listen, if you aren’t scared by the data you see you are probably going in the wrong direction. Choose a minimum sample size and let that guide you.

The simplest rule is to choose a minimum sample that lets every version get 100 visitors. Why 100, you may ask? Because then all of your percentages are easy to talk about and understand: each conversion is one percentage point. If you don’t have what you need you can keep going, but don’t cut and run from what could be a solid improvement.
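
Because traffic is rarely split evenly, the version with the smallest share decides how much total traffic the 100-visitor rule requires. The arithmetic can be sketched as below. The function names are hypothetical, and the splits are the example mixes used earlier in this post.

```python
def visitors_needed(split_percent, minimum_per_version=100):
    """Total visitors needed before the smallest version reaches the minimum.

    split_percent maps each version name to its whole-number traffic share.
    """
    smallest_share = min(split_percent.values())
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-minimum_per_version * 100 // smallest_share)


def has_minimum_sample(visitors_by_version, minimum_per_version=100):
    """True once every version has received at least the minimum sample."""
    return all(v >= minimum_per_version for v in visitors_by_version.values())


print(visitors_needed({"Control": 80, "Test A": 10, "Test B": 10}))  # 1000
print(visitors_needed({"Control": 60, "Test A": 20, "Test B": 20}))  # 500
print(has_minimum_sample({"Control": 800, "Test A": 95, "Test B": 105}))  # False
```

In other words, the thinner you slice the test traffic, the longer you should expect to wait before judging any version.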

In Conclusion: Now You Know (and Knowing is Half The Battle)

A/B Testing Success Baby
Success Baby
The rules above can’t guarantee success, but if you follow them you will have data that you can count on.

That means that if you have to defend your decisions or hand off control, you will have a clear demonstration of your success.

So how’s your testing hygiene?
Tell me about some of your A/B testing successes and failures. Also, if you have some questions, comment!

— Carlos del Rio

About Carlos Del Rio
Carlos is the former Director of Conversion Analysis & Digital Strategy at Unbounce. He has contributed to top marketing blog SEOmoz and has spoken at major events on both SEO and Conversion Rate Optimization. Carlos is the co-author of User Driven Change: Give Them What They Want and A Strategic Framework for Emerging Media.


  1. Affiliate and Online Marketing News You Can Use – May 2012 |

    […] Unbounce explains how to run a clean A/B split test in meme form. […]

  2. rambo

    I think it is necessary…

  3. Michael Hayes

    “All variations should have ONE hypothesis to test against the Original”

    Why? If I have three hypotheses I’d like to test, and my traffic isn’t very high, surely it’s better to make a single variation with all the changes.

    If any one of them makes a significant difference, it’s very unlikely the other two would cancel this out, and you iterate to a better page more quickly. If the test page is significantly worse, you know at least one of the factors made a big difference, so you know you’re testing important factors and can keep trying variations on the main page.

    This reduces the time spent running tests which have no real impact on your conversion rate.

    • Carlos Del Rio

      Making a single variation that tests 3 hypotheses is essentially a Radical test. The drawback is that you either need additional data (like heat maps or click events) or have to make an assumption about the cumulative effects. It is actually quite common to have factors that compound each other.

      The problem with any test that is built on assumption is that it is harder to pass on or share between testers without significant documentation of opinion. Testing one hypothesis means that anyone can see both the idea and the results for themselves. That means negative hypotheses will be dropped and positive hypotheses will be extended.

  4. A Day in The Life of A Conversion Rate Optimization Gunslinger | Complete SEO Marketing News

    […] want clean, useful data so I split my traffic 40% Control 15% for the four Variations and set my minimum sample as 700 […]

  5. amar


    I have two landing pages in an A/B experiment under Google Analytics. Under the advanced options in the settings, I have also selected the option to distribute traffic equally to both landing/testing pages. Here is my question:

    Suppose that, at the same time, I am running Google ads and other social media ads, and all of the ads have their landing (final) URL set to one of the pages in the A/B experiment (the original page). Is Google going to split the traffic generated from these different sources, like AdWords and social media, between the two landing pages that are part of the A/B test as well?