If visitors are not converting on your website, something is stopping them.
You can go ahead and ask your design team to create new designs, but the question remains: how do you know that the new designs will convert more visitors compared to the original design?
That is where A/B testing comes in handy.
A/B testing (sometimes referred to as split testing) is the process of testing multiple new designs of a webpage against the original design of that page with the goal of determining which design generates more conversions.
The original design of a page is usually referred to as the control. The new designs of the page are usually referred to as the “variations”, “challengers” or “recipes.”
The process of testing which page design generates more conversions is typically referred to as a “test” or an “experiment.”
A “conversion” will vary based on your website and the page you are testing. For an e-commerce website, a conversion could be a visitor placing an order. For a SaaS website, a conversion could be a visitor subscribing to the service. For a lead generation website, a conversion could be a visitor filling out a contact form.
1st Example: The homepage on an e-commerce website receives 100,000 visitors a month.
To determine if there is a way to increase conversions, the design team creates one new design for the homepage.
A/B testing software is then used to randomly split the homepage visitors between the control and the new challenger: 50,000 visitors are directed to the control and 50,000 visitors are directed to the challenger. Since we are testing which design generates more orders (conversions), we use the A/B testing software to track the number of conversions each design generates. The software will then determine the winning design based on the number of conversions.
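The mechanics of this example can be sketched in a few lines of Python. The function and dictionary names here are hypothetical, not taken from any particular A/B testing tool; real software would also handle sticky per-visitor assignment, persistence, and reporting.

```python
import random

def assign_variant(visitor_id, variants=("control", "challenger")):
    """Randomly assign a visitor to one of the designs (equal split)."""
    return random.choice(variants)

# Hypothetical tracking store: visitors and conversions per design.
counts = {"control": {"visitors": 0, "conversions": 0},
          "challenger": {"visitors": 0, "conversions": 0}}

def record_visit(variant, converted):
    """Increment the visitor count, and the conversion count if the visitor converted."""
    counts[variant]["visitors"] += 1
    if converted:
        counts[variant]["conversions"] += 1
```

With enough traffic, the random split sends roughly half of the visitors to each design, which is what lets the two conversion counts be compared fairly.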
2nd Example: The homepage for a blog receives 3,000 visitors a month.
The primary conversion goal for the homepage is to get a visitor to subscribe to the email list of the blog. The designer creates a new design for the blog homepage which highlights the subscription box.
The split testing software is used to send 1,500 visitors to the original page design (control), and the testing software sends 1,500 visitors to the new design (challenger). The testing software tracks the number of subscribers (conversions) each design generates.
A 2015 survey by Econsultancy showed that 58% of its respondents were conducting A/B testing.
But how successful is A/B testing in helping companies increase their conversion rates?
A 2017 survey by Optimizely shows that only 25% of all A/B tests produce significantly positive results. Visual Website Optimizer reports that only 12% of all A/B tests produce significantly positive results. Finally, data from Google shows that only 10% of all A/B tests produce significantly positive results.
Case Study: what did Netflix learn from A/B testing?
46% of surveyed Netflix visitors complained that the website did not allow them to view movie titles before signing up for the service. So, Netflix decided to run an A/B test on its registration process to see if a redesigned registration process would help increase subscriptions.
Creating an A/B test
The new design displayed movie titles to visitors prior to registration. The Netflix team wanted to find out if the new design with movie titles would generate more registrations than the original design without them. This was analyzed by running an A/B test of the new designs against the original design.
The test hypothesis was straightforward: Allowing visitors to view available movie titles before registering will increase the number of new signups.
In the split test, the team introduced five different challengers against the original design. The team then ran the test to see the impact. What were the results?
Results of the A/B test and analysis
The original design consistently beat all challengers.
The real analysis happens after you conclude an A/B test. Why did the original design beat all the new designs, although 46% of visitors said that seeing which titles Netflix carries would persuade them to sign up for the service?
The team at Netflix gave three different reasons why the original design beat all the challengers:
- Netflix is all about the experience: the more users interacted with the website, the more they loved the experience. So, Netflix is more than just browsing.
- Simplify choice: the original design (the control) showed users one option: sign up for the service. The new designs offered visitors multiple options (multiple movies). This complicated the choice which visitors had to make. More choices resulted in fewer conversions.
- Users do not always know what they want: The Netflix team argued that test results point to the fact that users do not always know what they want.
While these might be valid explanations, especially the second point, we would argue that there is another reason altogether.
Could it be that visitors, finally seeing all the movie options Netflix offers, did not find the selection convincing and decided to walk away? If that is the case, is the problem with the new designs, or with the movie selection the site offers?
How does the A/B testing software determine the winning design?
At its core, A/B testing software tracks the number of visitors coming to each design in an experiment and the number of conversions each design generates. Sophisticated A/B testing software tracks much more data for each variation. As an example, FigPii tracks:
- Page views
- Revenue per visit
- Bounce rate
- Source of traffic
- Medium of traffic
The split testing software uses different statistical models to determine a winner in a test. The two popular methods for determining a winner are the Frequentist and Bayesian models.
The split testing software tracks conversion rates for each design. However, declaring a winner in a split test requires more than generating a small increase in conversion rates compared to the control.
The Frequentist model uses two main factors to determine the winning design:
- The conversion rate for each design: this number is determined by dividing the number of conversions for a design by the unique visitors for that design.
- The confidence level for each design: a statistical term indicating the certainty that your test will produce the same result if the same experiment is conducted across many separate data sets in different experiments.
Think of the confidence level as the probability of seeing the same result again. So, if a challenger produces a 20% increase in conversions at a 95% confidence level, you can assume an excellent probability of getting the same result when you select that challenger as your default design. It also means there is a 5% chance that your test results were due to random chance, that is, a 5% possibility that you declared the wrong winner.
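As an illustration of the Frequentist approach, here is a minimal Python sketch that computes each design's conversion rate and a one-sided confidence level using a standard two-proportion z-test. The function names are our own, and real A/B testing tools use more elaborate procedures (for example, corrections for repeatedly peeking at results).

```python
from math import sqrt, erf

def conversion_rate(conversions, visitors):
    """Conversions divided by unique visitors for a design."""
    return conversions / visitors

def confidence_level(conv_a, n_a, conv_b, n_b):
    """One-sided confidence that design B truly beats design A.

    Uses a pooled two-proportion z-test and the normal CDF."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 0.5 * (1 + erf(z / sqrt(2)))  # normal CDF of z
```

For example, a control with 2,500 conversions from 50,000 visitors (5%) against a challenger with 2,700 conversions from 50,000 visitors (5.4%) comes out above the common 95% confidence threshold.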
The Bayesian model uses two main factors to determine the winning design:
- The conversion rate for each design: as defined above.
- Historical performance: the success rate of previously run A/B experiments on the webpage.
Leonid Pekelis, Optimizely’s first in-house statistician, explains this by saying:
Bayesian statistics take a more bottom-up approach to data analysis. This means that past knowledge of similar experiments is encoded into a statistical device known as a prior, and this prior is combined with current experiment data to make a conclusion on the test at hand.
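A minimal sketch of that Bayesian idea in Python: encode past knowledge as a Beta prior, combine it with the current experiment's data, and estimate the probability that the challenger beats the control by sampling from the two posteriors. The function name and the flat default prior are our own illustrative choices, not any vendor's actual implementation.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, prior=(1, 1), samples=20000):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta posteriors.

    `prior` encodes knowledge from past experiments; (1, 1) is a flat
    prior, meaning no historical information is assumed."""
    a0, b0 = prior
    wins = 0
    for _ in range(samples):
        # Draw a plausible true conversion rate for each design.
        ra = random.betavariate(a0 + conv_a, b0 + n_a - conv_a)
        rb = random.betavariate(a0 + conv_b, b0 + n_b - conv_b)
        if rb > ra:
            wins += 1
    return wins / samples
```

A strong prior from past experiments pulls the estimate toward historical performance, which is exactly the "bottom-up" behavior described in the quote above.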
We typically rely on multiple metrics when determining a winning design for a test. Most of our e-commerce clients use a combination of conversion rates and revenue per visit to determine a final winner in an experiment.
Selecting which metrics to use will depend on your specific situation. However, it is crucial to choose metrics that have an impact on your bottom line. Optimizing for lower bounce or exit rates will have little direct, measurable dollar value for most businesses.
The team at Bing was trying to find a way to increase the revenue which the site generates from ads. To do so, they introduced a new design that emphasized how search ads are displayed. The team tested the new design vs. the old design. The split test results showed a 30% increase in revenue per visit.
This, however, turned out to be due to a bug in the main search results algorithm in the new design. The bug showed visitors poor search results, and frustrated visitors were clicking on ads instead.
While the new design generated a higher revenue per visit, this was not a good long-term strategy. The team decided to stick to the old design instead.
Assigning weighted traffic to different variations
Most A/B testing software automatically divides visitors equally between the different variations.
There are however instances where you need to assign different weights to different variations.
For example, let’s take an experiment that has an original design and two challengers in it. The testing team might want to assign 50% of the visitors to the original design and split the remaining 50% between variations one and two.
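A weighted split like the one above can be sketched with Python's standard library. The variant names and weights come from the hypothetical example; a real tool would also keep each visitor's assignment sticky across visits.

```python
import random

def assign_weighted(variants, weights):
    """Pick a design for a visitor according to the traffic weights."""
    return random.choices(variants, weights=weights, k=1)[0]

# 50% to the original design, 25% to each challenger.
variant = assign_weighted(["control", "challenger_1", "challenger_2"],
                          [0.50, 0.25, 0.25])
```

Over many visitors, the assignments converge to the configured 50/25/25 split while each individual assignment stays random.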
Should you run AB testing on 100% of your visitors?
Some conversion optimization experts debate this question at great length.
Looking at your analytics, you will typically notice that different visitor segments interact differently with your website. Returning visitors (those who visited the site previously) are generally more engaged with the website than new visitors.
When launching a new AB test, you will notice that in many instances:
- New visitors respond better to your experiment challengers.
- Returning visitors, who are used to your current design, react negatively to your new designs.
The fact that new visitors convert at higher rates on new designs than returning visitors do is attributed to the theory of momentum behavior.
If your website gets a large number of visitors, we recommend launching new tests for new visitors only and observing how they react. After that, you can start the test for returning visitors and compare their reactions to the new designs introduced in the experiment.
Holdback split tests
We typically recommend running holdback split tests for larger websites that receive thousands of conversions per month. In this type of test, you launch the test to a small percentage of your site visitors. For example, you start by launching the test to 10% of your visitors. If the results are encouraging, you then expand the test to 25%, 50%, and 100% of your website visitors.
There are several advantages to running holdback A/B tests:
- Discover any testing bugs: as you launch an A/B test, your designs might have bugs in them. By running the test on a small percentage of your visitors, only that tiny segment of visitors will see the errors in the new designs. That gives you the opportunity to fix the bugs before rolling out the test to 100% of your visitors.
- Reduce revenue risk: by running the test on a small percentage of visitors, you reduce the risk of one of your test variations causing a significant drop in revenue.
If you choose to run holdback A/B tests, make sure that you start a new test each time you change the traffic allocation going through the experiment, to avoid any statistical problems with the results.
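One common way to implement this kind of percentage gate, sketched in Python under our own naming: hash each visitor id into 100 buckets and include only the buckets below the holdback percentage. Hashing keeps each visitor's inclusion stable across visits, though, as noted above, you should still restart the test whenever you change the allocation.

```python
import hashlib

def in_holdback(visitor_id, percent):
    """Deterministically include roughly `percent`% of visitors.

    The same visitor id always maps to the same bucket, so a visitor's
    inclusion in the experiment does not change between page views."""
    digest = hashlib.md5(str(visitor_id).encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Ramping from 10% to 25% then only widens the included bucket range; visitors already in the test stay in it.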
How many variations should you include in a test?
There is a lot of math that goes into determining how many variations should be included in an A/B test. The following are general guidelines you can apply; more details will be covered in a later section:
Calculate the monthly number of conversions generated by the particular page you plan to test:
- on the conservative side, divide the total monthly conversions generated by the page by 500 and subtract one
- on the aggressive side, divide the total monthly conversions generated by the page by 200 and subtract one
If you have less than 200 conversions a month, your website is not ready for A/B testing. Focus on driving more visitors to your website.
Example: Your website generates 1,000 conversions per month:
- On the conservative side, an A/B test can include one challenger against the original (1,000 / 500 - 1 = 1)
- On the aggressive side, an A/B test can include four challengers against the original (1,000 / 200 - 1 = 4)
Again, this is a simplification of the calculation, but it will give you a good starting point.
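The rule of thumb above can be written as a small helper. This is just the simplification described here, under our own function name, not a proper statistical power calculation.

```python
def max_challengers(monthly_conversions, conservative=True):
    """Rule-of-thumb number of challengers a page can support.

    Divide monthly conversions by 500 (conservative) or 200 (aggressive)
    and subtract one for the control; never return a negative count."""
    divisor = 500 if conservative else 200
    return max(monthly_conversions // divisor - 1, 0)
```

A page with fewer than 200 monthly conversions yields zero challengers, matching the advice that such a site is not yet ready for A/B testing.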