When we talk about Conversion Rate Optimization, it’s nearly impossible not to mention A/B testing (or split testing). In fact, many companies treat A/B testing and CRO as if they were synonymous. But that’s not true. A/B testing is one part of the greater umbrella of CRO – but that’s a topic for another day.
From SaaS to e-commerce to lead generation websites, many companies now understand how their target audience responds to specific changes on their websites, thanks to A/B testing.
Most of the website elements you see on popular sites such as Google, eBay, and Amazon were evaluated for effectiveness using A/B testing. When it comes to positioning website elements, the design strategy that worked for one company may not necessarily work for yours. That’s why you should run an A/B test.
Many people think that A/B testing is all about selecting items to test, setting the goal of the test, paying close attention to changes in user behavior, checking for conversions, checking for the significance level, and determining the winner.
But is it that simple? (We wish).
A/B test results can be complex to analyze. Even after creating a strong testing hypothesis, it only takes one simple mistake during the analysis process to derail your entire effort and lead you to conclusions that can cost you valuable leads and conversions.
But since you are already here, we will walk you through the process of analyzing results for an A/B test. All the tips we give in this article can be applied to any A/B testing tool – but we recommend that you try out FigPii, the tool made by marketers, for marketers.
Defining A/B Testing
A/B testing, also known as split testing, is the practice of comparing two versions of a web page or email to determine which one generates more conversions. According to our State of A/B Testing report, 71% of online companies run two or more A/B tests every month. For many CRO agencies, A/B testing is a decision-making tool that helps reveal the elements that have the highest impact on a site’s overall conversion rate. Simply put, split testing gives empirical validation to your design decisions. The potential benefits of using A/B testing are:
- Improved content
- Higher page engagement
- Higher conversion rates
- Reduced bounce rates on pages
How To Analyze A/B Test Results
Once you’re satisfied that your test has gathered enough data, reached the required statistical significance level, and run long enough, it’s time to begin the analysis process. The variation(s) you were testing will either win or lose, or the results will be inconclusive. Regardless of the outcome, you should focus on the learnings, as you will need those to inform your next tests. One thing you should know is that, in some instances, losing tests will give you more insights than tests where the original underperforms. With that said, let’s take a look at how you should analyze your A/B test results.
Winning variations – what’s next?
Congratulations, your test won! So, what’s next? Removing the old design and asking your developers to implement the winning variation permanently? No, not yet! Before you do that, you have to ensure that your results are correct. This means you must investigate and understand the factors contributing to the win. Remember, A/B testing is not all about running tests and hoping for wins. It’s also about learning.
One winning variation
Most optimizers fail to understand the importance of validating results and are instead obsessed with reaching statistical significance and implementing the change. This ends up being testing for significance rather than for learning. So, before you ask your developer to implement the winning variation across the whole site, first determine whether the test results are valid. For instance, let’s say you were testing the control against three variations (V1, V2, and V3), and V2 won. The next thing you should do is re-run the test; this time, you should only test the control vs. the winning variation (V2, in this case). If the initial results are correct, V2 will win again, and you will be able to draw some learnings that you can propagate across the site. The other thing you should consider doing after having a winning variation is to allocate 100% of the traffic to it. This means pausing the experiment, duplicating it, and resetting the traffic allocation.
Multiple winning variations
Sometimes, depending on how good your hypothesis was, a single test can have multiple winners. V1, V2, and V3 can all outperform your original and show an uplift (in terms of your test goals). As good as that might sound, it can be confusing – you might not know which variation to go with.
A/B Test with more than one winner
Looking at the above screenshot, it’s easy to go with variation 4 because it has the highest uplift. But is ignoring the other winning variations (variations 1 and 3) a good idea?
This is very subjective – some CROs will choose to ignore them, while others will recommend further investigation.
But I’d recommend that you segment your results just to see how your most valuable customers respond to the changes. Your test data can be segmented in different ways (see the sketch after this list), such as:
- Traffic source
- Visitor type (new vs. returning)
- Browser type
- Device type (it’s recommended that you test mobile and desktop separately to see which performs better)
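If you want a quick way to slice the numbers yourself, here is a minimal sketch in Python/pandas. It assumes a hypothetical per-visitor export from your testing tool (the file name and column names are made up for illustration); most tools, including FigPii, also let you segment directly in their reporting UI.

```python
import pandas as pd

# Hypothetical per-visitor export from your A/B testing tool.
# Assumed columns: variation, device_type, visitor_type, traffic_source, converted (0/1)
df = pd.read_csv("ab_test_visitors.csv")

# Conversion rate per variation, broken down by device type
by_device = (
    df.groupby(["variation", "device_type"])["converted"]
      .agg(visitors="count", conversions="sum", conversion_rate="mean")
      .reset_index()
)
print(by_device)

# Swap "device_type" for "visitor_type" or "traffic_source" to segment another way.
```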
Losing variations – what’s next?
Yes, it’s frustrating sometimes, and some optimizers can’t handle a losing variation, so they tend to ignore losing tests. But that’s the wrong way to go about it – embrace losing variations. You can actually get valuable insights from losing tests. An A/B test is said to have failed when the variation(s) running against the control fail to beat the control design in terms of the primary goal and any other goals set in the test. A good example is when the control/original version gets more conversion uplift than the variation(s). This can happen even if you follow all the A/B testing best practices and run the test correctly. But depending on how you look at things, there is always a good side to everything. In the context of A/B testing, losing variations are not bad – they are a goldmine of information you can use to home in on expectations your website is not meeting, focus your testing, and make improvements that set you up for long-term success. In simpler terms, losing variations are just as actionable as winning variations. When your test loses, you should:
- Evaluate the solutions you had in your variations.
- Go through your hypothesis.
- Revalidate your research data.
Reevaluating the solutions in the variations
The reality is that the most likely element you’ve got wrong is the solution you presented. Solutions can be a bit subjective: based on the problem you uncovered, you’ve removed, replaced, or redesigned an element or a flow on the site. But there could be multiple variables to the change: the location, the copy, the look and feel, the UX, etc. At Invesp, the vast majority of losing tests are evaluated from a solutions standpoint first. The reason is that the problem uncovered and the research conducted are typically thorough, and the hypothesis is grounded in data; the solution is the part that is more prone to human assumptions. Remember: a single hypothesis can have multiple solutions. Very often, the logic behind a solution seemed sound during design discussions, but it did not resonate with site visitors. Going back to the drawing board and reconsidering discarded solutions can be a good way to turn a losing test into a winner. For instance, let’s say a hypothesis has four possible solutions:
- Change the placement of the form from below to above the fold
- Use videos instead of text
- Multi-step form instead of a single form
- Use short copy instead of long copy
Go Through Your Hypothesis
When the A/B test results are exactly the opposite of what you expected, there is a high chance that your hypothesis is wrong. But before we get into that, what’s a hypothesis? The dictionary definition of a hypothesis is: “a tentative assumption made in order to draw out and test its logical or empirical consequences.” In Conversion Rate Optimization, a hypothesis is a prediction you create prior to running a split test. A good hypothesis states what will be changed and how that change will increase the conversion rate. Through A/B testing, a hypothesis can be proved or disproved. If you run a split test and your variation(s) fail to beat the original, this can be a confirmation that your hypothesis or prediction is wrong. This is usually the second line of defense after you’ve changed solutions but still see no uplift. You may have uncovered the right data during your research, but your prediction after reading the data may not be correct. Sometimes, the data uncovered could support multiple predictions about why visitors behaved in a certain way. For example, after analyzing session replay videos or heat maps, you may notice that visitors are not clicking your CTA buttons. Based on this analysis, your hypothesis might state that increasing the size of the CTA button will make it more visible and increase the click rate. However, this can be the wrong prediction: people may not be clicking the CTA button because of its placement, or because the copy is not compelling enough. In other cases, tests fail not because you had a wrong hypothesis but because you didn’t base your variations on the hypothesis. Hoping to increase conversions by testing random ideas wastes time, money, and web traffic. You need to do proper qualitative and quantitative research, come up with a proper hypothesis, and run a test based on that hypothesis.
Revalidate Your Research Data
In every CRO project, optimizers use two types of data: qualitative and quantitative. Before an A/B test is launched, both types of data should be validated. This validation process is a bit tricky, but it’s not impossible to understand. Your qualitative data is validated using a quantitative research technique, or vice versa. To illustrate, let’s say Google Analytics (a quantitative research tool) shows a high bounce rate on page XYZ. You will then also want to watch session replay videos on the same page to understand what might be causing visitors to leave. The data revalidation process can be undertaken qualitative-first or quantitative-first.
Qualitative-first approach: this approach entails understanding how your users engage with your site, and then proving or disproving your findings with quantitative data. If your session replays indicate that users are hesitant to click on your CTA button, you can validate that by checking how many people actually click on the button.
Quantitative-first approach: this is the reverse. Most optimizers prefer this approach as it answers the ‘what’ questions first; once they have those answers at their fingertips, they seek to understand the ‘why’ by analyzing the qualitative data they have obtained from user tests, heat maps, polls, etc.
The point here is that when your A/B test fails, you have to revalidate your research data. If your data was weak or inconclusive and you took the quantitative-first approach, try the qualitative-first approach this time around. Better yet, undertake both approaches to obtain different viewpoints, which will help you see whether you really uncovered the problem on the site.
Interpreting A/B test results
When interpreting the results of your A/B test, there is a validity checklist you should tick off to avoid false positives or statistical errors. These factors include:
- Sample size
- Significance level
- Test duration
- Number of conversions
- Analyze external and internal factors
- Segmenting test results (the type of visitor, traffic, and device)
- Analyzing micro-conversion data
A/B Test Sample Size
Whether you are running the A/B test on a low- or high-traffic site, your sample size should be big enough to ensure that the experiment reaches the required significance level. The bigger the sample size, the smaller the margin of error. To calculate the sample size for your test, you need to specify the significance level, the statistical power, and the minimum relevant difference between the conversion rates you would like to detect. If the formula feels too complicated, there are online sample size calculators that are easy to use. If you do not calculate your test’s sample size, you risk stopping your test too early, before it collects enough data. In this regard, Khalid wrote an article and had this to say about sample size: “Any experiment that involves later statistical inference requires a sample size calculation done BEFORE such an experiment starts. A/B testing is no exception.” Let’s say you have already started running the test and have the A/B test results. You can still check whether the sample size was enough to validate your results. If the test was stopped before each variation reached the stipulated number of visitors, the results may well be a false positive. Your test should reach the required sample size per variation for the results to be valid.
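If you’d rather see the arithmetic than trust a calculator blindly, here is a minimal sketch of the standard two-proportion sample size formula in Python. The baseline conversion rate and the lift to detect are illustrative values, and different online calculators may return slightly different numbers depending on the exact formula they use.

```python
from scipy.stats import norm

def sample_size_per_variation(baseline_rate, min_detectable_effect,
                              alpha=0.05, power=0.8):
    """Approximate visitors needed per variation for a two-proportion z-test.
    min_detectable_effect is relative, e.g. 0.15 for a 15% lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_effect)
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided 95% significance by default
    z_power = norm.ppf(power)           # 80% power by default
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return int(round(n))

# Example: 3% baseline conversion rate, hoping to detect a 15% relative lift
print(sample_size_per_variation(0.03, 0.15))  # roughly 24,000 visitors per variation
```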
Statistical Significance in A/B Testing
Statistical significance (also called confidence, significance of the results, or chance of beating the original) indicates how likely it is that your result is real rather than random. As a digital marketer, you want to be certain about the results, so statistical significance tells you whether the differences observed between a variation and the control are unlikely to be due to chance. The industry standard is a 95% significance level (or 90% in some cases). This is the target number you should have in mind when running an A/B test. Roughly speaking, a 95% significance level means there is only a 5% probability that a difference as large as the one you observed would appear by chance if the variation and the control actually performed the same.
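For the curious, here is a minimal sketch of how a testing tool arrives at that significance figure, using a two-proportion z-test. The visitor and conversion counts are made up for illustration; in practice, your testing tool reports this for you.

```python
from scipy.stats import norm

# Illustrative numbers; substitute your own visitor and conversion counts.
control_visitors, control_conversions = 10_000, 300      # 3.0% conversion rate
variation_visitors, variation_conversions = 10_000, 360  # 3.6% conversion rate

p_c = control_conversions / control_visitors
p_v = variation_conversions / variation_visitors
p_pooled = (control_conversions + variation_conversions) / (control_visitors + variation_visitors)

# Standard error under the null hypothesis that both rates are equal
se = (p_pooled * (1 - p_pooled) * (1 / control_visitors + 1 / variation_visitors)) ** 0.5
z = (p_v - p_c) / se
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided p-value

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
# A p-value below 0.05 corresponds to the 95% significance threshold mentioned above.
```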
A/B Test duration
You ran a test, and it appears to be yielding results; at what point can you decide to end it? The answer depends on various factors, but a test shouldn’t end too soon or run too long before you draw conclusions from it. I asked one of our CRO managers – Hatice Kaya – about the duration of an A/B test. She suggested that a test should run for at least a full business cycle, or seven days. But she also added that this depends on the product or service on sale, because certain products and services sell more around paydays and sell less throughout the rest of the month. Every website has a business cycle – the time it typically takes for customers to make a purchase. Basically, this means that some websites see relatively few conversions over the weekend, with a peak on weekdays. The results of a test run on Saturday and Sunday are bound to be different from those you get from running it on Monday and Tuesday. To get valid test data, you should run your test throughout the business cycle so as to include all possible fluctuations. Keep in mind that seven days is a minimum requirement. The actual duration of the test depends on your site traffic: the lower the traffic, the longer you will have to run the test. You can use one of the A/B testing calculators available online to calculate the test duration. Look at the example below. The image shows that you must run the test for 18 days if your site has 5,000 average daily visitors and three variations are being tested.
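The calculators do this arithmetic for you, but the back-of-the-envelope version is simple: multiply the sample size needed per variation by the number of groups (control plus variations) and divide by your average daily traffic. A minimal sketch, with an assumed per-group sample size that is only illustrative:

```python
import math

def estimated_test_duration(sample_size_per_variation, daily_visitors, num_groups):
    """Rough test duration in days, assuming traffic is split evenly
    across the control and all variations."""
    total_visitors_needed = sample_size_per_variation * num_groups
    return math.ceil(total_visitors_needed / daily_visitors)

# Example: assuming ~22,500 visitors needed per group, 5,000 daily visitors,
# and a control plus three variations
print(estimated_test_duration(22_500, 5_000, 4))  # -> 18 days
```

With that assumed per-group sample size, the sketch reproduces the 18-day figure from the example above; your own numbers will depend on your baseline conversion rate and the lift you want to detect.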
Number Of Conversions
It’s often said that the number of conversions a website gets per day depends on the amount of traffic the site gets. High-traffic sites usually get more conversions, and vice versa. Generally speaking, when you run a test on a high-traffic site, you do not have to worry about the number of conversions; you should just focus on reaching the required sample size for that traffic. But when it comes to low-traffic sites, to get more accurate results, you should keep two factors in mind:
- Sample size per variation
- The number of conversions.
Analyze external and internal factors
Several external and internal factors can impact any website. These factors include:
- Seasonality or holiday periods: some eCommerce sites’ traffic and sales are not stable all year round – they tend to peak on Black Friday and Cyber Monday. This can influence your test results.
- Marketing promotions and campaigns: if you run a marketing campaign on the same site where you are running an A/B test, your test results are likely to be affected.
Analyze Micro-Conversion Data
When analyzing A/B test results, everyone always seems to track the site’s macro-conversion data – a sale, a lead generated, or a subscription. However, analyzing micro-conversions offers another layer of insight. Just like macro-conversions, micro-conversions can differ from business to business. Micro-conversions depend on the website type – SaaS, e-commerce, lead gen, etc. – and the page you are testing. Here is an example of micro-conversion goals you may need to analyze for an e-commerce site. Micro-conversions do not necessarily increase your conversion rate, but they will certainly help you persuade prospects further down the conversion funnel. It’s not rocket science: the more visitors you persuade, the more purchases you get. In some cases, understanding the micro-conversions helps explain why a test performed the way it did.
What to do when your A/B test doesn’t win
Not all your A/B tests will be winners. This is the truth, and something you should be prepared for as a conversion specialist. Instead of throwing a losing test away and hoping you win with the next one, you can turn it into a learning opportunity. I had a chat with Anwar Aly, a conversion specialist here at Invesp, and he had this to say: “Based on the rate of LOSS of the AB tests, if it’s a normal WIN to LOSE rate, businesses need to learn from lost tests with the mindset that the LOSS is part of the nature of AB testing and more valuable than wins in some cases when good learnings come out of the post-test analysis. If the LOSS rate is high or constant, they need to take a step back and evaluate the overall testing approach, maybe start from scratch with a new audit and review; also, qualitative data can be a great support in validating the test hypotheses and increase test confidence.” In this section, I walk you through a checklist that helps you evaluate losing tests and decide what you can do differently.