Finally! One of the A/B tests you’ve been running shows very positive and significant results.
You’re elated, and you can’t wait to roll out the new design change.
Before you do that, allow me to ask you this:
Do you think that lift will last forever, or will it begin to lose its effect over time?
Assuming that it fades after a while, what could be the reason?
Well, take a minute to think about it. Then read on.
Yes, those are tricky questions. But we will answer them all in this article.
But before we do that, let’s put everything into context by looking at how A/B tests work. Shall we?
How an A/B test works
Before we answer the main question in this article, it helps to recap how an A/B test works in the first place.
If you already know how an A/B test works, go ahead and skip this section. It’s all good. For everyone else, here is a brief explanation.
In an A/B test (or split test), you take a website page and come up with a different design/version of the same page.
The differences between the original webpage and the new version can be in copy, CTA position, value proposition, page layout, copy length, or other elements. Half of the site traffic is then shown the original version of the page, and the other half is shown the new version of the page.
The original design of the webpage is known as the control. The new design or version of the same web page is a variation or challenger.
The engagement on each page (original and variation) is measured and collected by the testing tool, and it is analyzed using a statistical engine within the tool. Looking at the collected data, you can see which page had a positive, negative, or neutral effect on visitors’ behavior.
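To make the analysis step concrete, here is a minimal sketch of one common way to compare two pages: a two-proportion z-test. It is not how any particular testing tool works internally, and the function name and visitor counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare variation B against control A.

    Returns (relative_lift, z_score, two_sided_p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of "no difference".
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z_score = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    relative_lift = (rate_b - rate_a) / rate_a
    return relative_lift, z_score, p_value


# Hypothetical counts: 50,000 visitors per page.
lift, z, p = two_proportion_z_test(1000, 50_000, 1150, 50_000)
print(f"relative lift: {lift:.1%}, z: {z:.2f}, p-value: {p:.4f}")
```

A p-value below 0.05 corresponds to the "above 95%" significance threshold that testing tools usually report.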
Do A/B testing results fade over time?
Okay, you ran a test and had a solid winner. The sample size was large enough (more than 1,000 conversions), the statistical significance was above 95%, you didn’t stop the test too early, and the revenue increased by $200k.
There’s no doubt that you will implement the winning variation on your site.
Now, my question is:
Will this result still be there in the next six months? Or will it fade over time?
Suppose your client asks you this question. What will be your response?
Let’s start with Tim. He believes that the test results are not permanent:
The day after you finish the test, it’s no longer the same sample period. So, by definition, the conditions in which you ran a test before have changed.
The logic behind A/B testing is that you take a sample, and for that period you assume that all the conditions you set for the test hold. You’re also assuming that the relationship observed in the past (during the test) will be the same in the future.
So, if your testing suggests that this makes X amount of difference, you’re saying that the difference will apply again in the future after you have stopped the test. For a few weeks after implementing the winning variation, you can easily see the same results.
But over time, those results will start going down.
So when people say the test result has faded, they mean the relationship between what they tested and what they measure is now weaker, or they don’t see as much of a reaction. People might say that the results fade over time, but I think it just gets more challenging to detect the effect over time.
Tim adds that sometimes you can’t know for sure whether the A/B testing result has faded or whether the variation is still outperforming what the default would have done:
It could be that it hasn’t faded and the uplift, say 10%, remains. We implemented that 10% six months ago, but we’re no longer seeing conversions that are 10% higher, so we assume we’ve lost the effect.
Unless you go back to your old version and test it to see what it does, you don’t know for sure. It could be that you’re still 10% higher than you would have been with the old version. So, it hasn’t faded. It’s just that the market dropped by that much.
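Tim’s point can be put into numbers. The sketch below is purely illustrative: the 2% baseline and the 10% market drop are assumed figures, not data from any real test, and `observed_rate` is a hypothetical helper.

```python
def observed_rate(baseline_rate, market_change, relative_lift):
    """Conversion rate after a market-wide change and a design lift are both applied."""
    return baseline_rate * (1 + market_change) * (1 + relative_lift)


baseline = 0.020      # assumed conversion rate of the old design before the test
lift = 0.10           # the 10% uplift measured in the test
market_drop = -0.10   # assumed market-wide decline six months later

# What the dashboard shows six months after launch (new design, weaker market).
six_months_later = observed_rate(baseline, market_drop, lift)
# What the old design would be doing today (unobservable without a re-test).
old_design_today = observed_rate(baseline, market_drop, 0.0)

print(f"before the test:         {baseline:.2%}")
print(f"six months after launch: {six_months_later:.2%}")
print(f"old design today:        {old_design_today:.2%}")
print(f"lift still in place:     {six_months_later / old_design_today - 1:.0%}")
```

Under these assumptions, the current conversion rate sits below the pre-test rate, so the lift looks lost, yet the new design is still 10% ahead of where the old one would be today.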
Khalid says that everything regresses to the mean over time. But he also has a story that best illustrates this question:
Last year we ran 78 experiments for one of our e-commerce clients. And we had to do a revenue analysis for them.
Out of 78 tests, seven of them were launched on the homepage. Before we started testing anything, their conversion rate was at 1.1%. When we launched the first test, there was a solid winner that took the conversion rate to 1.3%.
As we launched other tests on the homepage, the conversion rate increased to 2.5%. But when we were doing the revenue analysis, we noticed that although there was still a lift, the conversion rate had dropped to 1.9%.
So, a 10% lift in the first quarter will not be the same 10% lift in the fourth quarter, and it will fade gradually. But even though the results will slowly fade over time, they won’t go back to the same level they started on. So when doing a revenue analysis, always take that into account.
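Using the conversion rates from Khalid’s story, a short sketch shows how much of the lift survived. The "share of peak gain retained" figure is just one illustrative way to summarize the fade, not a metric Khalid mentions.

```python
def relative_lift(rate, baseline):
    """Relative improvement of a conversion rate over a baseline rate."""
    return (rate - baseline) / baseline


baseline = 1.1  # homepage conversion rate (%) before any testing
stages = {
    "after the first winner": 1.3,
    "after further homepage tests": 2.5,
    "at the revenue analysis": 1.9,
}

for label, rate in stages.items():
    print(f"{label}: {rate}% ({relative_lift(rate, baseline):+.0%} vs. baseline)")

# Share of the peak gain (2.5% - 1.1%) still present at the revenue analysis.
retained = (1.9 - baseline) / (2.5 - baseline)
print(f"share of peak gain retained: {retained:.0%}")
```

The lift faded from its peak, but the rate stayed well above the 1.1% starting point, which is the pattern Khalid describes.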
Even though the results are not permanent, Khalid adds, this doesn’t mean that they will fade at the same rate across the board:
What’s fascinating to me is that sometimes it differs by device type. On the same site, but on desktop, the category page started with a 1.5% conversion rate. That means that 1.5% of people who came to the category page placed an order. But looking at it right now, we’re at a 5% conversion rate. That’s a big difference.
Jeremy says that all this stems from a misconception that people have with A/B testing:
A common misconception is that when people see a lift on a testing platform, let’s say a 7% lift, that’s what they are going to get when they implement that winning variation. But it’s more complicated than that.
You can get closer and closer to that, but the maths is not going to work perfectly. And I think many people get stuck on that, especially people who are not familiar with statistics or quantitative analysis, where they expect it to be a perfect input and output – but that’s just not the reality of the situation.
What we have done essentially is to prove that the variation we have tested has a probability of outperforming the original. And when we do our revenue attribution, we put very strong caveats around what we’re seeing and how we’re going to interpret the results in the future.
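Jeremy’s "probability of outperforming the original" can be illustrated with a small simulation. This is a sketch of one possible approach (a Bayesian comparison with uniform Beta(1, 1) priors), not a description of Jeremy’s method or of any testing platform; the function name and counts are hypothetical.

```python
import random


def prob_variation_beats_control(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under uniform Beta(1, 1) priors."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate for each page from its posterior.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical counts: 50,000 visitors per page.
prob = prob_variation_beats_control(1000, 50_000, 1150, 50_000)
print(f"probability the variation beats the original: {prob:.1%}")
```

Even a very high probability says only that the variation is likely better. It does not promise that the exact lift reported during the test will show up after launch.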
What causes the results to fade?
Even though A/B testing is a scientific methodology, it doesn’t work in a vacuum. Too many variables and external factors can cause website data to fluctuate. If you want to see what I’m talking about, try running a conversion per Day of Week report in your analytics tool and see how non-stationary your website data is.
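If your analytics tool lets you export daily numbers, a conversion per Day of Week report can be sketched in a few lines. The rows below are made-up figures, and the `(date, sessions, orders)` layout is an assumption about what such an export might look like.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical export: (day, sessions, orders).
daily_rows = [
    (date(2024, 1, 1), 1000, 20),   # Monday
    (date(2024, 1, 6), 1500, 45),   # Saturday
    (date(2024, 1, 8), 1200, 30),   # Monday
    (date(2024, 1, 13), 1400, 35),  # Saturday
]


def conversion_by_weekday(rows):
    """Aggregate sessions and orders per weekday and return conversion rates."""
    sessions = defaultdict(int)
    orders = defaultdict(int)
    for day, day_sessions, day_orders in rows:
        name = WEEKDAYS[day.weekday()]  # weekday() returns 0 for Monday
        sessions[name] += day_sessions
        orders[name] += day_orders
    return {name: orders[name] / sessions[name] for name in sessions}


report = conversion_by_weekday(daily_rows)
for name, rate in report.items():
    print(f"{name}: {rate:.2%}")
```

If the rates differ noticeably from one weekday to the next, the conditions behind any test result are shifting all the time, which is the point of this section.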
Let’s look at three factors that can cause your A/B test results “to fade” over time.
Time-related factors
Seasonal effects, like the holiday season, can be why A/B testing results fade over time.
According to Tim:
Sometimes the fading of A/B testing results can be down to seasonal changes. The behavior of people can change based on seasons.
Jeremy concurs with Tim:
Most businesses have some form of seasonality if you dig into their numbers. Some are more extreme than others. If you’re testing in Q4 for an e-commerce site and you get a 10% lift in purchases, that may only apply to a subset of shoppers buying the product. That winning test might not be a winner if you run it mid-year around June.
So if you launched tests that produced winning variations during a specific season (like a holiday), repeat the same tests in a different season. Alternatively, you can extend the duration of the experiment to capture a broader user group.
User-related factors
Sometimes the apparent fading comes down to a change in the profile of the audience between the test period and the period after it. In other words, users bucketed in an experiment may not be representative of your entire user base. A specific change might have a positive impact on that segment but not on the rest of the population.
The novelty effect
Sometimes your test results will drop not because you have made some wrong changes on your site but because of the novelty effect.
Let’s say you tested an original against one variation. And the variation outperforms the original. Did the variation win because it is better than the control? Or is it because your visitors are drawn to the novelty of the change?
Returning visitors are used to the control, so when they see a fresh design they are not accustomed to, their behavior is more likely to change. But after a while, that fresh design will no longer be new to them, and this can cause the result to lose its effectiveness.
You can check for this by segmenting your traffic and including only new users in the experiment, since they have never seen the control and have no habit to break.
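A related check is to track the lift week by week for new and returning visitors separately. The sketch below uses invented weekly conversion rates; a lift that shrinks for returning visitors while staying flat for new visitors would be consistent with a novelty effect, though it does not prove one.

```python
def lift(control_rate, variation_rate):
    """Relative lift of the variation over the control."""
    return (variation_rate - control_rate) / control_rate


# Hypothetical weekly (control_rate, variation_rate) pairs per segment.
weekly_rates = {
    "new": [(0.020, 0.022), (0.020, 0.022), (0.020, 0.022)],
    "returning": [(0.030, 0.036), (0.030, 0.033), (0.030, 0.0303)],
}

lifts = {
    segment: [lift(control, variation) for control, variation in weeks]
    for segment, weeks in weekly_rates.items()
}

for segment, values in lifts.items():
    trend = ", ".join(f"{value:.0%}" for value in values)
    print(f"{segment} visitors, lift by week: {trend}")
```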
So are A/B testing results permanent or do they fade over time? Well, I will conclude with this statement by Tim:
The real world changes. What we tested may not apply to the real world as it is now, even though the old test results are valid, because the sample was valid at the time we snapshotted it. What we then have to ask is how close the current sample is to the one we took two months ago when we tested, or one month ago when we implemented: is it still the same? And the answer, quite often, is that we don’t know!