{"id":15374,"date":"2022-03-29T05:12:05","date_gmt":"2022-03-29T10:12:05","guid":{"rendered":"https:\/\/www.invespcro.com\/blog\/?p=15374"},"modified":"2022-03-29T05:12:05","modified_gmt":"2022-03-29T10:12:05","slug":"are-a-b-testing-results-permanent","status":"publish","type":"post","link":"https:\/\/www.invespcro.com\/blog\/are-a-b-testing-results-permanent\/","title":{"rendered":"Are A\/B Testing Results Permanent?\u00a0"},"content":{"rendered":"<span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 6<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span><p><span data-preserver-spaces=\"true\">Finally! One of the A\/B tests you&#8217;ve been running shows very positive and significant results.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">You&#8217;re elated, and you can&#8217;t wait to roll out the new design change.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Before you do that, allow me to ask you this:\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Do you think that lift will last forever, or will it begin to lose its effect over time?\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Assuming that it fades after a while, what could be the reason?\u00a0<\/span><!--more--><\/p>\n<p><span data-preserver-spaces=\"true\">Well, take one minute to think about it. Then read on.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">\u2026<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Yes, those are tricky questions. But we will answer them all in this article.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">But before we do that, let&#8217;s put everything into context by looking at how A\/B tests work. 
Shall we?\u00a0<\/span><\/p>\n<h2><span data-preserver-spaces=\"true\">How an A\/B Test works<\/span><\/h2>\n<p><span data-preserver-spaces=\"true\">Before we start answering the main question in this article, it&#8217;s essential to understand how an A\/B test works in the first place.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">If you&#8217;re reading this article, I assume that you already know how an A\/B test works, right? But for some folks who don&#8217;t know, allow me to give a brief explanation.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">But if you already know how an A\/B test works, go ahead and skip this section. It&#8217;s all good.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">In an <a href=\"https:\/\/www.invespcro.com\/ab-testing\/\">A\/B test<\/a> (or split test), you take a website page and come up with a different design\/version of the same page.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">The differences between the original webpage and the new version can be in copy, <a href=\"https:\/\/www.invespcro.com\/blog\/your-complete-guide-to-call-to-action-button-plus-a-bonus-with-free-200-effective-cta-buttons\/\">CTA position<\/a>, <a href=\"https:\/\/www.invespcro.com\/blog\/value-proposition-what-is-it-how-it-works-and-why-you-should-pay-attention\/\">value proposition<\/a>, page layout, copy length, or other elements. Half of the site traffic is then shown the original version of the page, and the other half is shown the new version of the page.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">The original design of the webpage is known as the control. The new design or version of the same web page is a variation or challenger.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">The engagement on each page (original and variation) is measured and collected by the testing tool, and it is analyzed using a statistical engine within the tool. 
Looking at the collected data, you can easily see which page had a positive, negative, or neutral effect on the visitor&#8217;s behavior.\u00a0<\/span><\/p>\n<h2><span data-preserver-spaces=\"true\">Do A\/B testing results fade over time?\u00a0<\/span><\/h2>\n<p><span data-preserver-spaces=\"true\">Okay, you ran a test and had a solid winner. The <a href=\"https:\/\/www.invespcro.com\/blog\/calculating-sample-size-for-an-ab-test\/\">sample size<\/a> was large enough (more than 1,000 conversions), the statistical significance was above 95%, you didn&#8217;t stop the test too early, and the revenue increased by $200k.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">There&#8217;s no doubt that you will implement the winning variation on your site.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Now, my question is:\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Will this result still be there in the next six months? Or will it fade over time?\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Suppose your client asks you this question. What will be your response?\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">I asked this same question to Invesp&#8217;s <a href=\"https:\/\/www.linkedin.com\/in\/khalidh\/\">Khalid Saleh<\/a>, Conversion Advocates&#8217; <a href=\"https:\/\/www.linkedin.com\/in\/jeremyepperson\/\">Jeremy Epperson<\/a>, and TrsDigital&#8217;s <a href=\"https:\/\/www.linkedin.com\/in\/timstewart\/\">Tim Stewart<\/a>.\u00a0<\/span><\/p>\n<p>Let&#8217;s start with Tim. He believes that the test results are not permanent:<\/p>\n<blockquote><p><span data-preserver-spaces=\"true\">The day after you finish the test, it&#8217;s no longer the same sample period. 
So, by definition, the conditions in which you ran a test before have changed.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">The way A\/B testing works or the logic behind it is that you are taking a sample, and at that period, you&#8217;re assuming that all the conditions you set for the test are correct. And you&#8217;re also saying the relationship between what happened in the past (during the test) and what the pattern is to be like in the future is the same.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">So, if your testing suggests that this makes X amount of difference, you&#8217;re saying that the difference will apply again in the future after you have stopped the test. A few weeks after implementing the test, you can easily see the same results.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">But over time, those results will start going down.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">So when people say the test result has faded, they mean the relationship between what they tested and what was measured now is less strong, or they don&#8217;t see as much reaction. People might say that the results fade over time, but I think it gets more challenging to detect the effect over time.<\/span><\/p><\/blockquote>\n<p>On the other hand, Tim says that sometimes you can&#8217;t know for sure whether the A\/B testing result has faded or it&#8217;s still outperforming what the default would have done:<\/p>\n<blockquote><p><span data-preserver-spaces=\"true\">It could be that it hasn&#8217;t faded, but that uplift remains, say 10%, and we implemented that 10% six months ago, but we&#8217;re not seeing the conversions that are still 10% higher. We&#8217;ve lost the effect.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Unless you do a test where you go back to your old version to test it and see what it does, you don&#8217;t know for sure. 
It could be you&#8217;re still 10% higher than what you&#8217;d have been with the old version. So, it hasn&#8217;t faded. It&#8217;s just that the market dropped by that much.<\/span><\/p><\/blockquote>\n<p>Khalid says that everything regresses to the mean over time. But he also has a story that best illustrates this question:<\/p>\n<blockquote><p><span data-preserver-spaces=\"true\">Last year we ran 78 experiments for one of our e-commerce clients. And we had to do a revenue analysis for them.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Out of 78 tests, seven of them were launched on the homepage. Before we started testing anything, their conversion rate was at 1.1%. When we launched the first test, there was a solid winner that took the conversion rate to 1.3%.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">As we launched other tests on the homepage, the conversion rate increased to 2.5%. But when we were doing the revenue analysis, we noticed that there was still a lift, but it had dropped to 1.9%.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">So, a 10% lift in the first quarter will not be the same 10% lift in the fourth quarter, and it will fade gradually. But even though the results will slowly fade over time, they won&#8217;t go back to the same level they started on. So when doing a revenue analysis, always take that into account.\u00a0<\/span><\/p><\/blockquote>\n<p>Even though the results are not permanent, Khalid adds, this doesn&#8217;t mean that they will fade at the same rate across the board:<\/p>\n<blockquote><p><span data-preserver-spaces=\"true\">What&#8217;s fascinating to me is that sometimes it differs on device type. On the same site, but on the desktop, the category page started with a 1.5% conversion rate. That means that 1.5% of people who came on the category page placed an order. 
But looking at it right now, we&#8217;re at 5% conversion rates\u2014a big difference.\u00a0<\/span><\/p><\/blockquote>\n<p>Jeremy says that all this stems from a misconception that people have about A\/B testing:<\/p>\n<blockquote><p><span data-preserver-spaces=\"true\">A common misconception for people is that when they see a lift on a testing platform, let&#8217;s say a 7% lift, that&#8217;s what they are going to get when they implement that winning variation. But it&#8217;s more complicated than that.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">You can get closer and closer to that, but the maths is not going to work perfectly. And I think many people get stuck on that, especially people who are not familiar with statistics or quantitative analysis where they expect it to be a perfect input and output \u2013 but that&#8217;s just not the reality of the situation.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">What we have done essentially is to prove that the variation we have tested has a probability of outperforming the original. And when we do our revenue attribution, we put very strong caveats around what we&#8217;re seeing and how we&#8217;re going to interpret the results in the future.\u00a0<\/span><\/p><\/blockquote>\n<h2><span data-preserver-spaces=\"true\">What causes the results to fade?\u00a0<\/span><\/h2>\n<p><span data-preserver-spaces=\"true\">Even though A\/B testing is a scientific methodology, it doesn&#8217;t work in a vacuum. Too many variables and external factors can cause website data to fluctuate. 
If you want to see what I&#8217;m talking about, try running a conversion per Day of Week report in your analytics tool and see how non-stationary your website data is.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Let&#8217;s look at three factors that can cause your A\/B test results &#8220;to fade&#8221; over time.\u00a0<\/span><\/p>\n<h3><span data-preserver-spaces=\"true\">Seasonal effects<\/span><\/h3>\n<p><span data-preserver-spaces=\"true\">Time-related factors or seasonal effects, like the holiday season, can be why A\/B testing results fade over time.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">According to Tim:\u00a0<\/span><\/p>\n<blockquote><p>Sometimes the fading of A\/B testing results can be down to seasonal changes. The behavior of people can change based on seasons.<\/p><\/blockquote>\n<p><span data-preserver-spaces=\"true\">Jeremy concurs with Tim:\u00a0<\/span><\/p>\n<blockquote><p>Most businesses have some form of seasonality if you dig into their numbers. Some are more extreme than others. If you&#8217;re testing in Q4 for an eCommerce site, you get a 10% lift in purchases. That may only apply to a subset of shoppers buying the product. That winning test might not be a winner if you run it mid-year around June.<\/p><\/blockquote>\n<p><span data-preserver-spaces=\"true\">So if you launched tests that produced winning variations during a specific season (like a holiday), repeat the same tests in a different season. Alternatively, you can extend the duration of the experiment to capture a broader user group.\u00a0<\/span><\/p>\n<h3><span data-preserver-spaces=\"true\">User-related factors<\/span><\/h3>\n<p><span data-preserver-spaces=\"true\">Sometimes the apparent fading is simply down to the audience profile being different during the test and after it. 
In other words, users bucketed in an experiment may not be representative of your entire user base \u2013 this is to say that a specific change might have a positive impact on this segment but not on the rest of the population.<\/span><\/p>\n<h3><span data-preserver-spaces=\"true\">The novelty effect\u00a0<\/span><\/h3>\n<p><span data-preserver-spaces=\"true\">Sometimes your test results will drop not because you have made some wrong changes on your site but because of the novelty effect.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Let&#8217;s say you tested an original against one variation. And the variation outperforms the original. Did the variation win because it is better than the control? Or is it because your visitors are drawn to the novelty of the change?\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">Returning visitors are used to the control, so when they see a fresh design they are not accustomed to, their behavior is more likely to change. But after a while, that fresh design will no longer be new to them, and this can cause the result to lose its effectiveness.\u00a0<\/span><\/p>\n<p><span data-preserver-spaces=\"true\">You can figure this out by segmenting your traffic and only including new users in the experiment.\u00a0<\/span><\/p>\n<h2>Conclusion<\/h2>\n<p>So are A\/B testing results permanent or do they fade over time? Well, I will conclude with this statement by Tim:<\/p>\n<blockquote><p><span style=\"font-weight: 400;\">The real world changes. What we tested may not apply to the real world as it is now, even though the old test results are valid \u2013 because at the time we snapshotted it, for the sample we snapshotted, it was valid. And then what we have to think about is: how close is that sample we took 2 months ago when we tested, or one month ago when it was implemented, to the current sample now? And the answer is, quite often, we don\u2019t know! 
<\/span><\/p><\/blockquote>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p><span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 6<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>Finally! One of the A\/B tests you&#8217;ve been running shows very positive and significant results.\u00a0 You&#8217;re elated, and you can&#8217;t wait to roll out the new design change.\u00a0 Before you do that, allow me to ask you this:\u00a0 Do you think that lift will last forever, or it will begin to lose its effect over [&hellip;]<\/p>\n","protected":false},"author":54,"featured_media":15375,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36],"tags":[],"class_list":["post-15374","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cro"],"_links":{"self":[{"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/posts\/15374","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/users\/54"}],"replies":[{"embeddable":true,"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/comments?post=15374"}],"version-history":[{"count":0,"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/posts\/15374\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/media\/15375"}],"wp:attachment":[{"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/media?parent=15374"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/categories?post=15374"},{"taxonomy":"post_tag",
"embeddable":true,"href":"https:\/\/www.invespcro.com\/blog\/wp-json\/wp\/v2\/tags?post=15374"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}