Episode 15: A/B Testing: The (Not So) Holy Grail of Conversion Optimization


Hello! My name is Jörg Dennis Krüger, and as my sausage-cable-drum winder at reception just announced: yes, I am the conversion hacker. This episode of the conversion hacking podcast deals with the topic of A/B testing.

Anyone who has known me for a while knows that A/B testing is one of my absolute core topics. I started A/B testing in 2008, and as early as 2006 I was already working on A/B testing for Omniture. Those products live on today as Adobe Test and Adobe Target; back then we introduced them at large companies such as DKV, Allianz and others. So A/B testing has been with me for a long time, and my book, published in 2011, is called "Conversion Boosting with Website Testing", and not without reason.

My book on A/B testing

"Conversion Boosting with Website Testing", because the focus of the book is very much on A/B testing. In it I present the conversion-boosting model, show how to approach website optimization and testing, how to run tests, how to evaluate them over time, and so on. But I have to say I've learned a few things since then, because optimization isn't really about testing. Testing was made big by Barack Obama: during his election campaign he collected far more donations thanks to A/B testing, and out of that fundraising effort grew today's A/B testing provider Optimizely. The tool was essentially first built for the Obama campaign. A lot has changed since then, of course; Optimizely raised 80 or 90 million in venture capital to develop the tool further, and so on.

A/B testing tools: Optimizely, Google Website Optimize & Co.

As sophisticated as the software is by now, the entry price is very high, which is why I don't recommend it that often; still, a cool tool. Anyway, everyone wanted to do A/B testing: it worked for Obama, so surely it will work for me, and so on. The big problem is that most shops or websites (for me it's mostly about shops) simply can't be tested. Why? Because there isn't enough traffic. Such a test is essentially a randomized controlled study, like the ones we know from medicine and from science in general.

Statistically significant results

To get enough results in such a study, and statistically significant results at that, I simply need enough data. That data consists of visitors to the site on the one hand and conversions on the other; those are the two main factors. If I don't have enough visitors on the site, or the conversion rate is too low (and usually it's both), then I won't get statistically significant results. I'll still have some data, but when I do the math it turns out to be essentially random. In the testing tools, this is displayed as confidence or significance.

And if that value doesn't get above 60 or 70 percent: well, 50 percent is a coin toss, and 60 or 70 percent isn't much better. If you think it through, you realize you need a lot of data to get really reliable results, and you need it over a certain period of time, because you have to test for at least 7, preferably 14 days, so that every day of the week is covered at least once, ideally twice.

Optimal test period for A/B tests

But you shouldn't test for too long either, to avoid picking up too many external influences and too much noise, and that's why two to six weeks is the optimal test period. And if you don't have enough conversions and enough visitors, it gets difficult. What does "enough" mean? The rule of thumb is: I need a minimum of one hundred conversions per test variant. But that really is just a rule of thumb. If both variants end up with the same number of conversions, I'm still at a fifty-fifty probability as to which variant is better.

This means: if I have two variants and 200 conversions in total, the variants have to differ clearly, say 150 to 50 conversions. That would be a significant difference, and we could say that the 150-conversion variant is definitely better than the 50-conversion variant. There are plenty of calculators online for this.
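To make the 150-to-50 example concrete, here is a minimal sketch of the kind of two-proportion z-test those online calculators run under the hood. The visitor numbers are made up for illustration, and real tools add refinements (continuity corrections, Bayesian variants, and so on), so treat this as a rough check, not a replacement for your testing tool.

```python
import math

def ab_significance(conv_a, conv_b, visitors_a, visitors_b):
    """Two-proportion z-test for an A/B test (normal approximation).

    Returns the z-score and the approximate two-sided p-value.
    """
    p_a = conv_a / visitors_a
    p_b = conv_b / visitors_b
    # Pooled conversion rate under the null hypothesis "no difference".
    p_pool = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_a - p_b) / se
    # Two-sided p-value via the normal CDF (math.erf is in the stdlib).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 150 vs. 50 conversions, assuming 5,000 visitors per variant:
z, p = ab_significance(150, 50, 5000, 5000)
print(round(z, 2), p < 0.05)  # → 7.14 True
```

A 150-to-50 split on equal traffic is overwhelmingly significant; a 105-to-95 split on the same traffic, by contrast, gives a p-value far above 0.05 and would be exactly the "random data" described above.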

Calculate test duration

If you simply search for an A/B test calculator or sample-size calculator, you'll find one with practically every A/B testing provider, whether Optimizely, AB Tasty, probably somewhere at Adobe, at VWO and elsewhere. These calculators all work a little differently and may show slightly different results, because there are a few more mathematical variables that can go into them. But they give you a pretty good feeling for whether you can run a test at all or not.
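As a rough sketch of what those duration calculators compute, here is the classic sample-size formula for comparing two conversion rates. The baseline rate, the uplift you hope to detect, and the daily traffic below are illustrative assumptions; real calculators differ in the exact corrections they apply, which is why their results vary slightly.

```python
import math

def sample_size_per_variant(baseline_rate, relative_uplift,
                            z_alpha=1.96, z_beta=0.84):
    """Rough visitors needed per variant for a two-proportion test.

    z_alpha=1.96 corresponds to a 5% significance level (two-sided),
    z_beta=0.84 to 80% power; this is the standard textbook formula.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# 2% baseline conversion rate, aiming to detect a 20% relative uplift:
n = sample_size_per_variant(0.02, 0.20)
print(n)  # visitors needed per variant (roughly 21,000)

# With 2,000 visitors a day split evenly across two variants:
days = math.ceil(2 * n / 2000)
print(days)  # test duration in days
```

With these example numbers the test needs about three weeks, which lands neatly inside the two-to-six-week window mentioned above; with a tenth of the traffic it would need over half a year, which is exactly the "can't be tested" situation.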

Because nothing is worse than planning a test at great expense, installing a tool, implementing variants, starting the test and then realizing: I'm not getting any results. You switch the thing off after three, four, six or eight weeks and realize that all that work was for nothing; you could have spared yourself the effort. That is the worst case. A negative test result, on the other hand, realizing "oh, this change doesn't work at all", is not the worst case. That's actually pretty cool, because I've learned something, and learning is exactly why we test. We want to get to know our visitors better.

Learning from A/B testing results

You don't always get an immediate uplift, i.e. an increased conversion rate; sometimes you get a downlift instead and notice: wow, that doesn't work at all. Some time ago I tested something in an online shop where I was very, very sure it would lead to more conversions: we installed a banner across the top of the whole shop pointing to the shop's five-star ratings at Trust.

That didn't lead to more sales, quite the opposite: we saw a significant downlift, a significant reduction in conversion rate. Why? I don't know. "Why" is extremely difficult to answer through testing. But we know we'd better leave that alone and do something else. And that is also the reason why, when you build test variants, you should proceed as pragmatically as possible, which means building variants very quickly.

Wonderful editors

Don't spend two, three, four, six or eight weeks programming a variant only to half forget about it again after a week; launch a variant quickly and pragmatically instead. Most tools have wonderful point-and-click editors. With those you can build a variant very quickly, which is super cool. Of course you have to make sure it still displays correctly in all browsers and that the point-and-clicking doesn't break anything. But that way you can usually build a very useful test variant in hours, often even in minutes.

If you then see that it works quite well, you can put more work into it, program it properly and implement it permanently in the shop. Because very often strategists tend to say: oh, good idea, I can just program that now, we don't need to test it. No, we do want to test it. We want to know whether it works better, and if so, how much better, so that we can make real decisions instead of implementing something without knowing whether it produces more conversions. I used to work for a large car rental company, which is very family-run and top-down.

Example: Sixt car rental

You could also say authoritarian, even if that doesn't quite apply to you, Alexander and Konstantin. Your dad, of course, he does rule quite a bit from the top. But he can afford to. In any case, they had just launched a new website. It was supposed to have a bit of a Google style and so on, and they hadn't tested anything; they just launched the new page without knowing what effect it had on the conversion rate. Then I came along. I spent a year on their websites, and in the USA we launched and tested a slightly different logic, a bit like the site was before.

Not a Google-style search slot, but a little more classic, as you know it from travel booking engines. We got a huge uplift, and in the meantime the German site has changed considerably again. So they simply learned from it. Because these "HiPPO decisions", HiPPO meaning Highest Paid Person's Opinion, the opinion of whoever earns the most, just don't work. Even though I know, Konstantin, you don't earn the most, you only pay yourself a small salary; but there are certainly royalties.


This HiPPO ("Highest Paid Person's Opinion") isn't always good; in fact it's often exactly the opposite, because the highest-paid person frequently has no idea about the customers and is usually relatively far removed from day-to-day business. And then a decision gets made from the top anyway, or even worse, based on somebody's spouse's opinion or something, and implementing that without testing is of course a drama.

That's why testing is a cool thing, and with large companies like this car rental provider you can of course test very well, but funnily enough not in every country, because there are countries that simply don't have enough traffic for testing. So you notice that even at such large companies the traffic isn't necessarily high enough for real testing. You then have to ask yourself whether you just want to do alibi testing or want real results.

So: first I have to find out whether I have enough traffic and enough conversions to be able to test at all. Then I have to run very pragmatic tests that quickly generate results I can monitor and implement. And then I let my test run for 14 days, three weeks, four weeks, a maximum of six weeks, and then I'm done. And if I find I don't have enough traffic for real testing, then I come back to the topic of heuristics, or best practices. I think the scientific term heuristics is cooler, because it says more clearly what this is about: a heuristic is something that makes predictions about the future with limited knowledge.

Best practices and heuristics for fast results

So: it's raining, and this is my umbrella. With limited knowledge I can say that if I use it, I'll probably stay dry. The heuristic "umbrella in the rain keeps you dry" doesn't know all the factors; it could be that it's really windy right now and the umbrella is no use at all. Heuristics don't apply one hundred percent, because I don't have all the data, but they hit the mark quite well. Applied to an online shop I know nothing about, it may simply be that the slider sucks; in 98 percent of cases a slider does not produce more conversions. Or: where do users get lost? You can analyze that qualitatively and quantitatively and determine where users simply drop off.

Then you take a look: they all find the button, they all click, but nobody puts anything in the shopping cart. I don't need a lot of testing for that at first; I can start by implementing heuristics and best practices, and that's 80 percent of what I do with my customers. Okay, what are the right heuristics to generate more conversions right away, from the traffic, in the shop, in e-mail and so on? How can we clean things up first? Before I call the interior designer, I call the painter.

The Holy Grail of Conversion Optimization?

So: first send in the painter, not the interior designer, because if the interior designer walks straight into a cluttered shack, he'll just say "hey, what am I supposed to do here?". A/B testing is the interior designer, but most of the time you get quite far with clearing out and a coat of paint, and for many shops that is enough, also because the interior designer is so expensive that he may not be able to deliver his added value. A nice metaphor to end on after all. I love A/B testing, it's super awesome, but it's not the Holy Grail of conversion optimization, not the Holy Grail for more sales in the shop, because it's a lot of work and because you need enough traffic in the first place. So it's worth tidying up the shop properly first. And finally, here's a tip for doing it yourself. Of course you can also talk to me, but you can do it yourself.

The LIFT model for A/B testing

There is the LIFT model, developed by WiderFunnel, an agency from Canada. Greetings to Rachel. In any case, they have a very funny analogy: they compare a website to an airplane. For an airplane to fly, it first needs a few basics, for example wings. The wings represent the value proposition on the page: without wings nothing else matters; we can pump in as much kerosene and have as long a runway as we want, it will not work.

So first we need wings. Then there are things in this model that make the plane take off, like trust and a clear structure, and things that keep the plane on the ground, like fear and distraction. And then there is the turbo boost: urgency, "only today", which actually often achieves a lot if it is meant honestly. If you need to, Google the LIFT model; there are also links down here in the blog and podcast notes.

Take a look and simply implement it as a basic heuristic. Because I'm happy when I talk to shops that are already fundamentally well built, so that I don't have to start with the absolute basics: I prefer painting with a fine brush to a thick roller, and I prefer carving details to hacking through the jungle with a machete. In any case, I wish you more conversions, and I think you can get quite a lot right with the tips from this podcast. Give me feedback in the podcast comments, but please also on iTunes and Spotify, because I look forward to feedback and I look forward to five stars.

Through the conversion jungle with a machete