Episode 15: A/B Testing: The (Not So) Holy Grail of Conversion Optimization


Hello! My name is Jörg Dennis Krüger and, as my sausage cable reel winder at the reception just said: yes, I am the Conversion Hacker. This episode of the Conversion Hacking Podcast is all about A/B testing.

Anyone who has known me for a while knows that A/B testing is one of my absolute core topics. I started with the topic in 2008, and in 2006 I was even already working on A/B testing for Omniture. Today that is Adobe Test and Adobe Target, the old products that we used and introduced back then at large companies such as DKV, Allianz and the like. And it is no coincidence that my book, published in 2011, is called exactly that: Conversion Boosting with Website Testing.

My book on A/B testing

The title is Conversion Boosting with Website Testing because the book focuses very much on A/B testing. I introduce the conversion boosting model, explain how to approach website optimization and testing, and show how to test, how to evaluate results, which time periods to use, and so on. But I have to say that I have learned a bit more in the meantime, because not everything in optimization is about testing. Testing became big because of Barack Obama, who collected a lot more donations in his election campaign through A/B testing. Out of this fundraising campaign, today's A/B testing provider Optimizely was born. It is basically what was originally built for the Obama campaign. Of course, a lot has changed since then. Optimizely has received something like 80 or 90 million in venture capital to develop the tool further, and so on.

A/B testing tools: Optimizely, Google Website Optimizer & Co.

As sophisticated as the software now is, the entry price is now very high, which is why I don't recommend it all that often. Cool tool, though. So everyone wanted to do A/B testing: what worked for Obama will probably work for me, and so on. The big problem is that most stores or websites (for me it's mostly about stores) are simply not testable. Why? They don't have enough traffic, because such a test is simply a normal double-blind study as we know it from medicine or from science in general.

Statistically significant results

To get enough results from such a study, and statistically significant ones, I need enough data. That data is always visitors on the page on the one hand and conversions on the other; those are the two main factors at play. If I have too few visitors on the page or a conversion rate that is currently simply too low (and usually it's both), I don't get statistically significant results. I do have some data, but once I do the math on it, it is actually all random. In the testing tools, this is displayed as confidence or significance.

And if that value doesn't get above 60 or 70 percent: well, 50 percent is a coin toss, and 60 or 70 percent is not much better. If you think about it more carefully, you realize that you need a lot of data to get really reliable results. You also need it over a certain period of time, because you have to test for at least 7, better 14 days, so that every day of the week is covered at least once, ideally twice.
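To make the coin toss point concrete, here is a minimal Python sketch of how such a confidence value can be computed. It uses a simple normal approximation for the difference of two conversion rates. The function names and the visitor numbers are made up for illustration, and real testing tools may well calculate their confidence differently.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def confidence_b_beats_a(visitors_a: int, conversions_a: int,
                         visitors_b: int, conversions_b: int) -> float:
    """Approximate confidence (0..1) that variant B converts better than A.

    Assumption: normal approximation with an unpooled standard error.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    std_error = math.sqrt(rate_a * (1 - rate_a) / visitors_a
                          + rate_b * (1 - rate_b) / visitors_b)
    if std_error == 0:
        # No variation in the data: equal rates are a coin toss,
        # otherwise the higher rate wins outright.
        return 0.5 if rate_a == rate_b else float(rate_b > rate_a)
    return normal_cdf((rate_b - rate_a) / std_error)


# Hypothetical example data: 1,000 visitors per variant.
same = confidence_b_beats_a(1000, 20, 1000, 20)   # identical results
small = confidence_b_beats_a(1000, 20, 1000, 24)  # B has 20 % more conversions
print(f"identical variants: {same:.0%}")
print(f"20 vs. 24 conversions: {small:.0%}")
```

With identical results the value is exactly 50 percent, the coin toss. In the second case B has 20 percent more conversions, yet the value stays in the low seventies, which is exactly the "not much better" zone.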

Optimal test period for A/B tests

But you shouldn't test for too long either, so that you don't pick up too many external influences, too much noise. That's why 2 to 6 weeks is the optimal test period. And yes, if you don't have enough conversions and you don't have enough visitors, it gets difficult. What does "enough" mean here? The rule of thumb is: I need a minimum of one hundred conversions per test variant. But that's just a rule of thumb. If both variants have the same number of conversions, I'm only at a fifty-fifty probability as to which variant is better.

This means that if I have two variants and 200 conversions in total, the variants have to differ clearly. 150 to 50 conversions, for example. That would probably be a significant difference, where we can say: yes, the variant with 150 conversions is definitely better than the variant with 50 conversions. But there are a lot of calculators for this online.
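The 150 to 50 example can be checked with a few lines of Python. This is only a sketch under a simplifying assumption that is not from the episode: both variants received the same number of visitors, so that without any real difference each conversion would be equally likely to come from either variant. The function name is hypothetical.

```python
from math import comb


def split_p_value(conversions_a: int, conversions_b: int) -> float:
    """Two-sided exact binomial p-value for a conversion split.

    Assumption: equal traffic per variant, so under "no difference"
    each conversion lands in A or B with probability 0.5.
    """
    total = conversions_a + conversions_b
    larger = max(conversions_a, conversions_b)
    # Probability of a split at least this lopsided in one direction ...
    one_sided = sum(comb(total, k) for k in range(larger, total + 1)) / 2 ** total
    # ... doubled for both directions, capped at 1.
    return min(1.0, 2 * one_sided)


for a, b in [(100, 100), (110, 90), (150, 50)]:
    print(f"{a} vs. {b} conversions: p = {split_p_value(a, b):.3g}")
```

A small p-value means that such a split would be very unlikely by pure chance. 150 to 50 comes out far below the usual 0.05 threshold, while 110 to 90, which still looks like a winner at first glance, does not.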

Calculate test duration

If you just search for "test duration calculator" or something like that, you will find one at every A/B testing provider, whether that's Optimizely, AB Tasty, certainly also somewhere at Adobe, at VWO and who knows where else. These test duration calculators all work a little differently and may show slightly different results, because there are a few more mathematical variables that you can factor in. But they give you a relatively good feeling for whether you can test at all or not.
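For a rough feeling of what such a calculator does internally, here is a minimal Python sketch based on the textbook sample size formula for comparing two conversion rates. It is not the formula of any particular provider. The defaults (95 percent significance level, 80 percent power), the function names and all example numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate: float, relative_uplift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect the given relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def duration_in_days(visitors_needed: int, variants: int,
                     daily_visitors: int) -> int:
    """Days until every variant has seen enough visitors (even traffic split)."""
    return math.ceil(visitors_needed * variants / daily_visitors)


# Hypothetical store: 2 % conversion rate, 1,000 visitors per day,
# hoping to detect a 10 % relative uplift (2.0 % -> 2.2 %).
needed = visitors_per_variant(0.02, 0.10)
days = duration_in_days(needed, variants=2, daily_visitors=1000)
print(f"{needed} visitors per variant, about {days} days")
print("testable within 6 weeks" if days <= 42 else "not testable within 6 weeks")
```

With these assumed numbers the store would need far more than six weeks of traffic, so the test would not be worth setting up. A larger expected uplift or more traffic shortens the duration quickly.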

Because nothing is worse than planning a test at great expense, installing a tool, implementing variants, starting the test and then realizing: I'm not getting any results. You turn the thing off after three, four, six or eight weeks and realize: "Oh shit, all that work was for nothing." You needn't have bothered; it was all in vain. Getting no result at all, that is the worst case. Negative test results, when you realize "Oh, this change doesn't work at all", are not the worst case at all. That's actually pretty cool, because I've learned something, and we run tests in particular in order to learn. We want to get to know our visitors better.

Learning from A/B testing results

You don't always get an immediate conversion uplift, i.e. an increased conversion rate. Sometimes you just get a downlift and you realize: "Wow, that's not working at all." Some time ago, I tested something in an online store where I was very, very sure that it would lead to more conversions: we installed a banner at the top of the entire store, pointing out the five-star Trust ratings.

This did not lead to more sales. Quite the opposite: we had a significant downlift, a significant decrease in conversions. Why? I don't know. "Why" is extremely difficult to answer through testing. But we now know that we should rather do something else. And that's also the reason why you have to be as pragmatic as possible when building test variants. That means you have to build variants very, very quickly.

Wonderful editors

Do not spend two, three, four, six or eight weeks programming a variant, only to notice, perhaps after just one week of testing, that it doesn't work at all. Instead, launch a variant fast: square, practical, good. Most tools have wonderful visual editors, point-and-click editors, with which you can build a variant really quickly. Of course, you have to make sure that it is displayed correctly in all browsers and that nothing got broken during the pointing and clicking. But this way you can usually build a quite usable test variant in hours, often even in minutes.

If you then see that it works well, you can put more work into it, program it properly and implement it permanently in the store. Because quite often people tend to say: "Oh, good idea, I can program that right away. We don't need to test it." No, we want to test it. We want to know whether it works better and, if so, how much better. That way we can make decisions instead of just implementing something and not knowing afterwards what actually produces more conversions. I used to work for a big car rental company, and it's very family-like, run from the top down.

Example: Sixt car rental

You could also say authoritarian, even if that doesn't apply to Alexander; to you, Konstantin, a bit. And of course your dad, he rules from the top down. But he's allowed to do that. In any case, they simply launched a new website. It was supposed to have a bit of Google's style and so on, and they hadn't tested anything. They just launched a new site. They didn't know what influence it would have on the conversion rate. Then I came along. I worked on the websites there for a year, and I simply ran a test in another country: I launched it in the U.S. and tested a slightly different logic there, a bit like it had been before.

Not a Google-style search box, but something a little more classic, as you know it from travel booking engines. We had a huge uplift, and in the meantime the German site has changed a lot again. That means they learned from it. Because these "Hippo decisions", that is, the Highest Paid Person's Opinion, the opinion of whoever earns the most, do not work. Even though I know, Konstantin, that you don't earn the most, you only draw a small salary, but there are definitely royalties.


This Hippo ("Highest Paid Person's Opinion") isn't always good; in fact, it's often exactly the opposite. The Hippo, the highest paid person, often has no idea about the customers, because he is usually relatively far away from day-to-day business. And then a decision simply gets made, or even worse, it somehow comes down to his wife's opinion or something: "I think this is better, implement it." And doing that without testing is of course a disaster.

That's why testing is a cool thing to do, and with large companies like this rental car provider, you can of course test very well, but funnily enough, not in every country, because there are also countries that simply don't have enough traffic for testing. You can see that even in such large companies, the traffic is not necessarily so high that you could really do testing with it. You have to ask yourself again whether you only want to do token testing or whether you want real results.

So first I have to find out whether I have enough traffic and enough conversions to be able to test at all. Then I have to build very pragmatic tests with which I can generate results quickly, set up monitoring, and run my test for 14 days, three weeks, four weeks, six weeks at most, and then I'm good to go. And if I realize that I don't have enough traffic for real testing, I come back to the topic of heuristics, or best practices. I think the scientific term heuristics is kind of cooler, because it also says more clearly what it's about: a heuristic is something that lets me make predictions about the future with limited knowledge.

Best practices and heuristics for fast results

So: it's raining and I have an umbrella. My limited knowledge is that if I use it, I will probably stay dry. That is the heuristic "umbrella in the rain keeps you dry", and I do not know all the factors. It could be that it is very windy and the umbrella is of no use against the rain. Well, a heuristic simply does not hit one hundred percent if I do not have all the data. But it hits quite well. In an online store, for example, a heuristic can simply be: sliders suck. Unfortunately, in 98 percent of cases they do not promote conversions. So let's take them out first. Or simply: where do users get lost? You can analyze that qualitatively and quantitatively and then determine where users simply drop off.
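The quantitative part of this analysis can be as simple as comparing how many users reach each step of the store. Here is a minimal Python sketch with made-up step names and numbers; nothing in it comes from a real store, and the function name is hypothetical.

```python
def drop_off_report(funnel: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Return the share of users lost between each pair of consecutive steps."""
    report = []
    for (step, users), (next_step, next_users) in zip(funnel, funnel[1:]):
        lost = 1 - next_users / users if users else 0.0
        report.append((f"{step} -> {next_step}", lost))
    return report


# Hypothetical numbers, e.g. exported from a web analytics tool.
funnel = [
    ("product page", 10_000),
    ("add to cart", 800),
    ("checkout", 500),
    ("order", 300),
]

report = drop_off_report(funnel)
for transition, lost in report:
    print(f"{transition}: {lost:.0%} lost")

# The transition with the highest loss is the first candidate for a cleanup.
worst_step, worst_loss = max(report, key=lambda item: item[1])
print(f"biggest drop-off: {worst_step}")
```

In this invented example almost everyone is lost before the shopping cart, so that is where the cleanup should start.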

Then you can see: here they all find the button, but they don't click on it, they don't put anything in the shopping cart. For that I don't need to do much testing. I can implement heuristics, best practices, and that's what I do for 80 percent of my customers. So: what are the right heuristics to generate more conversions right away, on the traffic side as well as in the store, in e-mail and so on? How can we clean up there first? Before I call an interior designer, I also first call the decluttering crew and the painter. If the interior designer comes straight into a cluttered, run-down shack, he will also say: "Hey, what am I supposed to do here?" A/B testing is the interior designer, and most of the time you get far enough with the decluttering and the painter. For many, the decluttering and the painter are already enough, because the interior designer is so expensive that he may never earn back his added value. A nice metaphor, thought through to the end.

The Holy Grail of Conversion Optimization?

I love A/B testing, and it is a super cool thing, but it is not the Holy Grail of conversion optimization, not the Holy Grail for more sales in the store, because it takes effort and because you have to have enough traffic in the first place. So it is worthwhile to clean up the store properly first. And finally, here is a tip for doing that yourself. Of course you can talk to me, but here is how to do it yourself.

The LIFT model for A/B testing

There is the LIFT model, developed by WiderFunnel, an agency from Canada. Many greetings to Rachel. They use a very nice analogy: they compare a website with an airplane. An airplane needs a few basics to be able to fly, for example wings. In their picture, the wings are the advertising promise on the site: without wings, we don't need to bother with anything else. Without wings, we can pump in as much kerosene as we want and have as long a runway as we want and whatever else: it will not work.

So first of all we need wings. Then there are things in this model that make the plane take off, things like trust and a clear structure, and things that keep the plane on the ground, things like fear and distraction. And then there is something that gives the aircraft a turbo boost: urgency. "Only today!" often achieves a whole lot, if it is meant honestly. So just google "LIFT model"; I will also link it down here in the blog and in the podcast. Have a look at it and simply implement it as a basic heuristic.

Because I'm also happy when I talk to stores that are already fundamentally well done, so that I don't have to start with the absolute basics. I'd rather paint with the slightly finer brush than with the thick roller, and I'd rather do fine carving than hack my way through the jungle with a machete. In any case, I wish you more conversions. I think with the tips from this podcast you can get a whole lot right and achieve a whole lot. Give me feedback in the podcast comments, but please also on iTunes and Spotify, because I look forward to feedback and I look forward to five stars.

Through the conversion jungle with a machete


  • Hello,
    the text variant of this post appears to have been machine transcribed. Unfortunately, it is very tiring to read this. The contribution is nevertheless very valuable. Maybe it makes sense to proofread and edit accordingly...
    Best regards + thank you
