“Into each life a little rain must fall” opined Longfellow long ago.
Similarly, into each landing page tester’s life a little statistics must fall.
I see many landing page test results that are based on little more than ignorance and wishful thinking.
It is critical that we can properly answer the following questions in our landing page tests:
· Have I found something better?
· How much better is it?
There is no way to avoid a little math when discussing these subjects, but I hope that the following overview is accessible and will be put to good use.
Have I found something better?
Landing page optimization is based on statistics, and statistics is based in turn on probability theory. And probability theory is concerned with the study of random events.
But a lot of people might object that the behavior of your landing page visitors is not “random.” Your visitors are not as simple as the roll of a die. They visit your landing page for a reason, and act (or fail to act) based on their own internal motivations.
So what does probability mean in this context?
Let’s conduct a little thought experiment. Imagine that I am about to flip a fair coin. It has the potential to be in one of two states (heads or tails). What would you estimate the probability of it coming up heads to be? Fifty percent, right? So would I.
Now imagine that I have flipped the coin and covered up the result after catching it in my hand. The process of flipping is now complete, and the coin has taken on one particular state. Now what would you estimate the probability of it coming up heads to be? Fifty percent again, right? I would agree because neither of us knows any more than before the coin was flipped.
Now imagine if I peeked at the coin without letting you see it. What would you estimate the probability of it coming up heads to be? Still 50%, right? How about me? I would no longer agree with you. Having seen the outcome of the flip event I would declare that the probability of coming up heads is either zero or 100% (depending on what I have seen).
How can we experience the same event and come to two different conclusions?
Who is correct? The answer is—both of us. We are basing our answers on different available information. Not having seen the outcome of the flip, you must assume that the coin can still come up heads. In effect, for you the coin has not been flipped, but rather remains in a state of pre-flipped potential. I on the other hand know more, so my answer is different. So probability can be viewed as simply taking the best guess given the available information. The more information you have, the more accurate your guess will become.
Let’s look at this in the context of the simplest type of landing page optimization.
Let’s assume that you have a constant flow of visitors to your landing page from a steady and unchanging traffic source. You decide to test two versions of your page design, and split your traffic evenly and randomly between them. In statistical terminology, you have two stochastic processes (experiences with your landing pages), with their own random variables (visitors drawn from the same population), and their own measurable binary events (either visitors convert or they do not). The true probability of conversion for each page is not known, but must be between zero and one. This true probability of conversion is what we call the conversion rate, and we assume that it is fixed.
From the law of large numbers you know that as you sample a very large number of visitors, the measured conversion rate will approach the true probability of conversion. From the Central Limit Theorem you also know that the chances of the actual value falling within three standard deviations of your observed mean are very high (99.7%), and that the width of the normal distribution will continue to narrow (depending only on the amount of data that you have collected). Basically, measured conversion rates will wander within ever narrower ranges as they get closer and closer to their true respective conversion rates. By seeing the amount of overlap between the two bell curves representing the normal distributions of the conversion rate, you can determine the likelihood of one version of the page being better than the other.
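A quick simulation makes this concrete. The sketch below assumes a hypothetical "true" conversion rate of 2% (chosen purely for illustration) and shows the measured rate settling toward the true rate while the standard error shrinks in proportion to 1/√n:

```python
import random

random.seed(42)

TRUE_RATE = 0.02  # hypothetical true conversion rate, an assumption for the demo

def measured_rate(n_visitors):
    """Simulate n_visitors independent visits; return the observed conversion rate."""
    conversions = sum(1 for _ in range(n_visitors) if random.random() < TRUE_RATE)
    return conversions / n_visitors

for n in (100, 10_000, 1_000_000):
    rate = measured_rate(n)
    # Standard error of a proportion shrinks as 1/sqrt(n)
    std_err = (TRUE_RATE * (1 - TRUE_RATE) / n) ** 0.5
    print(f"n={n:>9,}  measured={rate:.4%}  std err={std_err:.4%}")
```

With only 100 visitors the measured rate can easily be half or double the true rate; by a million visitors it is pinned down to a few hundredths of a percent.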
One of the most common questions in inferential statistics is to see if two samples are really different or if they could have been drawn from the same underlying population as a result of random chance alone. You can compare the average performance between two groups by using a t-test computation. In landing page testing, this kind of analysis would allow you to compare the difference in conversion rate between two versions of your site design. Let’s suppose that your new version had a higher conversion rate than the original. The t-test would tell you if this difference was likely due to random chance or if the two were actually different.
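For binary convert/no-convert data, the closely related two-proportion z-test is the usual way to carry out this comparison, and it needs nothing beyond the standard library. A minimal sketch (the example call uses the case-study counts examined later in this article, 36/2478 vs. 65/2384):

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided test of whether two observed conversion rates differ
    by more than random chance alone. Returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both pages share one true rate
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = two_proportion_z_test(36, 2478, 65, 2384)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p-value means chance alone is unlikely
```

A small p-value (conventionally below 0.05) says the observed difference would rarely arise if both pages actually shared the same underlying conversion rate.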
How much better is it?
Many landing page testers are surprised when the results of their initial test do not hold up if the test is re-run.
Internet marketing produces a detailed and quantifiable view of your online campaign activities. Most of the numbers produced fall under the general category of descriptive statistics. Descriptive statistics produces summaries and graphs of your data that can be used for making decisions. The descriptive information concerns both the value of a particular quantity and its variability (how scattered it is). Unfortunately, most people focus only on the measured average value and completely ignore the variability.
This major problem persists because people confuse the precision of the observed effects (the ability to measure conversions during the test) with the precision of describing the underlying system (the ability to draw conclusions and make predictions about your landing page visitor population as a whole).
You should not generally quote the observed improvement as a certainty. Even though you’ve observed an exactly computable conversion rate improvement percentage, you don’t know what it really is for your visitor population as a whole. Exact measurement of observable effects does not imply that you know anything about the underlying process.
By itself, the mean of an observed value can be misleading, especially at small sample sizes. The situation gets even murkier if you are trying to model two separate means (each with its own variance and noise). The situation gets downright ugly if you are trying to compute a ratio of such numbers. Yet this is exactly what is required to estimate a percentage improvement between two landing page versions.
I recently ran across a public case study of a landing page head-to-head test. Based on a sample size of 36 conversions out of 2478 impressions for page A and 65 conversions out of 2384 impressions for page B, you are told to conclude that the conversion rate improvement is 88%. Indeed, I would probably be happy with such a result. But let’s take a closer look.
Let’s assume that you want 95% confidence in your answer. This corresponds to a Z-score of approximately 2 (1.96, to be precise), meaning that the number must fall within about two standard deviations of the observed mean. If you compute the 95% confidence intervals on the number of conversions for both landing pages, you will find the following:
Page A: 36 ± 12 (the interval from 24 to 48)
Page B: 65 ± 16 (the interval from 49 to 81)
Let’s take a look at the best case scenario within our confidence range:
Conversions: A = 24, B = 81
Conversion rates: A = 0.97%, B = 3.40%
Conversion rate improvement: 251%
Now let’s take a look at the worst case scenario:
Conversions: A = 48, B = 49
Conversion rates: A = 1.94%, B = 2.06%
Conversion rate improvement: 6.2%
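The interval arithmetic above is easy to reproduce. This sketch recomputes both confidence intervals and the best-case/worst-case improvement ratios directly from the raw counts; because the figures above round the interval endpoints to whole conversions (24, 48, 49, 81), the 6.2% and 251% numbers differ slightly from the unrounded results here:

```python
from math import sqrt

def conversion_interval(conversions, impressions, z=2.0):
    """~95% confidence interval on the conversion *count*, using the
    normal approximation to the binomial (z ~ 2 standard deviations)."""
    p = conversions / impressions
    std_dev = sqrt(impressions * p * (1 - p))  # std dev of the count
    margin = z * std_dev
    return conversions - margin, conversions + margin

# Case-study numbers from the article
lo_a, hi_a = conversion_interval(36, 2478)   # roughly 24 to 48
lo_b, hi_b = conversion_interval(65, 2384)   # roughly 49 to 81

# Best case for page B: A at its low end, B at its high end
best = (hi_b / 2384) / (lo_a / 2478) - 1
# Worst case for page B: A at its high end, B at its low end
worst = (lo_b / 2384) / (hi_a / 2478) - 1
print(f"improvement could be anywhere from {worst:.1%} to {best:.1%}")
```

Either endpoint is consistent with the data at the chosen confidence level, which is exactly the point: a single headline improvement number hides an enormous range.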
There is some rationale for reporting the conversion rate improvement based on the ratio of the means. Since more of the mass of the normal distributions lies close to the mean, the actual numbers are more likely to be near it. However, this should not be used as a reason to abandon the use of error bars or confidence intervals. Both the 6.2% and 251% conversion rate improvements are within the realm of possibility based on the confidence level that you had selected. There is a huge range of possible outcomes simply because the sample size is so small.
All online marketing educators are walking a fine line (myself included). We are trying to get at least a basic level of mathematical literacy across to our audiences. However, if the going gets too rough, many online marketers will just tune out and give up on the math altogether. I am somewhat torn. On the one hand, “half a loaf is better than none” and it is good to use some kind of statistical benchmarks. On the other hand, “a little knowledge is a dangerous thing” and can be easily misapplied during landing page optimization.
The bottom line is this: take the time and care to properly collect and analyze your data. When faced with uncertain measurements (basically all of the time), display them with error bars or confidence ranges.