Analytics Basics: Ratios and Averages

This article, written by Neil Mason, was originally published on Clickz.com on 04/06/10 and is republished here with permission.

In this short series of columns I’m going back to the basics of some of the metrics that we typically use in our web analytics tools and highlighting some of the things we need to take on board when we use these metrics to measure online marketing performance. In some cases the issues are about the way that the metrics are constructed; in other cases they are about the way that we interpret them.

First of all, let’s look at some of the common metrics that we might use to describe the overall activity levels on a site, such as “average pages per visit” or “average time on site”. There’s a real problem with the way these metrics are calculated which makes them virtually useless. Virtually all web analytics systems use the “mean” when they produce these “average” metrics. The trouble with using the mean is that it works best when the underlying activity is normally distributed, as in the graphic below. This ties in with our own notion of what an average is, in that we generally assume that most people will be in and around this number. So if we say that the average time on site is 6 minutes and 30 seconds, we implicitly assume that most people are on the site for around 6 and a half minutes.


The problem is that most behaviour on websites is not normally distributed. If you actually look at how many people stay for how long, or how many people visit how many pages, you will generally find a pattern that looks like the chart below:

Behaviour on the web is not “normal”. You will generally find that most people have relatively low levels of activity and that relatively small numbers of people have higher levels of activity. This is a “skewed distribution”, and it means that the “mean” is not a very good measure of an average at all, because the few very active visitors pull it upwards. What we need to be using is the “median”. The median is the value that splits the population in half: 50% of the population lies below it and 50% lies above. In my example above the median is 3 minutes, which gives a very different perspective from the mean of 6 minutes and 30 seconds. It demonstrates that the activity levels are generally not as high as they would first appear.
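To make the gap between the two measures concrete, here is a minimal Python sketch. The visit durations are invented for illustration (they are not data from this article) and were chosen only so that the mean and median match the 6 minutes 30 seconds and 3 minutes quoted above; the names `durations` and `fmt` are likewise assumptions.

```python
from statistics import mean, median

# Hypothetical time-on-site values in seconds, one per visit.
# Invented for illustration: most visits are short, a few are very long.
durations = [30, 60, 90, 120, 180, 180, 240, 300, 900, 1800]


def fmt(seconds):
    """Format a number of seconds as 'Xm YYs'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs:02d}s"


mean_duration = mean(durations)
median_duration = median(durations)

# Share of visits that were shorter than the reported "average" (the mean).
share_below_mean = sum(d < mean_duration for d in durations) / len(durations)

print("mean:  ", fmt(mean_duration))      # mean:   6m 30s
print("median:", fmt(median_duration))    # median: 3m 00s
print(f"visits shorter than the mean: {share_below_mean:.0%}")  # 80%
```

In this made-up sample, eight out of ten visits are shorter than the mean, which is the sense in which the mean overstates typical activity on a skewed distribution.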

The problem, though, is that web analytics systems don’t report the median; they only report the mean. The reason is that it’s easier to calculate the mean in a database than it is to calculate the median. So if you want to report a more representative average you will need to run the relevant reports and then calculate the median yourself.

Another problem is that reporting on “average behaviour” is not very useful. These days we generally don’t build “average” websites for “average” visitors. We tend to want to be targeted, and so we want to understand the differences in behaviour amongst different groups of people rather than the average behaviour across all people. Averages might be easy to understand, but they often don’t tell you anything useful unless you are looking in detail at the behaviour of different segments. So avoid using site-wide averages if you can!
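If your tool lets you export visit-level rows, calculating the median yourself, segment by segment, takes only a few lines. The sketch below assumes a hypothetical export of (segment, pages viewed) pairs; the segment names, the figures and the function name `summarise_by_segment` are all made up for illustration, and a real export would first need to be loaded from a file.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical visit-level export: (traffic segment, pages viewed in the visit).
rows = [
    ("search", 1), ("search", 2), ("search", 2), ("search", 9),
    ("email", 4), ("email", 5), ("email", 6),
    ("direct", 1), ("direct", 1), ("direct", 12),
]


def summarise_by_segment(rows):
    """Return {segment: {"visits", "mean", "median"}} for pages per visit."""
    pages_by_segment = defaultdict(list)
    for segment, pages in rows:
        pages_by_segment[segment].append(pages)
    return {
        segment: {
            "visits": len(pages),
            "mean": mean(pages),
            "median": median(pages),
        }
        for segment, pages in pages_by_segment.items()
    }


summary = summarise_by_segment(rows)
for segment, stats in summary.items():
    print(
        f"{segment:>7}: visits={stats['visits']} "
        f"mean={stats['mean']:.2f} median={stats['median']}"
    )
```

Reporting both figures side by side for each segment shows where a handful of heavy visits is inflating the mean (the made-up "direct" segment here) and where the two measures agree (the "email" segment).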

Another group of metrics that I think it is useful to treat carefully are ratios such as conversion ratio and bounce rate. Ratios can be enormously useful as metrics, but again, when used on a site-wide basis they can raise more questions than they answer. There’s nothing wrong with the construction of a number like a conversion ratio; it’s more an issue of interpretation.

The problem stems from the fact that a ratio (by definition) is the comparison of two numbers. In the case of a classic conversion ratio this is typically the number of transactions or conversion events in a period compared to the number of visits. If the conversion ratio moves, it’s because one or both of these numbers has changed. So immediately you have to investigate the change in both numbers and keep drilling until you understand the root causes.

My main concern with the conversion ratio is that it’s generally used as a measure of overall site performance when it’s a measure of acquisition performance as well, and you can’t separate one from the other. It’s possible that the site performance could be improving but the conversion ratio could be dropping because the acquisition performance (in terms of the quality of traffic being generated) is deteriorating. What’s needed is a series of “conversion ratios” that track performance throughout the customer journey, from the time someone becomes aware of the site through to the time they ultimately transact. It’s not that a site-wide conversion ratio is a bad metric; it just raises more questions than it answers and is more useful, again, when used at a more granular and segmented level.
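The interaction between site performance and acquisition performance can be sketched with a small worked example. All figures below are hypothetical, as are the segment names and the helper functions; the point is only to show how a site-wide ratio can fall while every segment converts better.

```python
# Hypothetical figures: {period: {segment: (visits, conversions)}}.
periods = {
    "before": {"brand_search": (1000, 50), "display": (1000, 10)},
    "after": {"brand_search": (1000, 55), "display": (4000, 48)},
}


def conversion_ratio(visits, conversions):
    """Conversions divided by visits; 0.0 when there were no visits."""
    return conversions / visits if visits else 0.0


def site_wide_ratio(segments):
    """Conversion ratio over all segments combined."""
    total_visits = sum(visits for visits, _ in segments.values())
    total_conversions = sum(conv for _, conv in segments.values())
    return conversion_ratio(total_visits, total_conversions)


for period, segments in periods.items():
    print(f"{period}: site-wide {site_wide_ratio(segments):.2%}")
    for segment, (visits, conversions) in segments.items():
        ratio = conversion_ratio(visits, conversions)
        print(f"  {segment}: {visits} visits, {ratio:.2%}")
```

In this made-up case each segment converts better in the second period (5.0% to 5.5% and 1.0% to 1.2%), yet the site-wide ratio falls from 3.0% to about 2.1% because the volume of low-converting display traffic has quadrupled. Only the segmented view separates the change in traffic mix from the change in on-site performance.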
