Testing

Do, or do not. There is no ‘try’

“Never try, never fail, those are the words I live by”, or so says Drew Carey’s character Crank in the animated kids’ film ‘Robots’. I heard these words coming from the back of the car a few days ago as I headed off on holiday with the family for a week in North Devon. You could run a business by that motto, but I’m not sure it would last long or be an exciting place to work.

On the contrary, it is the belief that we must try, shared by my team and me, that sees Foviance opening for business in China this quarter, with a new office in Shanghai. Read more…

Analytical web analytics

This article, written by Neil Mason, was originally published on Clickz.com on 14/01/10 and is republished here with permission.

In my last column I reflected on 10 years in digital analytics: how far the industry had developed in a decade in some ways, and how there was still room to grow in others. I commented that one of the issues was that the online marketing world had been “data rich and analytically poor”. This week I want to explore some of the areas where I think there is work to be done to enhance the quality of insight that digital marketers get from their investments in data capture and reporting technologies. Read more…

The simplest way is not always the best

A few weeks ago, Foviance was commissioned by a major UK retail bank to conduct user testing sessions on online application processes. The main objective of the research was to compare the newly designed process with the current one. From this stemmed a finding that challenges one of the most deeply held customer experience beliefs: the simplest way is not always the best. Read more…

Mobile Internet variety doesn’t match skill

It has been 14 years since I first surfed the web using a PC, so it is no surprise that it is increasingly rare for us to observe novice users of the Internet (yes, while they do exist, they are becoming fewer in number, and are less likely to volunteer to take part in research involving Internet use than more experienced Internet users). However, the same cannot be said for mobile Internet users. Read more…

Increasing value and conversion through multivariate testing

You might well have come across multivariate testing techniques before in your explorations into customer experience measurement, but for the uninitiated, here is a brief definition that puts the methodology into context.

Multivariate testing, or MVT, is an experimentation process by which a series of possible design variables are tested at once to see what effect, if any, they have on website performance. It’s a complex form of split, or A/B, testing, employing algorithm-based software and constant monitoring of web analytics data. Small changes are made to single variables (such as the position of a menu or the colour of a background) and the impact of each change is measured. From a series of changes, optimum design configurations can be narrowed down on the basis of measurable evidence. With MVT it is also possible to experiment with structural, business-rule and database-driven elements, as well as cosmetic changes. We can even employ advanced rule-based targeting capabilities, including targeting by geographic location, traffic source (such as search engine versus email campaigns), cookies, and more.
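To make the idea of testing several design variables at once more concrete, here is a minimal Python sketch that lists every combination of a few variables (a so-called full-factorial design). The variable names and levels are hypothetical, chosen purely for illustration, and commercial MVT tools may well test only a subset of these combinations rather than all of them.

```python
from itertools import product

# Hypothetical design variables and their levels (illustrative assumptions,
# not taken from any real test plan).
variables = {
    "menu_position": ["top", "left"],
    "background_colour": ["white", "grey", "blue"],
    "button_shape": ["square", "round"],
}


def full_factorial(variables):
    """Return every combination of levels as a list of dicts."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


variants = full_factorial(variables)

# The number of variants is the product of the level counts: 2 * 3 * 2.
print(len(variants))
for variant in variants[:3]:
    print(variant)
```

Even this small example produces twelve page variants, which is one reason the traffic needed for a multivariate test grows quickly as variables are added.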

Read more…

Should it be red or should it be blue?

We’ve all been there. Sitting round a conference room table, discussing with our colleagues the design of the website, the flow of a particular user path or the layout of a particular page. Opinions differ on what would work best: whether the call-to-action button should be red or blue, square or round, flat or bevelled. We all know best, because we’re experts. Aren’t we? In some cases it may not matter how expert we are, because the loudest voice will win or the most important person’s opinion will be the one that counts. Read more…

Model behaviour

In a recent article in Business Week’s Innovation section, Bruce Nussbaum investigated the impact that poorly executed, inadequately modelled and negligibly stress-tested financial instruments may have had on the ongoing global financial crash and pervading economic climate. Nussbaum captured the concept in a nutshell when he wrote: “Hundreds of hugely complex products based on hugely complex mathematic financial models were created and sold around the world – without first being tested out. There was little or no real-world iterative process… In short, the innovation process was flawed. New inventions were not stress-tested in a real environment.”

It is easy to draw parallels between this theory and how much effort our own industry puts into soft-launching and stress-testing online systems before unleashing them live on the wider community. Is it possible that similar attention to modelling by our investment banks, and a reduced emphasis on getting to market as quickly as possible to reap the highest theoretical returns, might have avoided much of the mess we are now in?

In our experience, there are no shortcuts that can replace the benefits of thorough modelling and testing. We are experienced in working with financial services organisations and employ a range of financial modelling systems when creating products for that sector, regardless of the type and scale of banking application. We test and retest with customers, conduct user surveys, and run real-world modelling. We listened to the top decision-makers from all sectors of global society at the annual World Economic Forum in Davos back in January when they warned of just such an oversight, and we learned. Why didn’t the financial institutions and regulators do the same? Is it possible that they got ahead of themselves, bending over backwards for the product guys in order to reach a perceived sweet spot in the market as rapidly as possible, rather than following a risk-averse approach?

We work with high-profile financial clients like Barclays to ensure their online customers are provided with easy-to-use, highly secure, no-risk products. Of course, we are somewhat fortunate in that internet service modelling is logical and predictive – thanks to artificial server loading techniques we can run scenarios that see services oversubscribed by 100 percent, and so on. But we also run pilot schemes, test groups and live tests, plus continuous testing and modelling post-launch. We find real users to test products, and we ensure they are able to deposit and withdraw real funds long before products reach a wider market.

It appears that the investment banks we all rely upon simply skipped all these logical steps, going straight to market with poorly thought through products. Take the US public credit situation – if loans to citizens had been thoroughly modelled, it is probable that the disastrous toxic loan situation could have been avoided altogether. It’s important to ask the difficult questions – what if 20 percent of citizens can’t pay their loans back? What happens then?

Perhaps it is true that if organisations take the time to research and model critical products and services carefully and thoroughly, they might miss out on early financial opportunities from time to time. But surely these steps should be considered vital, if not mandatory, to ensuring a solid, risk-averse financial landscape?

The more the merrier?

When it comes to deciding how many users to recruit for user testing, nobody seems to agree on an ideal sample size. Perhaps more precisely, nobody actually seems to know. This is probably because user testing straddles two seemingly antagonistic domains: business and science. Whenever a client asks me how many people I think will be needed for a particular project, the first thing that comes to my mind is the dreaded: “It depends”. In practice, I generally opt for the rather less elusive response: “Let’s talk a bit more about the tasks before we decide.” In truth, when it comes to sample size, “it depends” is probably more accurate!

First and foremost, sample size is dependent on the type of study. There are a few voices in both the usability and the academic worlds preaching about the ideal number of users in a standard usability evaluation. It is generally agreed that we get value for money with five to eight participants, because on average somewhere between 80 percent and 85 percent of problems are identified using those sample sizes. To uncover closer to 100 percent of problems we would need perhaps twenty people. The maths is simple: why spend 150 percent more on recruitment (twenty participants rather than eight) to get 15 to 20 percent more in terms of results?
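The diminishing returns described above can be sketched with a simple problem-discovery model in which each participant independently has the same chance of revealing any given problem. This is a rough illustration in Python, not the basis of the figures quoted in this article: the per-participant detection rate of 0.31 is an assumption, a value often quoted in the usability literature, and real projects will differ.

```python
def proportion_found(n_users, p_detect=0.31):
    """Expected share of problems seen at least once by n_users.

    Assumes every participant independently reveals a given problem
    with probability p_detect (0.31 is an assumed, illustrative value).
    """
    return 1 - (1 - p_detect) ** n_users


for n in (5, 8, 20):
    print(f"{n:>2} participants: {proportion_found(n):.1%} of problems")
```

Under these assumptions the curve flattens quickly: most problems surface with the first handful of participants, and each additional participant adds less than the one before. A lower detection rate would shift the curve down and argue for a larger sample.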

There is, of course, some variation in these numbers, but generally, when it comes to standard evaluations, it is fairly easy to decide on sample size. Rich data is extracted, behaviour is observed and interpretation of the results relies on known best practice and experience. Things get a bit more complicated, however, if we use quantitative measures of behaviour, such as eye tracking or quantitative survey data. As someone with a lot of experience in eye tracking, I often get asked how many participants to recruit for such projects, and invariably, people are once more in danger of hearing the dreaded: “It depends”.

Clearly, sample size is related to the complexity of interfaces and tasks. The more complex, the more people we need to test as data variability increases. But crucially, sample size depends on the behaviour the test is set to measure. This, in turn, depends on what the objectives of the study are. For example, to know whether an advert is going to be noticed when users perform their usual tasks on a page, 20 people might be required. However, to know how long on average it takes people to look at the ad, more people are needed because of the huge variations between participants.

With surveys, sample size estimation is also somewhat less straightforward than with standard usability evaluations. Here, the information being collected is attitudinal data, which by its very nature can be slightly fuzzy. It all comes down to the size of the effect you intend to detect. Imagine you wanted to know whether people in London are taller than people in New York. If people in London and people in New York are actually pretty much the same height, you will need to measure a large number of citizens of both cities. If, on the other hand, people in London were particularly tall and people in New York were shorter than average, this would be obvious after measuring just a handful of people.
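The London versus New York example can be put into rough numbers with a standard power calculation for comparing two group means. The sketch below uses the normal approximation; the significance level (0.05), the power (0.80) and the effect sizes are assumptions chosen for illustration, with effect size expressed as the difference in means divided by the standard deviation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed in each of two groups.

    Normal-approximation formula for a two-sided comparison of means:
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    alpha and power defaults are conventional, assumed values.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Small difference between the cities versus a large one (assumed values).
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {n_per_group(d)} people per city")
```

Because the required sample grows with the inverse square of the effect size, halving the difference you want to detect roughly quadruples the number of people you need to measure.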

What sample size does not depend upon is the size of the original population. Whether we are testing people who belong to the whole population of Europe or teenage boys who only wear Ecko clothes and speak with a South London accent, the factors weighed to estimate sample size should be: interface and task complexity, sensitivity of measure and effect size, and the variability between the users.

Of course, in any case, the more the merrier, but this is only possible in a world where resources, such as time and money, are infinite. In the real world, we compromise, and the trick is in being able to achieve a good balance between rigour and value.
