Data Mining

How to do Predictive Analytics – Part 1

This post originally appeared on Applied Insights’ blog. Foviance acquired Applied Insights in November 2008, with Neil Mason joining us as Director of Analytical Consulting. As part of this acquisition, we’ve incorporated Applied Insights’ blog into our own.

My last post argued that more people could be benefiting from Predictive Analytics, which raises the question of where to start if you are not doing it already.

Is it just a case of buying a predictive analytics software tool and ‘plugging it in’? Probably not. Nor do you necessarily need to hire expensive contractors or consultants, unless the need justifies it.

As you might expect, part of the predictive analytics process is deciding what the potential costs and benefits of the activity would be, and indeed whether predictive analytics as an approach has any chance of meeting the business or research objectives you have in mind. The good news is that there are some pointers to help make those decisions.

A process template for undertaking a predictive analytical project

As I mentioned last time, there is significant overlap between the types of analytical activity described as ‘Data Mining’ and those also termed ‘Predictive Analytics’. For that reason we feel that the processes involved in executing the former (for which there are existing process blueprints) are currently the best templates for undertaking the latter.

There are probably 2 leading process models at the time of writing.
1. SAS have their own called SEMMA.
2. A cross-industry forum including DaimlerChrysler, SPSS, NCR Teradata and others have developed CRISP-DM.

Generally speaking these models cover the same ground, and are not unlike many consulting project engagement models. We tend to use CRISP-DM as the basis for our work for two main reasons:

  1. Our sense is that more collaborative thought has gone into producing a more detailed template.
  2. It is somewhat broader in the sense that it covers the business objectives and the ultimate application (deployment) of the outcomes (e.g. models, scores, etc.) in more detail.

A visualisation of the 6 steps in the CRISP model can be seen here.

Simply put they are:

1. Business understanding

This is the crucial step in which we start with a business goal (or goals), e.g. reducing the rate at which my customers are defecting, and begin to evaluate the business context before embarking (or in some cases deciding not to embark) on the analytical process.

2. Data understanding

The second understanding step is to audit and investigate the data in the various data sources which can potentially provide the grist for the analysis. This covers both a top level analysis of the metadata and a deeper, exploratory, analysis of the data.

We typically see the two understanding steps as part of the same phase. Once complete you should be in a position to evaluate what is likely to be achievable from a modelling perspective. Most often we find that there is enough potential to continue with the modelling, though in many cases the project may turn out to be somewhat different to the original expectations. In a small minority of cases there may not be that potential, or we may feel that it is too costly/risky to undertake the prospective analysis.
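The data audit described in the second step can start very simply. The sketch below assumes the records from one source have already been loaded into a list of dictionaries; the field names and values are hypothetical. It reports, per field, how many values are missing, which types actually occur, and basic ranges for numeric fields, which is often enough to surface the gaps that shape what is achievable.

```python
import statistics

# Hypothetical extract from one data source; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "orders": 5, "region": "North"},
    {"customer_id": 2, "age": None, "orders": 1, "region": "South"},
    {"customer_id": 3, "age": 51, "orders": None, "region": None},
    {"customer_id": 4, "age": 27, "orders": 12, "region": "North"},
]

def audit(rows):
    """Summarise each field: missing count, types seen, and min/max/median for numeric fields."""
    report = {}
    fields = sorted({key for row in rows for key in row})
    for field in fields:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        summary = {
            "missing": len(values) - len(present),
            "types": sorted({type(v).__name__ for v in present}),
        }
        if present and all(isinstance(v, (int, float)) for v in present):
            summary["min"] = min(present)
            summary["max"] = max(present)
            summary["median"] = statistics.median(present)
        report[field] = summary
    return report

for field, summary in audit(records).items():
    print(field, summary)
```

In practice the same audit would be run over every candidate source before deciding whether the modelling is worth pursuing.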

3. Data preparation

This is the necessary, but arguably the least interesting, step (unless you like this kind of thing!). Sometimes described as the ETL (Extract, Transform and Load) stage, this is where we beat the data into shape by importing it into an appropriate format for the target analytical tool(s) chosen in the understanding phase. In our experience this step takes a bigger chunk of the project than one might expect.
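As a rough illustration of the Extract, Transform and Load idea, the sketch below takes a hypothetical CSV export, converts text columns to numbers, fills missing values with the column median, and turns a categorical column into indicator columns. Median imputation and one-hot encoding are assumptions chosen for the example; the right transformations depend on the data and the target tool.

```python
import csv
import io
import statistics

# Hypothetical raw export: numbers arrive as text and some values are blank.
RAW = """customer_id,age,orders,channel
1,34,5,web
2,,1,store
3,51,,web
4,27,12,phone
"""

def prepare(raw_text):
    """Extract rows from CSV text, transform them, and return model-ready records."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    numeric = ("age", "orders")
    # Transform: convert numeric columns from text, treating blanks as missing.
    for row in rows:
        for col in numeric:
            row[col] = float(row[col]) if row[col] else None
    # Impute: replace missing numeric values with the median of the observed values.
    for col in numeric:
        median = statistics.median([r[col] for r in rows if r[col] is not None])
        for row in rows:
            if row[col] is None:
                row[col] = median
    # Encode: one 0/1 indicator column per channel value.
    channels = sorted({r["channel"] for r in rows})
    prepared = []
    for row in rows:
        out = {"customer_id": int(row["customer_id"]), "age": row["age"], "orders": row["orders"]}
        for ch in channels:
            out[f"channel_{ch}"] = 1 if row["channel"] == ch else 0
        prepared.append(out)
    return prepared

for record in prepare(RAW):
    print(record)
```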

4. Modelling

This, and the next step which is strongly linked, is the crux of the process. This is where we apply one or more – usually several – appropriate modelling techniques to the data. We shall talk more about user interfaces to models and software tools later but model selection ‘usually’ requires a level of expertise to identify appropriate modelling techniques which fit the shape of the data (e.g. some modelling algorithms require input data to be normally distributed).
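One simple way to check the "shape of the data" before picking a technique that assumes roughly normal inputs is to measure skewness and see whether a transformation helps. The sketch below uses the moment-based skewness coefficient and a log transform on hypothetical order values; both choices are assumptions for illustration, not a prescribed step.

```python
import math
import statistics

def skewness(values):
    """Moment-based skewness (g1): mean cubed deviation over the cubed population std dev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

# Hypothetical order values: most are small, one is very large.
order_values = [12, 15, 18, 20, 22, 25, 30, 35, 40, 400]

raw_skew = skewness(order_values)
log_skew = skewness([math.log(v) for v in order_values])
print(f"skewness before: {raw_skew:.2f}, after log transform: {log_skew:.2f}")
```

A value near zero suggests a roughly symmetric distribution; large positive values, as here, indicate a long right tail that a log transform can reduce but not necessarily remove.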

5. Evaluation

Quite simply, how well does the model perform? This may be as straightforward as looking at the percentage accuracy of the model’s predictions against an unseen test (‘holdout’) sample. Or it could involve evaluating how the model performs in more sophisticated scenarios related to profitability, or the risk of investigating too many non-fraudulent credit card transactions and annoying too many loyal customers.
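A minimal sketch of both kinds of evaluation: a holdout split, accuracy on the unseen sample, and a simple cost figure that weighs false alarms (genuine transactions investigated) against missed fraud. The transactions, the threshold "model" and the cost weights are all hypothetical.

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and set aside an unseen test ('holdout') sample."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def evaluate(predictions, actuals, cost_false_alarm=1.0, cost_missed_fraud=10.0):
    """Return (accuracy, total cost). The cost weights are illustrative assumptions."""
    pairs = list(zip(predictions, actuals))
    accuracy = sum(p == a for p, a in pairs) / len(pairs)
    false_alarms = sum(p and not a for p, a in pairs)  # genuine transactions investigated
    missed = sum(a and not p for p, a in pairs)        # fraud that slipped through
    return accuracy, false_alarms * cost_false_alarm + missed * cost_missed_fraud

# Hypothetical labelled transactions: (amount, is_fraud).
transactions = [(20, False), (35, False), (500, True), (15, False), (800, True),
                (60, False), (45, False), (700, False), (25, False), (950, True)]
train, test = holdout_split(transactions)

# A deliberately crude stand-in for a real model: flag anything above a fixed amount.
# A real model (or threshold) would be fitted on `train` only, never on `test`.
def predict(amount, threshold=400):
    return amount > threshold

accuracy, cost = evaluate([predict(amount) for amount, _ in test],
                          [is_fraud for _, is_fraud in test])
print(f"holdout accuracy {accuracy:.0%}, cost {cost}")
```

Two models with the same accuracy can have very different costs, which is why the profitability and risk view often matters more than the percentage.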

6. Deployment

The whole point of the exercise is to apply the results of the analytical process in a way which creates benefit going forward. Broadly speaking this takes 2 forms.

  1. It could be about simply making decisions based on the insight generated, e.g. deciding to open a new retail outlet in a location which scores highly from the perspective of market potential.
  2. Or it could be about integrating the results data (e.g. propensity scores), or even the model, in a way which can automate operational actions. For example, we might embed an online advertising click fraud detection model in our web analytics process to send or report alerts when potentially malevolent transactions are generated. Or we might simply generate a list of new customers who were scored highly by a model which predicts lifetime value, but who we believe need to be engaged early in their lifecycle to meet that potential. Such a list can generate call centre actions or marketing campaigns (and the model may also indicate which is more appropriate).
One important note is that, as you can see from the CRISP diagram, this isn’t necessarily a linear process. For example, we might find gaps in the data during the preparation step that lead us to re-evaluate our understanding or objectives, or we might find that the data we are modelling has issues which can be resolved through new transformations (e.g. imputing missing values) as part of a new preparation step.

It may look quite heavy, but it doesn’t have to be. In our experience the process model can range from:

  • Small scale: I have a spreadsheet with the last 5 years’ sales in it; I wonder if I can predict sales for the next 3 months? In a sense this puts the cart before the horse: having the data sparks a potential business objective which we might not otherwise have thought about. The more explicit objective could be to meet sales targets. The whole exercise (up to deployment at least) could take less than a day.
  • To the larger scale: can I identify new customers who have the potential to be the most profitable in the future, against a business objective of growing customer profitability? This example starts with the business objective in the regular way but may require more convoluted merging of data from various databases, potentially related to customer acquisition from different channels, products, divisions, countries, etc. A project of this kind could take weeks, and sometimes months, to complete.

What happens in practice?

I’ll use the next few blogs to walk through the process model and give some pointers to what we’ve found to be important on engagements. I’ll try to be candid enough to identify areas where we’ve had success (or otherwise), where things can go wrong, and where we think there are limitations in the methodology.
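The second form of deployment described in this post, turning model scores into a list that drives call centre actions or campaigns, might look something like this minimal sketch. It assumes a logistic-regression-style propensity model has already been built; the intercept, weights, variable names and threshold are hypothetical.

```python
import math

# Hypothetical coefficients, as if taken from a previously fitted logistic regression.
INTERCEPT = -3.0
WEIGHTS = {"orders_first_90_days": 0.8, "opted_in_newsletter": 1.2}

def propensity(customer):
    """Turn a weighted sum of inputs into a probability with the logistic function."""
    z = INTERCEPT + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

def action_list(customers, threshold=0.5):
    """Return (id, score) for customers at or above the threshold, highest score first."""
    scored = [(propensity(c), c["id"]) for c in customers]
    return [(cid, round(p, 3)) for p, cid in sorted(scored, reverse=True) if p >= threshold]

new_customers = [
    {"id": "C1", "orders_first_90_days": 3, "opted_in_newsletter": 1},
    {"id": "C2", "orders_first_90_days": 1, "opted_in_newsletter": 0},
    {"id": "C3", "orders_first_90_days": 2, "opted_in_newsletter": 1},
]
print(action_list(new_customers))  # [('C1', 0.646)]
```

Lowering the threshold lengthens the list; where to set it is a business decision about capacity and cost rather than a modelling one.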

So what is Predictive Analytics?

This post originally appeared on Applied Insights’ blog. Foviance acquired Applied Insights in November 2008, with Neil Mason joining us as Director of Analytical Consulting. As part of this acquisition, we’ve incorporated Applied Insights’ blog into our own.

Over the past few years we’ve all been hearing more and more about ‘Predictive Analytics’ (PA for short). It is one of those terms, like Business Intelligence (BI), Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP), that once coined captures the essence of an area of business or research activity and takes on a life of its own, particularly as software and services vendors use it to describe, and of course sell, their products and services more easily.

A simple definition of predictive analytics

A simple definition of predictive analytics is that it’s an activity which allows us to quantify future events or actions. This quantification could be as straightforward as generating a list of customers who are likely, at a point in time, to behave in a certain way, e.g. to churn, to register, to buy or to respond to a particular mail, etc. Typically this list will be accompanied by a score which gives us the probability (or propensity) that an event will occur.
Alternatively the technique may give us a value, or set of values, e.g. a set of sales forecasts for a given product line in the coming week; furthermore the forecasts are often presented with confidence values, e.g. we might predict that we will sell 50 widgets on Monday but we can say, with 95% confidence, that we will sell between 45 and 55.
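The widget example can be reproduced as a worked calculation. Assuming forecast errors are normally distributed, a two-sided 95% interval is the point forecast plus or minus about 1.96 standard errors, so a standard error of roughly 2.55 widgets (a hypothetical figure chosen to match the example) gives 50 ± 1.96 × 2.55 ≈ 50 ± 5.

```python
from statistics import NormalDist

def forecast_interval(point_forecast, std_error, confidence=0.95):
    """Two-sided interval around a forecast, assuming normally distributed errors."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return point_forecast - z * std_error, point_forecast + z * std_error

low, high = forecast_interval(50, 2.55)
print(f"Forecast 50 widgets; 95% interval {low:.0f} to {high:.0f}")  # 45 to 55
```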

So what’s new?

One of the main characteristics of PA is that many of the tools, techniques and applications which it comprises are not actually particularly new. Credit scoring is one of the best-known applications of PA, and it has been around for over 50 years.

Wikipedia details many of the other uses and techniques which can be described as ‘predictive’ though the current list there is by no means exhaustive. You will probably recognise a lot of them. In fact many of them come from the world of Data Mining and there is some significant overlap between Predictive Analytics and Data Mining, but there are also many differences; more on that later.
So is ‘predictive analytics’ just a new bottle for a lot of old wine? One of the new things is that there is an abundance of relatively new technology which can:

  • Take PA to a wider audience by making the often complex algorithms more usable for less statistical/technical users.
  • Provide access to a broader range of techniques through smarter user interfaces which map more closely to the analytical process, in such a way that modellers (analysts, statisticians, data miners, demand forecasters, etc.) can more productively access data, test and develop models and ultimately deploy the best ones.
  • Allow the results of PA (e.g. customer lists, scores, models, etc.) to be used more easily in decision making processes going forward.

For example, PA is being used to help customer service representatives in a call centre prevent customers from churning by making appropriate offers. It is also being used in website recommendation engines which can serve up relevant content based on what is understood of a visitor’s needs, preferences and previous browsing/buying behaviour.

The last of the points above, making the results of PA easier to use in decision making, could mean generating models in a format (such as PMML) which can be more easily plugged into an operational process, or integrating PA into other tools such as CRM platforms.

There are also many new techniques and new applications for those techniques. The theories around Naïve Bayes and Robust Regression, for example, may have been around for a while but it is only recently that they have been available in an accessible commercial format. Techniques to automate the search for the best fitting time-series model are also quite contemporary. Applications such as SPAM detection and multivariate testing to optimise on-line marketing campaigns and page content are arguably the latest in a long line. Even if your potential application doesn’t seem to feature in any list of previous ones it could be worth exploring the potential that PA offers. There is a first time for everything!
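To give a feel for how Naïve Bayes applies to spam detection, here is a minimal multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. The four training messages are made up, and a real filter would need far more data and better tokenisation; this is a sketch of the technique, not of any particular product.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (text, label). Returns class counts, per-class word counts and vocabulary."""
    priors = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in priors}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    """Pick the label with the highest log-posterior, using add-one smoothing."""
    total_docs = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(priors[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "ham"),
    ("please review the sales report", "ham"),
]
model = train_nb(training)
print(classify("free prize money", *model))         # spam
print(classify("sales meeting on monday", *model))  # ham
```

The "naïve" part is the assumption that words occur independently given the class, which is rarely true but often works well enough in practice.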

So what can predictive analytics do for me?

PA is more commonly applied within organisations to address specific issues, like preventing customer attrition or identifying segments of customers who will be more responsive to specific campaigns. One of its advantages is that it is usually possible to demonstrate potential ROI through the model and, if the model is a good one, the actual ROI when the model is deployed should not be too far from the prediction.

Increasingly though these techniques are being used more strategically to inform a range of organisational decisions and actions. Effective CRM programmes, for example, often use PA in a number of ways to anticipate customer needs and behaviours across channels and at different points in the customer lifetime.

So more sophisticated applications will not only identify good prospects for acquisition, but will also help define a series of interactions, as the prospect develops into a loyal customer, which enhance the customer experience and ultimately drive mutual benefit.

Getting more into predictive analytics

Despite the publicity, I feel that applications of predictive analytics, particularly in business, are only scratching the surface today. There are a number of reasons for that; among them is the age-old one that there is a gap between the available tools and techniques and the appropriate circumstances for business adoption. There is also a dearth of resources to help bridge that gap!

In future blogs I’ll explore predictive analytics in detail, looking more closely at the tools, techniques, applications and vendors. I’ll give a view on the process of applying PA, particularly aimed at those who haven’t started yet. I’ll also look beyond the hype into cases where it has been used to achieve significant results.

I’ll even discuss how PA relates to BI, CRM and ERP. Though we shall try and do that in a more understandable way than I did in that last sentence!

The Analyst’s Toolbox: Decision Trees and other classification techniques

This article, written by Neil Mason, was originally published on ClickZ.com and is republished here with permission.

In my last article I took a quick look at some tools for the analyst’s toolkit other than a web analytics system. These included Business Intelligence or OLAP tools, visualisation tools, and statistical analysis and data mining tools. This week I want to take a deeper look at the use (and possible abuse) of statistical analysis and data mining techniques.

Statistical analysis and data mining cover a wide variety of approaches, methodologies and techniques that might be useful for the web analyst. They can broadly be classified as follows:

  • Statistical analysis
  • Classification techniques
  • Clustering and segmentation methodologies
  • Forecasting
  • Text analysis

It’s probably best to start with a note of caution. There’s a saying “If you torture the data long enough, it will tell you anything you want it to”. These kinds of data analysis techniques can be very powerful and they can be used to uncover nuggets of gold in your data. They also need to be used carefully. The analyst needs to ensure that the results are robust, reliable and above all make sense. Data mining is as much an art as it is a science.

Simple statistical analysis techniques such as frequencies and histograms can reveal interesting patterns in your data. I’ve written before about the dangers of using average metrics such as “average pages per visit”, as they hide interesting differences in behaviour. Worse than that, they can actually be misleading.
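A small made-up example shows how an average can hide very different behaviour. Both hypothetical segments below average 5 pages per visit, but a simple frequency count (printed as a text histogram) and the median reveal that one segment is steady browsers while the other is mostly single-page visits plus a few very deep ones.

```python
import statistics
from collections import Counter

# Hypothetical pages-per-visit for two segments with the same average.
segment_a = [4, 4, 5, 5, 5, 5, 6, 6]    # fairly consistent browsing
segment_b = [1, 1, 1, 1, 1, 1, 16, 18]  # mostly bounces, plus a few deep visits

for name, visits in (("A", segment_a), ("B", segment_b)):
    print(name, "mean:", statistics.mean(visits), "median:", statistics.median(visits))
    # Text histogram: one '#' per visit at each pages-per-visit value.
    for pages, count in sorted(Counter(visits).items()):
        print(f"  {pages:>2} pages | {'#' * count}")
```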

Often in the work we do, we will spend a lot of time initially carrying out exploratory analysis looking at the patterns and distributions in the data. It’s time well spent. It gives you a feel for what is going on below the topline metrics and also helps later when you begin to look at the results of other analytical techniques. As a marketing analyst you need to have a sense of how the data is made up, how the topline metrics are constructed and where they come from. For example, you may find that there are some extreme values or “outliers” that might affect your results and so need to be dealt with in some way or another.
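One common way to flag the extreme values mentioned above is Tukey's fences: anything more than 1.5 interquartile ranges below the first quartile or above the third quartile is treated as a potential outlier. The rule and the order values below are illustrative assumptions; whether a flagged value is an error, a genuine big spender or something else still needs a human judgement.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical order values with one extreme entry.
order_values = [22, 25, 27, 30, 31, 33, 35, 38, 40, 2500]
print(iqr_outliers(order_values))  # [2500]
```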

With statistical analysis you may want to compare different groups of visitors or customers, for example to see whether the repeat order rate is higher amongst some groups of customers than others. You can apply statistical tests to see whether any differences are statistically significant or could simply be due to the variability in the data. Significance testing can be important in experiments such as A/B tests, to ensure that “A” really is better or worse than “B” before making any changes to the site.
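For comparing rates such as repeat order rates or A/B conversion rates, one standard choice is the two-proportion z-test, sketched below with a pooled standard error and a normal approximation (reasonable for large samples). The counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled standard error)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical repeat-order counts: 26% of group A vs 22% of group B re-ordered.
z, p = two_proportion_z_test(successes_a=260, n_a=1000, successes_b=220, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts the p-value falls below 0.05, so the difference would usually be treated as significant at the 5% level; with a tenth of the sample size the same rates would not be.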

There are many different types of “classification” techniques, including regression analysis, often used in credit scoring, as well as Artificial Intelligence approaches such as neural networks. The class of techniques that I want to look at today is “decision trees”. There are a number of different algorithms of this type, including CHAID, CART and QUEST. These algorithms essentially do the same thing in different ways: they assign the data records (such as visitors or customers) to groups of interest based upon the other variables that you have on the record.

For example, you may have customer records that split customers into two groups: “single order customers” and “repeat customers”. You may also have a whole string of other data on those customers, and you are interested in understanding the key characteristics that distinguish someone who orders once from someone who goes on to order again. Decision tree methods will look at all the other variables and determine which one is the most important factor in distinguishing a single order shopper from a repeat order shopper. The method then repeats the process within each resulting group, again and again, until it has determined all the significant factors in order of priority.
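The core step, choosing the variable that best separates the groups, can be sketched with Gini impurity, the criterion associated with CART (CHAID, for example, uses chi-square tests instead). The sketch below finds only the first split on a handful of hypothetical customer records; a full tree would apply the same search recursively within each resulting group.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when every record in the group has the same label."""
    total = len(labels)
    return 1 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_split(records, target, candidates):
    """Return the candidate variable whose split most reduces weighted Gini impurity."""
    parent = gini([r[target] for r in records])
    best_var, best_gain = None, 0.0
    for var in candidates:
        groups = {}
        for r in records:
            groups.setdefault(r[var], []).append(r[target])
        weighted = sum(len(g) / len(records) * gini(g) for g in groups.values())
        if parent - weighted > best_gain:
            best_var, best_gain = var, parent - weighted
    return best_var, best_gain

# Hypothetical customers: did they order again after their first order?
customers = [
    {"newsletter": "yes", "first_category": "books", "repeat": True},
    {"newsletter": "yes", "first_category": "music", "repeat": True},
    {"newsletter": "yes", "first_category": "books", "repeat": True},
    {"newsletter": "no", "first_category": "music", "repeat": False},
    {"newsletter": "no", "first_category": "books", "repeat": False},
    {"newsletter": "no", "first_category": "music", "repeat": True},
]
print(best_split(customers, "repeat", ["newsletter", "first_category"]))
```

Here the newsletter variable wins because it produces one completely pure group, while splitting on first category leaves both groups as mixed as before.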

The great thing about decision trees is that the output is very visual and relatively easy to understand. They can get a bit big and cumbersome though especially if you are dealing with a lot of variables. Decision Tree techniques have been used for years in direct marketing work to determine which type of people are most likely to respond to mailings, so that companies can cut down on mailing costs.

In online marketing, mailing costs aren’t as big an issue as they are in the offline world, but we have used techniques like decision trees in other areas to understand the factors that influence whether visitors do something or not. In the example above of single order customers vs repeat order customers, we did a piece of work where we looked at many potential factors, including:

  • the size of the first order
  • the number of visits to the website after the first order
  • the product category of the first order
  • the product categories browsed after the first order
  • whether they were opted in to the email newsletter
  • how many newsletters they had received
  • the timing of the newsletters after the first order

We found that the most important factor in determining whether someone went on to order again after their first order (out of all the ones we examined) was that someone had opted into the email newsletter and had received a newsletter within 5 days of that first order. Vital input into a retention marketing programme.

Decision tree techniques are also useful for profiling and understanding different segments of visitors or customers. Segmentation techniques are what I will be looking at in the next part of this series.

Till then…

The Analyst’s Toolbox: Introduction

This article, written by Neil Mason, was originally published on ClickZ.com and is republished here with permission.

When we talk about analysing web data, there is a tendency to focus on so-called web analytics tools such as Google Analytics, Omniture, Coremetrics and the like. These tools were developed specifically to meet the challenges of reporting on and analysing data collected from websites, but they aren’t necessarily the only tools we might have in our toolbox.

There are a variety of other reporting and analysis tools that we might want to use on the data from our websites to get a better understanding of online business performance and customer behaviour. It is fair to say that web analytics systems have significantly improved their analytic capabilities over the past few years and will no doubt continue to do so. These days a number of the systems are far better able to filter and segment data on the fly to look at the behaviour or characteristics of particular groups.

However, as the needs of the organisation continue to develop so too might the need for different or specialist reporting or analysis tools. Other systems for reporting and analysing web and customer data can be grouped into three broad categories:

  • Business Intelligence (BI) or OLAP tools
  • Visualisation tools
  • Statistical analysis and data mining tools

BI or OLAP tools are often found in the corporate reporting environment, and this class of tools includes systems such as Business Objects, Microstrategy and Cognos. Databases such as Oracle and SQL Server either come with BI functionality or can have it bolted on. Underpinning many of these tools is the concept of a data cube that allows analysts to drill through the data in a hierarchical manner. In a commerce environment I might start by looking at, say, total sales for a year and then drill down into product categories, then into sub-categories and then down to the product level.

Some web analytics systems do have the ability to drill through data in this way but a feature of the family of BI tools is the ability to handle multiple hierarchies across multiple dimensions. So, in addition to being able to drill through on the product dimension, you can also drill through the data say in terms of geography and also time. BI tools could also be used to report on web data in the context of other channels, for example comparing the profile of leads or enquiries generated online against those generated in the call centre.
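What drilling through a cube does can be imitated in a few lines: total a measure for every combination of values at the chosen levels, adding levels to drill down or switching to a different dimension. This is only an in-memory sketch on hypothetical sales rows; real OLAP tools precompute and index these aggregates so that drilling is fast on large volumes.

```python
from collections import defaultdict

# Hypothetical sales facts with a product hierarchy, a geography and a time dimension.
FIELDS = ("category", "subcategory", "product", "region", "month", "amount")
sales = [dict(zip(FIELDS, row)) for row in [
    ("Electronics", "Audio", "Headphones", "North", "2005-01", 120.0),
    ("Electronics", "Audio", "Speakers", "South", "2005-01", 200.0),
    ("Electronics", "Video", "DVD player", "North", "2005-02", 150.0),
    ("Books", "Fiction", "Novel", "South", "2005-02", 30.0),
]]

def roll_up(rows, levels, measure="amount"):
    """Total the measure for every combination of values at the given levels."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row[measure]
    return dict(totals)

print(roll_up(sales, []))                           # grand total
print(roll_up(sales, ["category"]))                 # drill down to category
print(roll_up(sales, ["category", "subcategory"]))  # then to sub-category
print(roll_up(sales, ["region", "month"]))          # a second hierarchy: geography and time
```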

As the saying goes, a picture tells a thousand words and visualisation tools can be a valuable weapon in your analytical arsenal. Again some web analytical tools such as Visual Sciences and Site Intelligence have some powerful visualisation capabilities. Whilst many web analytics systems have improved the visual reporting of web data through developments of click overlays for example, for the analyst a visualisation tool might add another dimension.

Visualisation tools can range from add-ins or add-ons for Excel through to complex applications that are commonly integrated with data mining tools. At the desktop level, Excel add-ons such as MM4XL extend the scope of the charting abilities of Excel and allow the analyst to present data in different ways. More sophisticated tools can produce three-dimensional rotating images that allow the analyst to explore and look for patterns in the data. The human brain is still one of the most powerful tools available for spotting patterns and trends in data when presented in the right way!

The final set of tools that might be useful for analysing web and customer data are statistical analysis and data mining tools. What’s the difference between statistical analysis and data mining? The way that I tend to view it is that statistical analysis is predominantly about exploration and data mining is about discovery. With statistical analysis you are often looking to test an assumption or a hypothesis. For example, you may be looking to prove that one group of customers rate your product or service more highly than others. With data mining, you are looking for patterns or relationships in the data that you may not know about.

Statistical analysis and data mining cover a wide variety of approaches, methodologies and techniques that might be useful for the web analyst. They can broadly be classified as follows:

  • Statistical analysis
  • Classification techniques
  • Clustering and segmentation methodologies
  • Forecasting
  • Text analysis

Increasingly many of these techniques are being used for making predictions and so the phrase “predictive analytics” is a term that is often used as well to describe these various methodologies.

Some of this stuff may seem like a long way from the current day-to-day analysis of conversion funnels and the like. But as the market continues to mature and growth comes from optimisation and improvements in marketing efficiency, some of these techniques will have a place on the analyst’s workbench. Over the next couple of weeks, I will take a look at some of these techniques in more detail and how they can be used in the context of analysing online visitor and customer behaviour.

A segmentation primer

This article, written by Neil Mason, was originally published on ClickZ.com and is republished here with permission.

One of the things you hear talked about a lot more these days in the wacky world of web analytics is “segmentation”. But I sometimes wonder what people mean when they talk about segmentation. I think it’s one of those words that is used more often than it is understood; understood in the marketing sense of the word, anyway.

I’ll take one example. One of the largest and most successful web analytics systems vendors has a section in their report menu called “Segmentation”. What we actually find there are reports on the most popular pages and sections of the site. I’m not too sure what that has to do with segmentation. Other vendors talk about segmentation as well but mean different things. Sometimes they talk about the ability to filter along different dimensions or the ability to analyse the data by combining different variables. So, segmentation could mean reporting particular data, filtering data or analysing data. All of these things are good things, and potentially even useful things, but are they segmentation?

I dug out some of my marketing text books to see if there was a consensus view in them about what segmentation actually is. I found that what they tend to talk about is that segmentation is a means of identifying different groups of people in order to develop different strategies for each group. So, segmentation is a purpose rather than an outcome and I think that’s the difference between classification (which is what a lot of analysis tools do) and segmentation which is what marketers or marketing analysts do.

The point of segmentation is that you do something as a result of having it. For example:

  • You target different groups of people with different messages in your acquisition campaigns
  • You present a different site experience dependent on your understanding of who that person is
  • You interact with different people differently dependent on where they are in a customer lifecycle

In one of the books I looked at, which was actually written 20 years ago, the authors described three conditions of a good segmentation*. They are:

Homogeneity – the degree to which people in the segment are similar in ways that are interesting to you

Parsimony – the degree to which the segmentation avoids making every person a unique target

Accessibility – the degree to which you can describe the segments in ways that help you deploy differentiated marketing strategies

That all sounds pretty theoretical (well, it was a text book), so what does this mean in practice?

My interpretation of this is that a good segmentation has to be robust, useful and actionable. There are many ways that you might segment say a site’s visitors or your customer base from simple classification approaches through to complex statistical techniques but they have to pass the sense check of being robust, useful and actionable.

You might simply classify according to some demographic or geographic variables. For example, classifying the customer base into male vs female is a form of segmentation, but it is only robust and useful if men and women exhibit differences that are potentially useful to you, and only actionable if you can realistically target them in different ways.

Alternatively, you might develop a segmentation based on some attitudinal variables. Many years ago I was involved in a project where we segmented the visitors across a number of different sites we had in Europe according to their attitudes to online shopping and their motivations for visiting the site. Whilst the results were certainly interesting and highlighted some interesting differences in the visitor profiles of the different sites, we had to question how useful they were to us. How were we going to action the insight? We couldn’t identify and classify people arriving on the site by their attitudes, nor could we easily use the segmentation in our retention marketing activities, as we didn’t have people’s attitudes stored on our customer database.

So, I think that there is always a balancing act in satisfying those three conditions of homogeneity, parsimony and accessibility in a good segmentation. In our own work, we tend to use behavioural segmentation approaches as they make it easier to act on the outcomes. This may often involve using statistical methods such as cluster analysis to segment customers into groups that are distinct from each other in a meaningful way, like their browsing behaviour or their purchasing behaviour.
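Cluster analysis covers many methods; k-means is one of the most common, and the minimal sketch below applies it to two hypothetical behavioural variables per customer. The choice of k-means, the number of clusters and the variables are all assumptions for illustration. With real data, variables on different scales (say, visits and order value in pounds) would normally be standardised first, and the resulting clusters would still need the robust, useful and actionable sense check described above.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centre, then move each centre to its cluster mean."""
    centres = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Keep the previous centre if a cluster ends up empty.
        centres = [tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters

# Hypothetical behavioural variables per customer: (visits per month, orders per month).
occasional = [(1, 0), (2, 0), (1, 1), (2, 1)]
frequent = [(10, 4), (11, 5), (12, 4), (10, 5)]
centres, clusters = kmeans(occasional + frequent, k=2)
for centre, members in zip(centres, clusters):
    print(centre, members)
```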

However, we are also mindful of the client’s ability to act on the results. There is no point in developing a sophisticated methodology that identifies some really meaningful segments if neither the skills nor the tools are available to realise the opportunity. For example, if your email tool is not easily integrated with your customer database then it’s going to be difficult to execute improved target marketing initiatives. It is best to start with something simple and develop the capabilities to act in line with the development of the insight itself.

As it’s getting to that time of the year, in my next article I will be taking a personal look back at 2005, reflecting on some of the key events from my perspective and trying to get a sense of where we may be heading in 2006.

* “Marketing Decision Making – A model-building approach” by Gary Lilien and Philip Kotler
