Predictive Analytics Part 1
This article, written by Neil Mason, was originally published on Clickz.com and is republished here with permission.
In my last article I outlined my belief that what we call ‘web analytics’ is becoming a more diverse and complex field. What we have traditionally considered to be web analytics has been the analysis of site behavioural data captured, processed and reported on by a proprietary system designed to do just that. But as the online channel evolves and becomes more complex, the tools to help us understand what’s happening must also evolve and become more complex. In some areas, such as in the case of social media, this may mean the development of new tools. In other areas it may mean the application of old tools to this new channel.
One of the areas that we work in a great deal is in the use of data mining and predictive analytical techniques. I first got started in this area about 15 years ago when at ACNielsen using these types of methodologies to help clients to try and figure out which half of their advertising money they were wasting. I have a book on my bookshelf that was published 25 years ago on the use of model building techniques in marketing. So the techniques aren’t new but what is relatively new is the systematic use of these techniques in the online marketing space.
I think that there are some reasons for this. Historically our main concern has been on managing the vast volumes of data and wrestling out of the web analytics systems a few numbers that told us how well we were doing and that we could do something about. Also, in the past, the natural organic growth in the channel has meant that we have not been faced with the need to scramble for market share and to fully optimise our business processes. And to some extent, we have not been asking the right questions. This is now changing. We understand our few numbers and we want to know more. The online world is far more competitive and we are beginning to ask questions that go beyond the limits of our traditional analytical tool set. Questions like:
- How do I understand the effects different marketing channels have on generating sales?
- What does the purchase lifecycle look like over multiple visits and how can I optimise it?
- How should I be segmenting my audience or customers, to improve the effectiveness of my marketing activity?
To answer these types of questions we are going to have to start to organise the data in different ways and we need to bring in some different tools. First of all we need to integrate our data so that we can see different aspects of the acquisition, conversion and retention processes in one place. Secondly we need to aggregate our data so that it focuses on the visitor or customer rather than the click or the visit. Thirdly we need to cut through the noise in the data using more sophisticated analytical techniques to get at the key insights. Let me give you an example of what I mean.
We all know that different types of people come to our websites for different reasons and to do different things. If I treat everyone the same, I am being sub-optimal in my decision making about how I allocate marketing funds and about how I manage the user experience. I need to segment my audience so that I can market to these different groups more effectively. However, I can’t do that on the basis of how they behave on the website alone; I also need to understand their demographics, their intentions, their aspirations and their opinions. So I need to integrate my hard core behavioural data with profiling and attitudinal data drawn from other data sources like surveys.
Next, I am interested in the behaviour of visitors over multiple visits rather than what they do in a single visit. So I need to aggregate the data so that I have a record of the behaviour of different visitors over a period of time. Also I probably need to summarise the data and create additional attributes which describe aspects of that behaviour over time, such as number of visits made, number of conversion events, types of conversion events and so on.
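As a rough illustration of this aggregation step, the sketch below rolls visit-level records up into one summary record per visitor. The field names (`visitor_id`, `conversion`) and the sample rows are invented for the example and do not come from any particular analytics tool.

```python
from collections import defaultdict

# Hypothetical visit-level records: one dict per visit.
visits = [
    {"visitor_id": "A", "conversion": "purchase"},
    {"visitor_id": "A", "conversion": None},
    {"visitor_id": "B", "conversion": "signup"},
    {"visitor_id": "A", "conversion": "signup"},
    {"visitor_id": "B", "conversion": None},
]

def summarise_by_visitor(visit_rows):
    """Roll visit-level rows up to one summary record per visitor."""
    summary = defaultdict(
        lambda: {"visits": 0, "conversions": 0, "conversion_types": set()}
    )
    for row in visit_rows:
        record = summary[row["visitor_id"]]
        record["visits"] += 1
        if row["conversion"] is not None:
            record["conversions"] += 1
            record["conversion_types"].add(row["conversion"])
    return dict(summary)

visitor_table = summarise_by_visitor(visits)
for visitor_id, record in sorted(visitor_table.items()):
    print(visitor_id, record["visits"], record["conversions"],
          sorted(record["conversion_types"]))
```

Each row of `visitor_table` now describes a visitor rather than a visit, which is the shape the segmentation step needs.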
Finally, I need to analyse the data to identify interesting and meaningful segments of visitors. In all likelihood I will have quite a large and noisy dataset where I won’t be able to see the forest for all the trees. Traditional querying and reporting techniques are unlikely to be an effective method of identifying the patterns; I need to use something that will find the patterns in the data for me. In this case I decide to use cluster analysis. The cluster analysis process looks for groups of visitors in the data, where the people within each group have something in common but what they have in common differs from group to group. What I have to do then is interpret the output to understand what it is the visitor segments have been clustered on and decide whether these are meaningful and useful segments that I can do something with. This process may yield some surprising results and enable me to think about the audience in a way that I had not thought of them before. I may find patterns and relationships in the data that I would never have found using traditional analysis techniques.
So using data mining and predictive analytical techniques will allow organisations to unlock more value from their data but it requires a different approach to managing your data, different tools and different skills. Next time I will look at another application of data mining and predictive analytics: understanding the important factors that affect someone’s propensity to buy something during the purchase lifecycle.
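To make the idea concrete, here is a minimal sketch of one common clustering technique, k-means, written in plain Python. The article does not say which clustering algorithm is used, so k-means is my assumption, as are the two made-up attributes per visitor (number of visits, number of conversions) and the choice of two clusters.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Very small k-means: returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical visitor summaries: (visits made, conversion events).
visitors = [(1, 0), (2, 0), (1, 1), (2, 1), (9, 4), (10, 5), (11, 4), (10, 6)]
labels, centroids = kmeans(visitors, k=2)
for visitor, label in zip(visitors, labels):
    print(visitor, "-> segment", label)
```

The algorithm only hands back group labels. Deciding what each segment means, and whether it is useful, is still the analyst's job.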
Till then…
The increasing diversity and complexity of ‘Web Analytics’
This article, written by Neil Mason, was originally published on Clickz.com and is republished here with permission.
This week I’ve been starting to get my head around the presentation that I will be giving at the Emetrics Summit in Washington in October. I was looking at the agenda and was struck by the vast breadth of material being presented over a full three days, with 11 different tracks and 3 workshops. I consider myself to be a bit of an Emetrics veteran (this will be my eighth) and it used to be considered exciting when we split up onto separate tables for an hour or so in the conference room to discuss different topics. Now we can go to a whole track on it and not see each other for 3 days except at the networking events.
The Emetrics Summit is a bellwether of the web analytics industry. It’s not just the growth in the size of the conference that reflects the dynamics in the industry but also the increased diversity and complexity of the subject matter and the content. At the Washington Summit there are tracks on subjects ranging from Marketing Optimisation to Public Sector measurement to Web 2.0 analytics. This shows how the industry is developing in lots of different directions. Effectively, what we are seeing is the emergence of different disciplines within what we call ‘web analytics’, and I believe we will also see the emergence of different specialists within these disciplines. I expect it won’t be too long before practitioners and consultants within the space find that they cannot cover all the ground and, in common with other marketing services industries (e.g. market research, direct marketing, PR etc.), we will see specialisation increase.
For example, take the development of Web 2.0 and social media. The Web Analytics Association has recently set up a separate committee to look at this whole area, as ‘traditional’ approaches to web analytics are not suited to measuring and understanding the impact of this evolving medium. It’s likely that as social media continues to develop, different measurement tools will evolve, perhaps requiring different skill sets to analyse and interpret the data. I draw a parallel with the market research industry that I worked in for a few years; you had people who were essentially skilled in ‘quantitative’ analysis and those that were specialists in ‘qualitative’ analysis. Few could do both well.
The track that I am speaking at in Washington is another case in point. The ‘Statistical Success’ track is a new track to the Summit. Whilst that sounds pretty scary even to a bunch of web analysts, again it’s an indicator of the development and maturity of the industry. The track includes a number of presentations that talk about the use of various statistical and advanced analytical techniques in evaluating online marketing performance. This is relatively new to web analytics but it’s not new to consumer analytics. The direct marketing industry, for example, has been using advanced analytical techniques such as regression analysis, decision trees and so on for years to predict likely response. The market research industry has been using techniques such as cluster analysis to identify and understand different consumer segments.
Now these techniques are being used to help understand more fully different aspects of visitor behaviour on websites and the effectiveness of online marketing campaigns. Techniques such as multi-variate testing and behavioural targeting are statistical processes that have been productized and packaged up into services by companies such as Optimost, Offermatica, Touch Clarity and the like.
What we are also seeing is statistical analysis, data mining and predictive analytics being deployed in an ad-hoc way by analysts skilled in these techniques using tools such as SAS, SPSS, KXEN and the like. These packages have long been an essential tool of the ‘offline’ marketing analyst and now they are finding their way into the online marketing analyst’s tool box as well. In my presentation at Emetrics I will be looking at these advanced analytical techniques in more detail and how they can be applied in online marketing analytics. Since not all of you are going to be making it to Washington (I assume), it’s something that I’m also going to be covering here over the coming weeks.
How to do Predictive Analytics – Part 5
This post originally appeared on Applied Insights’ blog. Foviance acquired Applied Insights in November 2008, with Neil Mason joining us as Director of Analytical Consulting. As part of this acquisition, we’ve incorporated Applied Insights’ blog into our own.
Steps 4 and 5 – Modelling and Evaluation, The Theory
Now for the serious stuff (or the fun stuff depending on your inclination!). Of course the modelling phase is at the core of a predictive analytic effort. CRISP rightly separates modelling and evaluation into separate steps which emphasises the importance of the latter. However they are intrinsically linked and we will consider them both together here.
As this is really the central issue I’ll break it into 2 parts. Let’s talk about the theory of how we go about it and then in the next blog entry I’ll try and ‘make it real’ with a practical example.
Just to recap on how we got here. Starting with a research or business objective we’ve garnered enough understanding to embark on a predictive exercise. Furthermore we’ve explored the data and found predictive potential. More than likely we’ve uncovered enough relationships in the data as we explored it to indicate that patterns exist which will allow us to predict the outcome(s) of interest.
So who can do this?
Traditionally predictive modelling has been the domain of the expert: the statistician, mathematician, econometrician, the numerate researcher or the more expert ‘analyst’, etc. This is still largely the case today but we are seeing increasing signs of analytical democratisation.
Some of the contemporary tools discussed below do require less expertise to develop models today because of smarter user interfaces. There has been a move to more automated algorithms like decision trees where the analyst does not need to know as much about the data structures or the requirements/assumptions of the algorithm to specify the correct analysis. More traditional statistical methods, like regressions for example, do require the analyst to understand the technique well enough to specify the right settings/options and to follow certain rules about the data, e.g. that the input variables are not too highly correlated (i.e. ‘multicollinear’). Methods and algorithms from the world of Artificial Intelligence, e.g. neural nets, and the trees are generally more tolerant of different data patterns and have fewer options for the analyst to worry about.
Nevertheless it is rarely the case that we can just ‘press the button’ without a certain level of expertise in the analytical tool and/or the handling of data. But with a few days’ training most business and research users should be able to run models even in the most advanced tools. More specifically developed Analytical Applications can often provide a higher level of accessibility to deeper analytical methods for a broader, less expert audience.
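One simple way to check the "not too highly correlated" rule before fitting a regression is to compute the pairwise Pearson correlation between the candidate inputs and flag any pair above a cut-off. The sketch below does this in plain Python. The variable names, the sample values and the 0.9 threshold are all assumptions made for illustration, and a pairwise check will not catch every form of multicollinearity.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical candidate input variables, one value per customer.
inputs = {
    "visits": [1, 2, 3, 4, 5, 6],
    "pages_viewed": [3, 6, 10, 12, 15, 19],
    "days_since_last_visit": [5, 1, 9, 3, 8, 2],
}

THRESHOLD = 0.9  # assumed cut-off for "too highly correlated"

flagged = [
    (a, b)
    for a, b in combinations(inputs, 2)
    if abs(pearson(inputs[a], inputs[b])) > THRESHOLD
]
print(flagged)
```

Here `visits` and `pages_viewed` move almost in lockstep, so an analyst would typically keep only one of them, or combine them, before running the regression.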
And how?…
For heavy duty predictive modelling the analyst will typically have an arsenal of predictive tools and algorithms at his/her disposal. We’ll revisit the various tools/platforms later but the vendors who probably offer the most are SAS and SPSS. Though there are some, relatively, new entrants making headway such as KXEN, Salford Systems and Think Analytics. See the Gartner Magic Quadrant for Customer Data Mining for one view of the landscape of predictive software tools.
In the last step we spent some time ensuring that the data was in the right shape for this step. Hence, in the simplest sense the modelling process itself is just about defining the input and output variable(s) of interest and building and evaluating multiple models.
Which method to choose?
In part of course this will depend on what you have available. If you only have Excel then, without purchasing an add-on like XLMiner, you have access to the models available in the Excel statistical pack. As I mentioned in earlier blogs if you are entering the predictive arena for the first time you may want to consider some of the freely available software, particularly R. The caveat to this is that, as I write, you need to be able to learn the R language to drive the models. I am not currently aware of any particular user interfaces that help accelerate the usage. Despite that initial technical hurdle R does offer a very impressive range of modelling algorithms. Alternatively you may have one, or more, of the toolsets from the Gartner Quadrant mentioned earlier.
We should probably try as many of the appropriate candidate models as time allows. Some – particularly those that come from classical statistics (see the earlier point) – may not be appropriate because of the shape of the data so may be ruled out. Going in, especially with new data, it is usually difficult to know which type of model will give us the best predictions. From experience analysts may like to start with methods they know have produced the best models with what feels like similar data.
So what is a model?
The different types of algorithms construct models in different styles but at the most abstract level a model defines a pattern, or relationship, between the input variables and the output (outcome) variables. A Statistical regression model, for example, will use a mathematical formula to achieve this. A Decision Tree/Rule induction model will produce a tree or a set of rules to characterise the relationship. Whereas a Neural Network model will typically build a more opaque view of the relationships by connecting an abstract network of nodes, links and weights to encapsulate the underlying pattern.
The core train/test process
One of the beauties of predictive analytics is the way in which we construct a simple experimental structure which allows us to test (validate) models on unseen data. The empirical approach, if it is done properly, gives us a pretty good approximation to how the models will perform when deployed in a live setting on new data. For example, let’s say we have a data set from a period in time when we know which customers churned or stayed. We would typically model a customer’s likelihood to churn on a subset (60% say) of that data and then test it on the other 40% to see how well the model predicts churn. If the accuracy is good enough (and that depends on the success criteria that we defined) then … if all other things are equal and we had constructed a representative enough data mining table … then we would expect similar results if we use the model going forward in a live setting. Usually this means that we randomly split the data into two subsets
- The training subset is the one used to build the model (the 60% in the churn modelling scenario described above).
- The testing subset is the one we use to evaluate the model (the 40% for the above scenario). This second set is used to effectively simulate what we want to do in practice (when we deploy); that is to use our model to accurately predict the outcome(s) of interest.
We do this because the true test of a model is not how well it can predict the outcome when it knows it (which is what it does with the training subset). Rather, it is how well it can predict the outcome when it doesn’t know what the outcome is.
So how good is my model (really)?
Until now we have only considered how accurate the model is by looking at what percentage of the time it gets the prediction right, e.g. when predicting churners. In practice of course this is only part of the evaluation process. We may find, for example, that our model is good at finding low value fraud (of which there is likely to be more, and hence our overall percentage of correct predictions is higher) but that the more valuable transactions which hurt us more are missed. One way to address this could be to focus on the valuable cases, e.g. create a subset which focuses on the valuable minority while still being sufficiently representative to be deployable. Either way our evaluation of candidate models, and hence the models we might continue to develop and refine, should be led by model evaluations which include all the factors that we really care about. These are often around the cost/benefit of the actions that the model would have us take in the field to act on its predictions. This is where more involved simulations enable us to make more meaningful assessments of the future impact of a model.
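The train/test idea can be sketched in a few lines. The example below uses synthetic, made-up churn data and a deliberately trivial "model" (a single cut-off on months of inactivity) so that the split itself stays in focus. The data, the field meanings and the cut-off model are all assumptions for illustration, not a recommended modelling method.

```python
import random

# Synthetic customer history: (months_inactive, churned).
customers = []
for i in range(100):
    months_inactive = i % 10
    churned = months_inactive >= 6
    if i % 17 == 0:  # inject a little noise so the pattern is not perfect
        churned = not churned
    customers.append((months_inactive, churned))

# Randomly split the data: 60% for training, 40% for testing.
rng = random.Random(42)
shuffled = customers[:]
rng.shuffle(shuffled)
cut = len(shuffled) * 60 // 100
train, test = shuffled[:cut], shuffled[cut:]

def accuracy(threshold, rows):
    """Share of rows where 'months_inactive >= threshold' matches the outcome."""
    correct = sum((months >= threshold) == churned for months, churned in rows)
    return correct / len(rows)

# "Build" the model on the training subset only: pick the best cut-off.
best_threshold = max(range(11), key=lambda t: accuracy(t, train))

# Evaluate on the testing subset, which the model has never seen.
train_accuracy = accuracy(best_threshold, train)
test_accuracy = accuracy(best_threshold, test)
print("threshold:", best_threshold)
print("training accuracy:", round(train_accuracy, 3))
print("testing accuracy:", round(test_accuracy, 3))
```

The testing accuracy, not the training accuracy, is the figure to carry forward as an estimate of how the model might behave once deployed.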
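A small worked example shows why accuracy alone can mislead. Below, two hypothetical fraud models are scored on the same ten made-up transactions, first by plain accuracy and then by a simple net benefit (value of fraud caught minus an assumed cost of investigating each flagged transaction). The amounts, the predictions and the investigation cost are invented for the illustration.

```python
INVESTIGATION_COST = 5  # assumed cost of checking one flagged transaction

# (amount, is_fraud, flagged by model A, flagged by model B)
raw = [
    (10, True, True, False),
    (12, True, True, False),
    (15, True, True, True),
    (8, True, True, False),
    (5000, True, False, True),
    (20, False, False, False),
    (30, False, False, False),
    (25, False, False, True),
    (40, False, False, False),
    (18, False, False, False),
]
transactions = [
    {"amount": amount, "is_fraud": fraud, "model_a": a, "model_b": b}
    for amount, fraud, a, b in raw
]

def accuracy(rows, model):
    """Share of transactions where the model's flag matches the truth."""
    return sum(row[model] == row["is_fraud"] for row in rows) / len(rows)

def net_benefit(rows, model):
    """Value of fraud caught minus the cost of investigating every flag."""
    caught = sum(row["amount"] for row in rows if row[model] and row["is_fraud"])
    cost = INVESTIGATION_COST * sum(1 for row in rows if row[model])
    return caught - cost

for model in ("model_a", "model_b"):
    print(model, accuracy(transactions, model), net_benefit(transactions, model))
```

Model A is right 90% of the time but misses the one large fraud. Model B is right only 60% of the time yet delivers far more value, which is the kind of trade-off a cost/benefit evaluation is meant to surface.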
Next we will take a real life example to better illustrate how this step can work in practice…
How to do Predictive Analytics – Part 4
This post originally appeared on Applied Insights’ blog. Foviance acquired Applied Insights in November 2008, with Neil Mason joining us as Director of Analytical Consulting. As part of this acquisition, we’ve incorporated Applied Insights’ blog into our own.
Step 3 – Data Preparation
Anyone who has ever analysed data knows what a nuisance it can be. Whenever we want to analyse it in a new style we often have to manipulate it in some way before we can do so. The more “raw” the data, or the more fundamentally different the analysis, the more work we typically have to do to get it into the shape we need for the analysis we want to perform.
As I mentioned in an earlier blog; if the primary data source is a data warehouse which contains well structured, rigorously cleaned and de-duped data, then this is usually the best starting point. But it is only that. The shape of the data tables in the warehouse will inevitably have been defined with a certain type of analysis in mind; most often to produce the standard business intelligence style of reports. You might get lucky and find you can use that data as-is for the kinds of predictive analysis you have in mind. The chances are that you won’t, and that you will have to re-structure the data in preparation for that analysis.
Furthermore it may well contain aggregated data, perhaps an OLAP structure of some sort, which may allow you to produce time series forecasts but which will most likely contain data which is too summarised for most other kinds of predictive analysis. If this is the case then you’ll probably need to go back and locate the sources of the summary data. That might not be a trivial exercise.
How did we get here?
In the previous steps, discussed in other blog entries, we effectively designed the analyses which we intend to perform at the next step; Data Modelling. In Data Understanding we learnt all about the existing data structures, formats and sources and we started to look for patterns in those sources which are pertinent to the analytical objectives we defined at the start of the process. The truth is that, to perform the exploration, we would have had to prepare the data to some extent. But this is the point where we get serious and apply the necessary data management steps to get the data into the shape(s) required for the main task; predictive modelling.
At the top level this means we end up doing one or more of the following:
- Cleaning data
This may not be necessary depending on how “clean” the original source is (though it is not unusual to find data problems when we start to analyse it in an unfamiliar way). Our previous exploration should have revealed any errors, or inconsistencies, which need to be corrected, or excluded.
- Merging data from multiple sources
If you are lucky the data will be in a single data file, or a single table in a database. If you are unlucky it will be in a variety of disparate sources with different formats in various locations.
- Shaping it for the analysis
Often the most time consuming element. A classic example is where we have data with a sequence to it; typical if we are looking to predict the likelihood of an event given a set of previous events. The starting point is typically data in a database which often contains all event transactions. In order to model it in a way which mimics how we will look to apply (deploy) the model we have to define an appropriate point in history as the baseline, e.g. if we are interested to know what will happen after March 2007 we might use March 2006 as that anchor point. We then have to restructure the incoming data to derive all the interesting predictors e.g. transaction frequency, transaction value in previous months, years, etc. from March 2006 backwards. We also need to have a separate data partition which contains the “what happened next” data for a period after March 2006 that corresponds to the period we want to predict into in 2007; so if we are interested to see which customers are likely to churn in April 2007, then April 2006 is likely to be the best month to look at in 2006. NB. Modelling and Evaluation (see later) will help test that hypothesis.
- Deriving new data elements
Typically new fields (variables). In our exploration, for example, we may have found that there appears to be a strong relationship between the rate at which a customer buys products and the likelihood that they will churn. In many cases that rate will not exist as a separate measure in the current data, so we create it in this step.
- Describing it
Labelling, formatting and generally documenting the data in a way which helps the analyst, or other viewers of the data, to understand its meaning.
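The "shaping" step is the hardest to picture, so here is a minimal sketch of it in Python. It takes a made-up transaction log, treats the end of March 2006 as the anchor point, derives predictors from transactions up to the anchor, and records what happened next in April 2006. The customer IDs, amounts and the working definition of churn (no transaction in the outcome month) are assumptions for illustration only.

```python
from datetime import date

ANCHOR = date(2006, 3, 31)            # baseline point in history
OUTCOME_START = date(2006, 4, 1)      # "what happened next" window
OUTCOME_END = date(2006, 4, 30)

# Hypothetical event-level data: (customer, transaction date, value).
transactions = [
    ("c1", date(2006, 1, 10), 20.0),
    ("c1", date(2006, 3, 5), 35.0),
    ("c1", date(2006, 4, 12), 15.0),
    ("c2", date(2005, 12, 1), 50.0),
    ("c2", date(2006, 2, 20), 10.0),
    ("c3", date(2006, 3, 30), 5.0),
    ("c3", date(2006, 5, 2), 5.0),
]

def build_modelling_table(rows):
    """One record per customer: predictors up to the anchor, outcome after it."""
    table = {}
    for customer, when, value in rows:
        record = table.setdefault(
            customer, {"txn_count": 0, "txn_value": 0.0, "churned": True}
        )
        if when <= ANCHOR:
            # Predictor side: only information available at the anchor point.
            record["txn_count"] += 1
            record["txn_value"] += value
        elif OUTCOME_START <= when <= OUTCOME_END:
            # Outcome side: any activity in the window means "did not churn".
            record["churned"] = False
    return table

table = build_modelling_table(transactions)
for customer, record in sorted(table.items()):
    print(customer, record)
```

Keeping the predictor and outcome periods strictly on either side of the anchor is what lets the finished table mimic how the model will be used once deployed.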
The outcome of the above is a set of tables, or data files, which are in the shape we believe we need for the modelling effort we have in mind.
You [almost] never get it right first time
We’ve mentioned it before but it is worth re-stating that much of the CRISP process is iterative. Quite often we will get into the modelling step, for example, and discover a potential relationship that looks interesting but which we have to go back to the preparation process to derive. Frequently, because we are often building complex data handling processes from scratch, we just make mistakes which need to be corrected.
With large datasets the preparation time can be significant; it can take hours, sometimes days, so mistakes and re-runs can be costly. Hence wherever possible it is a good idea to test the process using data subsets, ideally random, or at least representative, samples. Samples can also be used to boost productivity when we get into the analysis – more on that next time.
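Drawing a reproducible random sample for a trial run takes only a few lines. In the sketch below the record layout, the 1% sampling fraction and the fixed seed are assumptions chosen for illustration.

```python
import random

def sample_rows(rows, fraction, seed=0):
    """Return a reproducible simple random sample of the given fraction."""
    rng = random.Random(seed)  # fixed seed so a re-run uses the same sample
    size = max(1, round(len(rows) * fraction))
    return rng.sample(rows, size)

# Hypothetical full dataset: one record per visitor.
all_rows = [{"visitor_id": i, "pages": i % 7 + 1} for i in range(10_000)]

# Test the preparation process on 1% of the data before the full run.
pilot = sample_rows(all_rows, 0.01)
print(len(pilot), "of", len(all_rows), "rows sampled")
```

Fixing the seed means that when a mistake is found and corrected, the re-run works on exactly the same rows, which makes before and after comparisons straightforward.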
An example
Data collected in the web channel is a great illustration of this point. We work with a lot of this kind of data typically for web sites with large numbers of visitors; usually millions per week. These sites inevitably have a web analytics tool which they use to analyse key metrics of site performance. Most often we are interested to apply predictive and/or segmentation methods to the site data. This typically involves:
- Extracting behavioural data from the data warehouse (underlying the web analytics tool) or via a data feed that the analytics vendor provides. More often than not we extract this data to a number of text format files.
- For our Customer Journey Framework we usually have an additional data source in the form of on-line surveys. Depending on the analytics tool that the client is using we have developed a number of ways of linking the data that the visitor provides as a respondent in the survey to the behavioural data which maps that visitor’s journey through the site.
- The data we end up with can be at various levels but more often than not it is at the individual page or individual click level (remember these sites have millions of visitors so the number of records gets multiplied up). We take this data and aggregate it over a period of time to end up with tables for analysis which are at the visit and/or visitor level. Each of the resulting records will contain fields of interest; e.g. site content viewed, visit intentions and conversion goals which we will use for analysis.
For a typical site, processing a week’s worth of data into the shape needed for analysis can take 4-6 hours.
Which tools?
As is often the case the choice of tool for data management comes down to those that the analyst/data is familiar with. Database tools are all about this type of work and often the best approach is to aim to construct data mining tables inside a relational database. This can be achieved using a combination of SQL, ETL tools and other database utilities.
Generally speaking, the more sophisticated the predictive tool itself, the greater the data management capabilities which are built in. So SPSS, SPSS Clementine, SAS and SAS Enterprise Miner offer a broad range of data handling procedures.
So much for progress
Even though we have more and better tools, and faster hardware, with which to manipulate data these days this is offset by the increasing volume of data, complexity of structures and number of sources. Hence the old adage that data management consumes more of a data analysis effort than the analysis itself typically holds as much today as it ever did. But it is a necessary pain to get us to the point where we can get to the next step which is at the core of the predictive process; Data Modelling.
How to do Predictive Analytics – Part 2
This post originally appeared on Applied Insights’ blog. Foviance acquired Applied Insights in November 2008, with Neil Mason joining us as Director of Analytical Consulting. As part of this acquisition, we’ve incorporated Applied Insights’ blog into our own.
Step 1 – Business Understanding
It sounds obvious but – in data analysis as in life – it usually helps to really work out what you are trying to do before you try and do it. Sometimes even those of us who have been doing it for a while forget to do this with as much diligence as we should.
Having had preliminary discussions, and having agreed there is some potential to apply analytics to predict future outcomes, this is the phase where we decide whether there is a predictive effort worth undertaking. At the end of it we might conclude that it is either too risky or that the costs will probably outweigh the benefits. In our experience this rarely happens but the actual objectives can change significantly through the understanding phase.
From my perspective there are 4 crucial aspects to a successful “understanding” phase (and hence a successful analytical task, project or programme):
1. The importance of domain (subject matter) expertise
Domain expertise is the essential ingredient that the eventual success of the whole effort relies on.
We can think of analytical expertise and domain expertise as the flip sides of the analytical coin. In some instances one person may fulfil both roles (e.g. a product manager may produce demand forecasts for his/her own brand) but in more complex situations different stakeholders typically contribute these elements.
The level of each component required clearly depends on the context. For example an external consultant who is an analytical expert may be engaged into a business area that s/he knows quite a lot about – let’s say it is consumer marketing in the retail sector – even then there are probably intricacies in the specific business context which the consultant needs to learn about from those more involved in the specific business. Strategy, current priorities and specific knowledge on this company’s customers, products, services and resources are typical examples.
Hence the business understanding phase is typically the point at which the domain expert(s) and analytical expert(s) engage the most to frame the project through the rest of its lifecycle.
The partnership between the business and its analysts also underlines a key point about how predictive analytics succeeds, or fails. There is a common, and understandable, misconception that predictive analytics (and its compatriot data mining?) comprises almost magical techniques that can be whimsically applied to data (by someone who knows how to apply them), after which gems of hitherto unknown insight will flow through some special software. The truth is rather more mundane. It is almost always through the definition of more specific business-related goals, and the translation of those goals into achievable modelling activities, that true benefit can be derived.
From the analyst’s perspective the worst setup is usually the business challenge “here’s some data, tell me something interesting?”. That’s not to say that s/he couldn’t get to something that way but if the question is a more specific, and relevant, one e.g. “can we identify which, if any, of these customer transactions are likely to be fraudulent?” then the chances of success are far greater.
We don’t talk anymore
Moreover, our experience teaches us that the more interaction there is between the business expertise and the analytical expertise throughout the project the more likely it is that a project will succeed. If I was to highlight the main reason why projects fail to live up to expectations it is probably because of the lack of communication, and hence understanding, between the two sides.
In a consulting context this is often harder to achieve because of traditional client/supplier relationships and the time constraints on the client side. However this is something that needs to be addressed in the business understanding phase. From the consulting perspective it is better to agree not to proceed with a project if one of the most critical resources, domain expertise from the business, can’t be supplied in sufficient measure.
2. Evaluate all relevant business elements
CRISP-DM has a good checklist for this but it is effectively an audit of all the people, systems, tools, financial and other operational factors which may impact the analytical effort. Clearly the level of investigation that needs to take place is dependent on the perceived size and complexity of the programme/project.
In our spreadsheet example in my last blog, though the modelling may only take a short time, if successful we might have to figure out how we deploy additional marketing effort to increase sales (if the model tells us that is what we need to do).
3. What is the risk, the cost, and the potential benefit?
Probably the key criterion when assessing risk is the familiar one of precedent. Has anyone involved done anything like this before and did they succeed? Is there any other material in the public domain that indicates if/how something similar has been achieved before?
Modelling customers who are likely to churn is one of the most common applications in predictive analytics. Mobile service companies and other providers in countries with de-regulated Telecommunications sectors are particularly active in the development and application of predictive models to achieve this objective and there is considerable evidence of success.
So if I am a mobile operator in a similar market I am more likely to conclude that there is less risk in any similar project that I was to undertake.
Sometimes the applications are slight variations on common themes. Also in telco we recently worked with one of the de-regulated suppliers to identify when it was best to try to reactivate churned customers (as opposed to the old favourite “how can we tell whether someone is about to churn in time to stop them?”). Though this was a slightly different objective it used similar analytical methods to the original. Going in we couldn’t say for certain that it would work but we had a strong suspicion – partly driven by some recent bespoke customer research, and partly because we knew that the original approach works – that it probably would.
There’s a first time for everything
Just because no-one appears to have used predictive approaches for a particular application that doesn’t necessarily mean that they can’t be applied in that way. The greatest rewards can often come from more innovative applications (as long as we are not blind to the potential pitfalls).
In the past we’ve worked with a leading global CPG company to model the best combinations of chemical ingredients as part of their product development research. We are currently working with a European partner to predict incidents of illicit, trans-national, trade. To our knowledge no-one had applied predictive analytic methodology in this way before but we are adapting techniques we’ve applied in similar areas in other domains and working closely with subject matter experts in, hitherto, unfamiliar domains.
Show me the money
This is also the phase where we need to figure out what the potential costs and benefits look like. These are usually financial and it ought to be possible at this stage to estimate what the likely levels should be. The consensual success criteria, discussed below, are usually the scenarios we use to evaluate the financial, or other, improvements that we are aiming for.
One of the great things about this whole area is that it is explicitly about modelling and predicting metrics, like ROI, with some scientific rigour. Once we get into the data we should get a better idea of what is achievable and be able to evaluate these numbers more accurately.
4. So what do I do with it?
We’ll come on to the final phase, deployment, in more detail in a later blog but in the first phase it is important to have a clear idea of how, in the end, we expect the predictive work to be applied to achieve the original goals defined in this business understanding phase.
As this can often involve a degree of business process change, everyone involved needs to understand what needs to happen and be comfortable that it can happen. For example some of our deployments have involved daily data processing steps, e.g. loading data to produce demand forecasts. Often these are automated processes requiring minimal, but regular, manual inputs.
In some instances the change can be more involved. For example we’ve often found it necessary to help train/recruit staff into model development roles. In those places where we’ve seen predictive analytics succeed the most, there tends to be resource dedicated to generating, distributing and maintaining models throughout the enterprise. More on that later.
Agreeing clear, and achievable, success criteria at this point is an essential means to clarify the ultimate deliverables using KPIs for the output goals. Some example criteria are:
- Reduce customer churn by 20% in 6 months
- Increase the average customer satisfaction score to 8.5
- Reduce click fraud by 50%
For this, and the other phases, CRISP-DM formalises all of the above and details specific outputs and deliverables. From our perspective though the important thing is that everyone involved clearly sees and understands what is going to happen. The setting of success criteria is very helpful here, but it is equally important that the means to that end is understood.
As I mentioned in my last post the size of the task can vary and hence the amount of time you spend in business understanding can flex accordingly. Generally speaking the bigger the project the more critical this step is to really evaluate whether it is worth proceeding and to clearly establish the goals.
Having nailed all of the above (you never have completely but let’s say we’ve done our best) we will have decided that we probably have enough of the right kind of data to meet our goals. The next step is to delve deeper into that data to test that assumption further.
How to do Predictive Analytics – Part 1
This post originally appeared on Applied Insights’ blog. Foviance acquired Applied Insights in November 2008, with Neil Mason joining us as Director of Analytical Consulting. As part of this acquisition, we’ve incorporated Applied Insights’ blog into our own.
Having finished the last blog by arguing that more people could be benefiting from Predictive Analytics this begs the question of where to start if you are not doing it already.
Is it just a case of buying a predictive analytics software tool and ‘plugging it in’? Probably not. Nor do you necessarily need to hire expensive contractors/consultants, unless the need justifies it.
As you might expect part of the predictive analytic process is to decide what the potential cost/benefit of the activity would be, and indeed whether predictive analytics as an approach has any chance of meeting the business/research objectives you have in mind. The good news is that there are some pointers to help make those decisions.
A process template for undertaking a predictive analytical project
As I mentioned last time there is a significant overlap between the types of analytical activity described as ‘Data Mining’ and those which are also termed ‘Predictive Analytics’. For that reason we feel that the processes involved in executing the former (for which there are existing process blueprints) are currently the best templates for undertaking the latter.
There are probably 2 leading process models at the time of writing.
1. SAS have their own called SEMMA.
2. A cross-industry forum including DaimlerChrysler, SPSS, NCR Teradata and others have developed CRISP-DM.
Generally speaking these models cover the same ground, and are not unlike many consulting project engagement models. We tend to use CRISP-DM as the basis for our work for 2 main reasons:
- Our sense is that more collaborative thought has gone into producing a more detailed template.
- It is somewhat broader in the sense that it covers the business objectives and the ultimate application (deployment) of the outcomes (e.g. models, scores, etc.) in more detail.
A visualisation of the 6 steps in the CRISP model can be seen here.
Simply put they are:
1. Business understanding
Starting with a business goal (or goals) – e.g. reduce the rate at which my customers are defecting – this is the crucial step in which we take those objectives and begin to evaluate the business context before embarking (or in some cases deciding not to embark) on the analytical process.
2. Data understanding
The second understanding step is to audit and investigate the data in the various data sources which can potentially provide the grist for the analysis. This covers both a top level analysis of the metadata and a deeper, exploratory, analysis of the data.
We typically see the two understanding steps as part of the same phase. Once complete you should be in a position to evaluate what is likely to be achievable from a modelling perspective. Most often we find that there is enough potential to continue with the modelling, though in many cases the project may turn out to be somewhat different to the original expectations. In a small minority of cases there may not be that potential, or we may feel that it is too costly/risky to undertake the prospective analysis.
3. Data preparation
This is the necessary, but arguably the least interesting, step – unless you like this kind of thing! Sometimes described as the ETL (Extract, Transform and Load) stage, this is where we beat the data into shape by importing it into an appropriate format for the target analytical tool(s) chosen in the understanding phase. In our experience this is the step that takes a bigger chunk of the project than one might expect.
4. Modelling
This, and the next step which is strongly linked, is the crux of the process. This is where we apply one or more – usually several – appropriate modelling techniques to the data. We shall talk more about user interfaces to models and software tools later but model selection ‘usually’ requires a level of expertise to identify appropriate modelling techniques which fit the shape of the data (e.g. some modelling algorithms require input data to be normally distributed).
5. Evaluation
Quite simply, how well does the model perform? This may be as straightforward as looking at the percentage accuracy of the model predictions against an unseen test (‘holdout’) sample. It could be about evaluating how the model performs in more sophisticated scenarios related to profitability, or the risk of investigating too many non-fraudulent credit card transactions and annoying too many loyal customers.
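As a rough illustration of the simplest case, the sketch below measures percentage accuracy against a holdout sample. The records, the 30% holdout fraction and the stand-in rule_model are all hypothetical and only there to make the example run; a real model would be fitted on the training portion alone.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle the records and set aside an unseen holdout sample."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, holdout set)

def accuracy(model, holdout):
    """Percentage of holdout records where the prediction matches the outcome."""
    correct = sum(1 for features, actual in holdout if model(features) == actual)
    return 100.0 * correct / len(holdout)

# Hypothetical records: (number of support calls, did the customer churn?)
records = [(0, False), (1, False), (1, True), (2, False), (3, True),
           (4, True), (5, True), (0, False), (2, False), (6, False)]
train, holdout = holdout_split(records)

def rule_model(calls):
    """Stand-in for a fitted model: predict churn after 3 or more support calls."""
    return calls >= 3

print(f"Holdout accuracy: {accuracy(rule_model, holdout):.1f}%")
```

Accuracy on its own treats every error as equally costly, which is why the more sophisticated scenarios mentioned above usually weight the different kinds of mistake by their business impact.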
6. Deployment
The whole point of the exercise is to apply the results of the analytical process in a way which creates benefit going forward. Broadly speaking this takes 2 forms.
- It could be about simply making decisions based on the insight generated, e.g. deciding to open a new retail outlet in a location which scores highly from the perspective of market potential.
- Or it could be about integrating the results data (e.g. propensity scores), or even the model, in a way which can automate operational actions. For example we might embed an on-line advertising click fraud detection model in our web analytics process to send/report alerts when potentially malevolent transactions are generated. Or we might simply generate a list of new customers who were scored highly by a model which predicts lifetime value, but who we believe need to be engaged early in their lifecycle to meet that potential. Such a list can generate call centre actions or marketing campaigns (and the model may also indicate which is the more appropriate).
One important note is that – as you can see from the CRISP diagram – this isn’t necessarily a linear process. For example we might find gaps in the data in the preparation step that lead us to re-evaluate our understanding/objectives, or alternatively we may find that the data we are modelling has issues which can be resolved through new transformations (e.g. imputing missing values) as part of a new preparation step.
It may look quite heavy, but it doesn’t have to be. In our experience the process model can range from:
- Small scale: I have a spreadsheet with the last 5 years’ sales in it; I wonder if I can predict sales for the next 3 months? In a sense this puts the cart before the horse; having data sparks a potential business objective which we might not have thought about. The more explicit objective could be to meet sales targets. The whole exercise (up to deployment at least) could take less than a day.
- To the larger scale: Can I identify new customers who have the potential to be the most profitable in the future? This sits against a business objective of growing customer profitability. This example starts with the business objective in the regular way but may require more convoluted merging of data from various databases, potentially related to customer acquisition from different channels, products, divisions, countries, etc. A project of this kind could take weeks, and sometimes months, to complete.
What happens in practice?
I’ll use the next few blogs to walk through the process model and give some pointers to what we’ve found to be important on engagements. I’ll try and be candid enough to identify areas where we’ve had success (or otherwise), where things can go wrong, and where we think there are limitations in the methodology.
So what is Predictive Analytics?
This post originally appeared on Applied Insights’ blog. Foviance acquired Applied Insights in November 2008, with Neil Mason joining us as Director of Analytical Consulting. As part of this acquisition, we’ve incorporated Applied Insights’ blog into our own.
Over the past few years we’ve all been hearing more and more about ‘Predictive Analytics’ (PA for short). It is one of those terms – like Business Intelligence (BI), Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), etc. – that once coined, captures the essence of an area of business/research activity and gets a life of its own – particularly as software and services vendors use it as a means to more easily describe and, of course, sell their products and services.
A simple definition of predictive analytics
A simple definition of predictive analytics is that it’s an activity which allows us to quantify future events or actions. This quantification could be as straightforward as generating a list of customers who are likely, at a point in time, to behave in a certain way, e.g. to churn, to register, to buy or to respond to a particular mail, etc. Typically this list will be accompanied by a score which gives us the probability (or propensity) that an event will occur.
Alternatively the technique may give us a value, or set of values, e.g. a set of sales forecasts for a given product line in the coming week; furthermore the forecasts are often presented with confidence values, e.g. we might predict that we will sell 50 widgets on Monday but we can say, with 95% confidence, that we will sell between 45 and 55.
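To make the idea of a forecast with a confidence range a little more concrete, here is a deliberately naive sketch. It assumes a hypothetical history of Monday sales, uses the historical mean as the point forecast and treats the spread of past values as roughly normal; a real forecasting model would account for trend, seasonality and the uncertainty in its own estimates.

```python
from statistics import NormalDist, mean, stdev

def forecast_interval(history, confidence=0.95):
    """Return (point forecast, lower, upper) under a simple normal assumption."""
    centre = mean(history)
    spread = stdev(history)
    # Two-sided critical value, roughly 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre, centre - z * spread, centre + z * spread

# Hypothetical widget sales for the last eight Mondays
monday_sales = [47, 50, 53, 49, 51, 52, 48, 50]
point, lower, upper = forecast_interval(monday_sales)
print(f"Forecast {point:.0f} widgets, 95% range {lower:.1f} to {upper:.1f}")
```

The narrower the range, the more weight a planner can put on the point forecast; a wide range is itself useful information about how much to hedge.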
So what’s new?
One of the main characteristics of PA is that many of the tools, techniques and applications which it comprises are not actually particularly new. Credit scoring is one of the most well-known applications of PA, and credit scoring has been around for over 50 years.
Wikipedia details many of the other uses and techniques which can be described as ‘predictive’ though the current list there is by no means exhaustive. You will probably recognise a lot of them. In fact many of them come from the world of Data Mining and there is some significant overlap between Predictive Analytics and Data Mining, but there are also many differences; more on that later.
So is ‘predictive analytics’ just a new bottle for a lot of old wine? One of the new things is that there is an abundance of relatively new technology which can:
- Take PA to a wider audience by making the often complex algorithms more usable for less statistical/technical users.
- Provide access to a broader range of techniques through smarter user interfaces which map more closely to the analytical process in such a way that modellers (analysts, statisticians, data miners, demand forecasters, etc.) can more productively access data, test and develop models and ultimately deploy the best ones.
- Allow the results of PA (e.g. customer lists, scores, models, etc.) to be used more easily in decision making processes going forward.
For example, PA is being used to enable customer services representatives in a call centre to prevent customers churning with appropriate offers. It’s also being used in web site recommendation engines which can serve up relevant content based on what is understood of a visitor’s needs, preferences and previous browsing/buying behaviour.
This last point could mean generating models in a format (such as PMML) which can be more easily plugged-in to an operational process or integrating PA into other tools like CRM platforms.
There are also many new techniques and new applications for those techniques. The theories around Naïve Bayes and Robust Regression, for example, may have been around for a while but it is only recently that they have been available in an accessible commercial format. Techniques to automate the search for the best fitting time-series model are also quite contemporary. Applications such as SPAM detection and multivariate testing to optimise on-line marketing campaigns and page content are arguably the latest in a long line. Even if your potential application doesn’t seem to feature in any list of previous ones it could be worth exploring the potential that PA offers. There is a first time for everything!
So what can predictive analytics do for me?
PA is more commonly applied within organisations to address specific issues like preventing customer attrition or to identify segments of customers who will be more responsive to specific campaigns. One of its advantages is that it is usually possible to demonstrate potential ROI through the model, and if the model is a good one, the actual ROI when the model is deployed should not be too far from the prediction.
Increasingly though these techniques are being used more strategically to inform a range of organisational decisions and actions. Effective CRM programmes, for example, often use PA in a number of ways to anticipate customer needs and behaviours across channels and at different points in the customer lifetime.
So more sophisticated applications will not only identify good prospects for acquisition they will help define a series of interactions as the prospect develops into a loyal customer which enhance the customer experience and ultimately drive mutual benefit.
Getting more into predictive analytics
Despite the publicity I feel that applications of predictive analytics, particularly in business, are only scratching the surface today. There are a number of reasons for that; among these is the age old one that there is a gap between the available tools/techniques and the appropriate circumstances for business adoption. There is also a dearth of resource to help bridge that gap!
In future blogs I’ll explore predictive analytics in detail; looking more closely at the tools, techniques, applications and vendors. I’ll give a view on the process of applying PA – particularly aimed at those who haven’t started yet. I’ll also look beyond the hype into cases where it has been used to achieve significant results.
I’ll even discuss how PA relates to BI, CRM and ERP. Though we shall try and do that in a more understandable way than I did in that last sentence!
The Analyst’s Toolbox: Decision Trees and other classification techniques
This article, written by Neil Mason, was originally published on Clickz.com and is republished here with permission.
In my last article I had a quick look at some other tools for the analyst toolkit other than their web analytics system. These included Business Intelligence or OLAP tools, visualisation tools, statistical analysis and data mining tools. This week I want to take a deeper look at the use (and possible abuse) of statistical analysis and data mining techniques.
Statistical analysis and data mining cover a wide variety of approaches, methodologies and techniques that might be useful for the web analyst. They can broadly be classified as follows:
- Statistical analysis
- Classification techniques
- Clustering and segmentation methodologies
- Forecasting
- Text analysis
It’s probably best to start with a note of caution. There’s a saying “If you torture the data long enough, it will tell you anything you want it to”. These kinds of data analysis techniques can be very powerful and they can be used to uncover nuggets of gold in your data. They also need to be used carefully. The analyst needs to ensure that the results are robust, reliable and above all make sense. Data mining is as much an art as it is a science.
Simple statistical analysis techniques such as frequencies and histograms can reveal interesting patterns in your data. I’ve written before about the dangers of using average metrics such as “average pages per visit” as they hide interesting differences in behaviour. Worse than that, they can actually be misleading.
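A tiny, made-up example shows how this can happen. The page counts below are hypothetical: most visits view a single page and a few view many, so the average describes a visit that never actually occurs.

```python
from collections import Counter
from statistics import mean

# Hypothetical pages-per-visit figures for ten visits
pages_per_visit = [1, 1, 1, 1, 1, 1, 1, 11, 11, 11]

average = mean(pages_per_visit)          # the topline metric
distribution = Counter(pages_per_visit)  # the frequency table behind it

print(f"Average pages per visit: {average}")
for pages, visits in sorted(distribution.items()):
    # A crude text histogram: one '#' per visit
    print(f"{pages:>2} pages | {'#' * visits}")
```

Here the frequency table makes it obvious that there are two quite different groups of visits, which the single average figure hides completely.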
Often in the work we do, we will spend a lot of time initially carrying out exploratory analysis looking at the patterns and distributions in the data. It’s time well spent. It gives you a feel for what is going on below the topline metrics and also helps later when you begin to look at the results of other analytical techniques. As a marketing analyst you need to have a sense of how the data is made up, how the topline metrics are constructed and where they come from. For example, you may find that there are some extreme values or “outliers” that might affect your results and so need to be dealt with in some way or another.
With statistical analysis you may want to compare different groups of visitors or customers. For example, you might look to see whether the repeat order rate is higher amongst some groups of customers than others. You can apply statistical tests to see whether any differences are genuinely significant or whether they might just be down to the variability in the data. Significant difference testing can be important in experiments such as A/B tests to ensure that “A” is really better or worse than “B” before making any changes to the site.
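One common way to run such a test on conversion rates is a two-proportion z-test, sketched below. The visitor and conversion counts are hypothetical, and this is only one of several tests that could be used; it relies on a normal approximation that needs reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z statistic, two-sided p-value) for the difference in two rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that A and B really perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical test: version A converts 200 of 5,000 visitors, version B 260 of 5,000
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be down to random variation alone, though it says nothing about whether the size of the difference matters commercially.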
There are many different types of “classification” techniques including regression analysis, often used in credit scoring, as well as Artificial Intelligence approaches including neural networks. The class of techniques that I want to take a look at today is the use of “decision trees”. There are a number of different algorithms in this type of technique including CHAID, CART and QUEST. These algorithms essentially do the same thing in different ways and that is to assign the data records (such as visitors or customers) into groups of interest based upon the other variables that you have on the record.
For example, you may have records on customers that split them into two groups: “single order customers” and “repeat customers”. You may then also have a whole string of other data on those customers and you are interested in understanding what the key characteristics are that distinguish between someone who orders once and someone who goes on to order again. Decision Tree methods will look at all the other variables and determine which one is the most important factor in determining the difference between a single order shopper and a repeat order shopper. It then repeats the process again and again until it has determined what all the significant factors are in order of priority.
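The first step of that process, picking the single most important variable, can be sketched as follows. This uses the Gini impurity measure in the style of CART (CHAID, for instance, uses chi-square tests instead), and the customer records and field names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def impurity_reduction(records, variable, target):
    """How much splitting on `variable` reduces the impurity of `target`."""
    before = gini([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[variable], []).append(r[target])
    after = sum(len(g) / len(records) * gini(g) for g in groups.values())
    return before - after

def rank_variables(records, variables, target):
    """Order candidate variables from most to least useful for the first split."""
    scores = {v: impurity_reduction(records, v, target) for v in variables}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical customers: did they opt in, how big was order one, did they reorder?
rows = [(True, "small", True), (True, "large", True), (True, "small", True),
        (True, "large", False), (False, "small", False), (False, "large", False),
        (False, "small", False), (False, "large", True)]
customers = [{"newsletter_opt_in": o, "first_order_size": s, "repeat": r}
             for o, s, r in rows]

ranking = rank_variables(customers, ["newsletter_opt_in", "first_order_size"], "repeat")
for variable, score in ranking:
    print(f"{variable}: impurity reduction {score:.3f}")
```

A full tree algorithm would then split the records on the winning variable and repeat the ranking within each resulting group, stopping when no further split is worthwhile.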
The great thing about decision trees is that the output is very visual and relatively easy to understand. They can get a bit big and cumbersome though especially if you are dealing with a lot of variables. Decision Tree techniques have been used for years in direct marketing work to determine which type of people are most likely to respond to mailings, so that companies can cut down on mailing costs.
In online marketing, mailing costs aren’t such a big issue as they are in the offline world but we have used techniques like decision trees in other areas to understand what the factors are that influence visitors to do something or not. In the example above of single order customers vs repeat order customers we did a piece of work where we looked at many potential factors that included:
- the size of the first order
- the number of visits to the website after the first order
- the product category of the first order
- the product categories browsed after the first order
- whether they were opted in to the email newsletter
- how many newsletters they had received
- the timing of the newsletters after the first order
We found that the most important factor in determining whether someone went on to order again after their first order (out of all the ones we examined) was that someone had opted into the email newsletter and had received a newsletter within 5 days of that first order. Vital input into a retention marketing programme.
Decision Tree techniques are also useful for profiling and understanding different segments of visitors or customers. Segmentation techniques are what I will be looking at in the next part of this series.
Till then…