Day 6: The Data Analyst's Toolkit: Why are Excel and R useful together, and how do we connect them?



Why is analytics interesting? Well, companies are starting to view it as profitable. For example, McKinsey estimated that analytics is worth around 100Bn today, and projected it to be worth over 320Bn by 2020.
When I speak to customers, this is the ‘end goal’ – they want to use their data to analyse and predict what their customers are telling them. However, it seems that folks can be a bit vague on what predictive modelling actually is.



I think that this is why Power BI and Excel are a good mix. Together they make concepts like Predictive Modelling accessible, after a bit of a learning curve. Excel is accessible and user-friendly, and we can enhance our stats delivery by using R alongside Excel.

One area of interest is Predictive Modelling. This is the process of using a statistical model to predict the value of a target variable. What does this actually mean? Predictive modelling is where we work to predict values in new data, rather than trying to explain an existing data set. To do this, we work with variables. By their nature, these vary; if they didn’t, they would be called constants.

One pioneer was Francis Galton, who was a bit of an Indiana Jones in his day. Although he wrote in the 19th century, his work is considered good and clear enough to read today, so this research has a long lineage, even though it can seem like a new thing. We will start with the simplest technique: linear regression.

Linear regression compares two variables, x and y, to answer the question, “How does y change with x?” For predictive modelling, we start out with what are known as ‘predictor variables’; in terms of this question, this is x. The result is called the target variable; in this question, this is y. Why would we do this?

  • Machine Learning
  • Statistics
  • Programming with Software
  • Programming with Data 
  • Fun!

Why would businesses work with it at all?

  • to discover new knowledge and patterns in the data
  • to improve business results 
  • to deliver better customised services

If we have only one predictor variable, and the response and the predictor have a linear relationship, the data can be analysed with a simple linear model. When there is more than one predictor variable, we use multiple regression. In this case, our question becomes, “How does y change with multiple x variables?”
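
To make that concrete, here is a minimal R sketch. The data frame df and its columns y, x1 and x2 are purely illustrative names, not real data; both simple and multiple regression use the same lm() function, and the formula just says how many predictors we supply.

```r
# Minimal sketch: simple and multiple linear regression in R.
# 'df', 'y', 'x1' and 'x2' are illustrative names, not real data.
simple_fit   <- lm(y ~ x1, data = df)        # how does y change with one x?
multiple_fit <- lm(y ~ x1 + x2, data = df)   # how does y change with several x?

summary(simple_fit)      # coefficients, R-squared and p-values
summary(multiple_fit)
```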

In fitting statistical models in which some variables are used to predict others, what we hope to find is that the x and y variables do not vary independently of each other, but that they tend to vary together. Ideally, we hope to find that y varies as a straight-line function of x.

If we were to visualise the data, we would hope to see a pleasing chart in which y and x relate to each other in a straight line, with a minimal amount of ‘noise’. Visualising the data makes the relationship clear; analysing the data confirms that the data itself is robust and has been checked.
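
As a rough sketch of that check in R (again using the illustrative df from above), a scatter plot with the fitted line overlaid makes it easy to see whether the straight-line assumption is reasonable:

```r
# Sketch: eyeballing the straight-line relationship before trusting the model.
plot(df$x1, df$y,
     xlab = "x (predictor)", ylab = "y (target)",
     main = "Does y vary as a straight-line function of x?")
abline(lm(y ~ x1, data = df), col = "red")   # overlay the fitted regression line
```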

I think that’s why, in practice, Power BI, Excel and R work well together. R has some great visualisations, but many people are more comfortable producing them in Excel, and all that loading of packages you have to do in R doesn’t work for everyone. So, at a high level, we use R and Excel together as follows (a rough sketch of the round trip follows the list):

  • We cleanse and prepare data with Excel or Power Query
  • We use RODBC to load data into R
  • We analyse and verify the data in R
  • We build models in R
  • We load the data back into Excel using RODBC
  • We visualise the results in Excel
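
Here is a rough, hedged sketch of that round trip. The file names, sheet names and columns are purely illustrative, and this assumes the Excel ODBC driver is available on Windows:

```r
# A rough sketch of the Excel -> R -> Excel round trip described above.
# File, sheet and column names are illustrative only.
library(RODBC)

# 1. Pull the cleansed worksheet out of Excel into R
xl <- odbcConnectExcel2007("C:/data/customers.xlsx")   # needs the Excel ODBC driver
customers <- sqlFetch(xl, "CleanData")                 # worksheet name acts as the 'table'
odbcClose(xl)

# 2. Analyse, verify and model in R
fit <- lm(spend ~ visits, data = customers)            # illustrative model
customers$predicted_spend <- predict(fit, customers)

# 3. Push the scored data back into the workbook (a new sheet) for Excel to visualise
out <- odbcConnectExcel2007("C:/data/customers.xlsx", readOnly = FALSE)
sqlSave(out, customers, tablename = "Scored", rownames = FALSE)
odbcClose(out)
```

On a machine without the Excel ODBC driver, writing the scored data out with write.csv() and opening it in Excel achieves much the same result.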


Excel is, after all, one of the world’s most successful software applications, with reputedly over one billion users. Using R and Excel together means that you get the best of both worlds: R for analysis and model building, and Excel as the ‘default’ for munging data around and visualising it. I’m sure that one of the most popular buttons in software such as Tableau, QlikView et al is the ‘Export to Excel’ or ‘Export to CSV’ functionality. I’d be interested to know what people think about that!

Building linear regression models in R is very simple; in our next session, we will look at how to do that, and then how to visualise the results in Excel. It is all easier than you might think, and I will show you how.

Data Mining: good data, then do the maths

Data mining can be loosely described as ‘searching for patterns in data’, but it is important to ensure that the data is properly in place before starting. It is easy to get sidetracked by the selection of a data mining algorithm, for example; that is obviously a core piece, but if the data isn’t in place, then we can’t be sure that the results will be correct. This emphasis on data collection and integrity hasn’t always been around, however. It was hard to think of a data mining example where the importance of data made interesting reading, so I looked instead to the history of astronomy. Tycho Brahe (1546–1601) laid the foundations for today’s astronomy by emphasising the rigorous and clean collection of data about the planets and the stars, which was a real innovation at the time.

Tycho believed that astronomy could not be pursued through non-rigorous collations of astronomical data; it could only be understood by systematically collecting observations, including redundant ones. He was prepared to put the work in, conducting this study for almost twenty years. Amazingly, this feat occurred prior to the development of the refracting (Keplerian) telescope, which was created by Kepler in 1611.

Significant to us today, however, is that Tycho produced the most accurate and systematic astronomical data of his time; he successfully noted the orbits of the planets to a very close degree. He systematically collected the triangulated locations of the planets and stars throughout the course of the year, believing that factual, observed data was the only way forward. Tycho produced hundreds and hundreds of statements about the location of each planet over the course of the year, for example: ‘On the 15th March 1572, at 2.04am, the planet Mars was 32’38” above the horizon, and 12’30” west of the pole star’.

Tycho’s data was much sought after, and so was Tycho himself as a teacher. Eventually, Johannes Kepler became Tycho’s apprentice. Kepler took Tycho’s data and used it to derive his Laws of Planetary Motion, which showed, briefly, that the planets move in elliptical rather than circular orbits. Newton once described himself as ‘standing on the shoulders of giants’, and rightly so, since he used Kepler’s work to inform his work on gravity. Building on Tycho Brahe’s data, Isaac Newton (1642–1727) later deduced the fundamental mechanisms underlying the movements of the planets. Newton’s Three Laws of Motion (uniform motion, force = mass × acceleration, action-reaction), along with his Law of Universal Gravitation, therefore trace directly back to Tycho’s original observations.

Until the mid-18th century, the known planets were Mercury, Venus, Earth (obviously), Mars, Jupiter and Saturn. A detailed investigation of the orbit of Saturn showed that it was not a perfect ellipse; there was a slight deformation, which could be explained by another planet whose gravitational pull was affecting Saturn’s orbit. In 1740, predictions about the existence of another planet, derived using Newton’s mathematics, were published, and in 1781 Uranus was found, within 2 arc minutes of the predicted location. Uranus’ orbit was in turn shown to be a deformed ellipse, and again another planet was posited. Based on predictions made with Newton’s mathematics from 1821 onwards, Neptune was discovered in 1846.

Thus, Newton’s theory, together with accurate data, had deduced the existence and precise orbits of two previously undiscovered planets. The theory didn’t fit all the facts, however; Mercury’s orbit is also not totally elliptical. Newton’s theory required the presence of another planet, located between Mercury and the Sun, to pull Mercury slightly away from its ellipse. As an aside, this hypothetical planet was to be called Vulcan, which Star Trek fans will know as the home planet of Spock. In the end, however, Mercury’s behaviour was explained by Einstein’s General Theory of Relativity.

Central to this story, however, is Tycho’s stringent collection of data and his emphasis on rigorous data integrity. Tycho was an innovator in his determination to collect careful observations; until this point, no one had done so. Without his work and emphasis on clean data, the great achievements of astronomy might have taken much longer.

To summarise, it is not just about finding patterns; the data has to be right in the first place. There has to be enough data for training and testing, missing data should be reduced, and observations should be repeatable where possible. This applies not just to Data Mining, but to other spheres too.

Data Mining – useful or not?

What distinguishes Data Mining from other methods of exploring data, and what is its usefulness? Critics might say that if you torture the data enough, it will eventually confess! Computers contain lots of data, but people need help to turn this data into intelligence. As Frawley (1992) nicely puts it, “Computers have promised us a fountain of wisdom but delivered a flood of data.” Data Mining can help us to move from being data keepers to being intelligence gatherers. There are lots of definitions out there, but I like the Gartner one best: “A broad category of applications and technologies for gathering, storing, analysing, sharing and providing access to data to help enterprise users make better business decisions.” Data mining is particularly useful when applied to customer insight, and this blog will aim to do exactly that.

Customer insight is a cornerstone of the activity of any business. It is at the core of ensuring that the business continues to devise products and services that will entice new customers in addition to turning satisfied customers into long-term, profitable, repeat customers. Customer insight has different perspectives:

Demographics

Understanding the characteristics of the customers who purchase from the organisation. This includes location, age, sex and so on.

Behaviour classification

Understanding how customers behave and where they are in their lifecycle of contact with the business. This can involve an analysis of their spend and the frequency of their touchpoints with the business.

Propensity to Churn

Analysing customers who have previously churned to discover patterns in their behaviour. In turn, these patterns can be applied to existing customers to pinpoint, early on, those who might churn, so that strategies can be put in place to avoid this.

Propensity to Buy

Analysing the characteristics of customers who purchase certain products, and searching for similar patterns amongst other customers in order to predict their probability of purchasing those products as well. Propensity models are useful for understanding which customers are most likely to purchase a given set of products, and can assist in decision making and in focusing marketing efforts.

As a first step towards customer insight, analytical tools can summarise and aggregate historical information about customers. One technology which is good at summarising and aggregating data is OLAP (Online Analytical Processing). This could provide analysts with information such as the top ten performing products for a given month, or the top customers who purchased the most of a specific type of product.
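
OLAP itself lives in the database or cube layer, but just to make the idea concrete, here is an illustrative R equivalent of a ‘top ten products for a month’ summary. The sales data frame and its columns are assumptions for the sketch, not a real schema:

```r
# Illustrative only: a 'top ten products this month' summary in R.
# 'sales' is a hypothetical data frame with columns product, month and amount.
this_month <- subset(sales, month == "2015-06")
by_product <- aggregate(amount ~ product, data = this_month, FUN = sum)
head(by_product[order(-by_product$amount), ], 10)   # top ten by total sales
```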

Historical analytics can help to support the marketing process, which can also be augmented by predictive analytics, otherwise known as data mining, to help identify patterns in customer behaviour. The customer behaviour patterns found can be used to make more informed, quicker decisions about targeting customers. Thus, a customer data warehouse can progress from being simply a data store to being an intelligence tool which is used to inform decisions and direct marketing spend.

Historical and predictive analytics are not mutually exclusive, but instead work together to inform the marketing process. For example, in a marketing campaign scenario, it could be possible to evaluate the success and impact of a marketing campaign by looking at historical customer activity, and then looking at the end-to-end process of CRM analytics by using data mining to search for related patterns in the customer behaviour.

Data Mining (DM) offers three main activities: data exploration, pattern discovery and prediction. The key differentiator is that data mining performs predictions, using predictive techniques to find patterns in data. Microsoft offers Data Mining at no extra cost as part of SQL Server 2005 and 2008, geared towards the average Excel user. Companies who already have SQL Server could therefore consider these tools as a way of starting to conduct predictive analytics, or data mining, on their customer data – many companies have SQL Server 2005/2008 already, so why not try the data mining functionality that comes with the software? Businesses already hold a great deal of operational data about their customers, which could be leveraged in the predictive analytics process. In particular, the existing data could be analysed to discover patterns in customer behaviour using data mining. This offers a different way of looking at the data; it may reveal patterns that the business had not previously seen, thereby triggering new avenues of opportunity.

Data mining is useful because it can help to investigate the loss of customers. Since it can cost a lot to acquire new customers, an area of interest is retaining existing customers by identifying which are most likely to ‘churn’, or leave the service. If the business can understand the causes of customer churn, it can start to try to prevent it from happening in the first place. The issue is that a customer who cancels a service has usually started a relationship with a competitor before they make the cancellation call; by that point, the customer has already been lost.

Data mining can help to investigate the propensity of customers to churn so that they can be distinguished and addressed. Reduction of customer churn has a number of business benefits: the lifetime value of the customer might be increased, and as an adjunct, marketing spend can be targeted more effectively. Further, this could help to reduce new customer churn, where the customer acquisition cost has not been recouped at the point where the customer leaves the service. In some cases, reducing customer churn could even promote growth in areas that are considered to be already mature.

Reduction in customer churn is particularly important for companies in the services industries. There is often a distinction made between voluntary and involuntary churn. Voluntary, or commercial, churn is the situation where the customer has decided to leave and take up a competitor’s service; involuntary churn is where the customer has had no choice but to leave due to factors such as relocation or invalidity. The churn rate can be expressed as a percentage, obtained with the following formula (a small worked example follows):

  • Churn Rate = Lost Customers / Total Customers
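
For example, using made-up numbers purely for illustration:

```r
# Churn rate as a percentage, with invented figures for illustration only.
lost_customers  <- 180
total_customers <- 4500
churn_rate <- lost_customers / total_customers * 100
churn_rate   # 4 (per cent)
```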

It is possible to investigate the propensity of customers to churn by using data mining techniques to look for patterns in the data that might distinguish customers with a propensity to churn, and the type of churn itself. The benefit is that this group of potential ‘churners’ can be identified, and different approaches explored in order to retain these customers.

Understanding the patterns in customer data can have many different applications. For example, it is suggested here that the business team can use these pattern-generating techniques to discover any patterns found within the ‘customer churn’ group. These patterns can then help to predict the customers who are most likely to churn before they begin to develop relationships with competitors. These customers may become the target of focused, specifically tailored marketing campaigns. It could then be possible to explore ways in which to turn these customers from ‘nearly satisfied’ to ‘satisfied’, and then from ‘satisfied’ to ‘loyal’. The customer data set could be configured to actively exclude ‘involuntary’ churn customers from the set of data to be explored, so that the results are more accurate.

Reducing customer churn has two prongs:

  • Prediction – predicting the churn probability of customers
  • Discovery – discovering the root causes of churn

As part of the prediction process, it is important to understand the profitability of a given group of customers; for example, if a customer is not very profitable, then perhaps it is possible to simply let that customer go, and direct marketing resources towards retaining a more profitable set of customers.
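
As a hedged sketch of that prediction prong in R (the customers data frame and its columns churned, spend, tenure, profitability and customer_id are illustrative assumptions), a logistic regression can score each customer's churn probability, which can then be combined with profitability to decide where to focus retention spend:

```r
# Sketch: scoring churn propensity and prioritising profitable, at-risk customers.
# 'customers' and its columns are hypothetical; churned is a 0/1 flag.
churn_model <- glm(churned ~ spend + tenure, data = customers, family = binomial)

# Probability of churn for each current customer
customers$churn_prob <- predict(churn_model, customers, type = "response")

# Focus retention effort on profitable customers with a high churn risk
at_risk <- subset(customers,
                  churn_prob > 0.6 & profitability > median(profitability))
at_risk[order(-at_risk$churn_prob),
        c("customer_id", "churn_prob", "profitability")]
```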

As a part of the discovery process, the business can start to understand what specific factors lead to the churn itself. These factors can then be addressed in order to reduce internal causes of churn.

Data mining techniques can be mapped to the processes of prediction and discovery. Data mining is a relatively new technology, and it can be difficult to know where to start. Microsoft assumes that the user has no prior knowledge of data mining algorithms, but the terminology can still be a bit off-putting. To try and simplify it, here is a table which describes the main Microsoft data mining algorithms, along with their main uses (‘Yes’ for a primary use, ‘Potential’ for a possible one):

| Algorithm | Description | Exploration | Classification | Estimation | Association | Segmentation | Forecast |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Decision Trees | Finds the odds of an outcome based on values in a training set | Yes | Yes | Yes | Yes | | |
| Naïve Bayes | Clearly shows the differences in a particular variable for various data elements | Yes | Yes | Potential | | | |
| Neural Nets | Seeks to uncover non-intuitive relationships in data | Potential | Yes | Yes | | | |
| Association Rules | Identifies relationships between cases | Yes | Potential | | Yes | | |
| Clustering | Classifies cases into distinctive groups based on any attribute sets | Yes | Potential | Potential | Potential | Yes | |
| Sequence Clustering | Groups or clusters data based on a sequence of previous events | Yes | Potential | Potential | Potential | Yes | |
| Time Series | Analyses and forecasts time-based data | | | Yes | | | Yes |
| Linear Regression | Determines the relationship between columns in order to predict an outcome | Potential | | Yes | | | |
| Logistic Regression | Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state | Potential | Yes | Yes | | | |
There are a number of different models to choose from, and the choice should be based on the type of data analysis being undertaken. To do data mining properly, it is also necessary to test and re-test results to ensure the validity of the predictions and models. Modelling exercises should try several of the different algorithms provided by Microsoft, so that the best model can be chosen by comparing results.
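
As a small, hedged illustration of that test-and-compare step in R rather than in the Microsoft tools themselves (the customers data frame, its churned factor and the predictor columns are assumptions), two candidate models can be fitted on a training split and compared on held-out data:

```r
# Sketch: comparing two candidate models on a held-out test set.
# 'customers' is hypothetical; churned is a factor with levels "no"/"yes".
library(rpart)                                    # decision trees, ships with R

set.seed(42)
train_rows <- sample(nrow(customers), size = floor(0.7 * nrow(customers)))
train <- customers[train_rows, ]
test  <- customers[-train_rows, ]

# Candidate 1: logistic regression
m_glm <- glm(churned ~ spend + tenure, data = train, family = binomial)
p_glm <- ifelse(predict(m_glm, test, type = "response") > 0.5, "yes", "no")

# Candidate 2: decision tree
m_tree <- rpart(churned ~ spend + tenure, data = train, method = "class")
p_tree <- predict(m_tree, test, type = "class")

# Accuracy on unseen data, used to choose between the models
c(logistic = mean(p_glm == test$churned),
  tree     = mean(as.character(p_tree) == as.character(test$churned)))
```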

To summarise, Data Mining is a key technology for exploring data, and primarily for predictive analysis; here, it has been applied to customer churn. In Microsoft's case specifically, the data mining features are responsive and intuitive, and provided in a format that users trust: Excel. The Data Mining functionality allows users to turn data into intelligence, thereby making customer interactions more timely, focused and rewarding.
