Convincing your HiPPO at EARL Conference in London!

ID-100257254I’m delighted to be speaking at the EARL Conference to be held in London on the 14th – 16th September. What’s my topic?

Convince your HiPPO with Real world Data Storytelling in R and Machine Learning

In a world where the HiPPO’s (Highest Paid Person’s Opinion) is final, how can we use technology to drive the organisation towards data-driven decision making as part of their organizational DNA? R provides a range of functionality in machine learning, but we need to expose its richness in a world where it is made accessible to decision makers. Using Data Storytelling with R, we can imprint data in the culture of the organization by making it easily accessible to everyone, including decision makers. Together, the insights and process of machine learning are combined with data visualisation to help organisations derive value and insights from big and little data.

In this session, we will use R and cloud-based technologies in order to explore and analyse data using machine learning and statistical packages functionality, and we will look at our results. Then, we will look at how we disseminate the results to the HiPPO audience, using best practices in data visualisation and R, informed by gurus such as Stephen Few and Edward Tufte.

If you want to know how to demystify R and the insights you’ve found during your analyses, join this session in order to learn about machine learning as a technology and a discipline, and how to make the most of your insights using best practice data visualisation. Using real-life scenarios, this session will help you to communicate the insights of your data to your HiPPO, thereby helping to move your organisation towards a data-driven culture.

R now ranks as the sixth most popular programming language  – its move from last year’s 9th place reflecting the growing importance of data analytics to an increasing number of industries and sectors.  EARL offers a unique opportunity to discover how R is being used commercially to provide a wealth of business solutions.

EARL London will feature :

  • Presentations from over 40 R gurus and Business leaders
  • Speakers represent a broad range of industry sectors  – including:  insurance, manufacturing, customer analytics, life sciences, finance etc
  • Sessions include:  Data Visualisation, Business Challenges, Big Data Technologies, Modelling, Workflow and Commercial Applications
  • Keynote speakers:  Alex Bellow, Dirk Eddelbuettel, Joe Cheng and Hannah Fry
  • Speakers representing Companies such as Shell, KPMG, AstraZeneca, Lloyd’s of London, UBS and Hewlett Packard
  • Pre Conference workshops on:  Interactive Reporting with R Markdown and Shiny (now SOLD OUT), An Introduction to Rcpp,  Integrating R and Python, and Current Best Practices in Formal Package Development
  • Sponsors and Exhibitors including Revolution Analytics, RStudio, Hewlett Packard, Teradata, Oracle, Harnham UK, Plotly, Tessella, Information Builders and O’Reilly
  • Sensational central London venue
  • 3 Conference Networking events including the Main Conference Reception in the amazing walkways of London’s iconic Tower Bridge

If you have yet to purchase your ticket please don’t delay to avoid disappointment. Tickets can be purchased online via credit card or can be invoiced if required. Group discounts are available to companies sending >5 attendees – please email earl-team@mango-solutions.com for more information.

Jen’s Diary: What does Microsoft’s recent acquisitions of Revolution Analytics mean for PASS?

Caveat: This blog does not represent the views of PASS or the PASS Board. These opinions are solely mine.

The world of data and analytics keeps heating up. Tableau, for example, keeps growing and winning. In fact, Tableau continues to grow total and licence revenue 75% year over year, with its total revenue grew to $142.9 million in the FY4 of 2014.There’s a huge shift in the market towards analytics, and it shows in the numbers. Lets take a look at some of the interesting things Microsoft have done recently, and see how it relates to PASS:

  • Acquired Revolution Analytics, an R-language-focused advanced analytics firm, will bring customers tools for prediction and big-data analytics.
  • Acquired Datazen, a provider of data visualization and key performance indicator data on Windows, iOS and Android devices. This is great from the cross-platform perspective, and we’ll look at this in a later blog. For now, let’s discuss Revolution and Microsoft.

Why it was good for Microsoft to acquire Revolution Analytics

The acquisition shows that Microsoft is bolstering its portfolio of advanced analytics tools. R is becoming increasingly common as a skill set, and businesses are more comfortable about using open source technology such as R. It is also accessible software, and a great tool for doing analytics. I’m hoping that this will help organisations to recognise and conduct advanced analytics, and it will improve the analytics capability in HDInsight.

Microsoft has got pockets of advanced analytics capabilities built into Microsoft SQL Server, and in particular, SQL Server Analysis Services, and also in the SQL Server Parallel Data Warehouse (PDW). Microsoft also has the Azure Machine Learning Service (Azure ML) which uses R in MLStudio. However, it does not have an advanced analytics studio, and the approach can come across as piecemeal for those who are new to it. The acquisition of Revolution Analytics will give Microsoft on-premises tools for data scientists, data miners, and analysts, and cloud and big data analytics for the same crowd.

Here’s what I’d like Microsoft to do with R:

  • Please give some love to SSRS by infusing it with R. There is a codeplex download that will help you to produce R visualisations in SSRS. I’d like to see more and easier integration, which doesn’t require a lot of hacking about.
  • Power Query has limited statistical capability at the moment. It could be expanded to include R. I am not keen for Microsoft to develop yet another programming language and R could be a part of the Power Query story.
  • Self-service analytics. We’ve all seen the self-service business intelligence communications. What about helping people to self-serve analytics as well, once they’ve cracked self-service BI? I’d like to see R made easier to use for everyone. I sense that will be a long way off, but it is an opportunity.
  • Please change the R facility in MLStudio. It’s better to use RStudio to create your R script, then upload it.

What issues do I see in the Revolution Analytics acquisition?

Microsoft is a huge organisation. Where will it sit within the organisation? Any acquisition involves a change management process. Change management is always hard. R touches different parts of the technology stack. This could be further impacted by the open source model that R has been developed under. Fortunately Revolution seem to have thought of some of these issues already: how does it scale, for example? This acquisition will need to be carefully envisioned, communicated and implemented, and I really do wish them every success with it.

What does this mean for PASS?

I hold the PASS Business Analytics Portfolio, and our PASS Business Analytics Conference is being held next week. Please use code BFFJS to get the conference for a discount rate, if you are interested in going.

I think the PASS strategy of becoming more data platform focused is the right one. PASS exist to provide technical community education to data professionals, and I think PASS are well placed to move on the analytics journey that we see in the industry. I already held a series on R for the Data Science Virtual Chapter, and I’m confident you’ll see more material on this and related topics. There are sessions on R at the PASS BA Conference as well. The addition of Revolution Analytics and Datazen is great for Microsoft, and it means that the need for learning in these areas is more urgent, not less. That does not mean that i think that everyone should learn analytics. I don’t. However, I do think PASS can help those who are part of the journey, if they want (or need) to be.

I’m personally glad PASS are doing the PASS Business Analytics Conference because I believe it is a step in the right direction, in the analytics journey we see for the people who want to learn analytics, the businesses who want to use it, and the burgeoning technology. I agree with Brent Ozar ( b / t ) in that I don’t think that the role of the DBA is going away. I do think that, for small / medium businesses, some folks might find that they become the ‘data’ person rather than the DBA being a skill on its own. I envisage that PASS will continue to serve the DBA-specialist-guru as well as the BI-to-analytics people, as well as those who become the ‘one-stop-shop’ for everything data in their small organisation (DBA / BA / Analytics), as well as the DBA-and-Cloud person. It’s about giving people opportunity to learn what they want and need to learn, in order to keep up with the rate of change we see in the industry.

Please feel free to comment below.

Your friend,

Jen Stirrup

x

Visualising Data with R: HTMLWidgets for R

I’m interested in analyzing and visualizing data with R, as you know. Thanks to Jen Underwood, she pointed me at this cool series of widgets, called HTMLWidgets, that allows you to do some interesting visualisations with R. I must admit I haven’t touched JavaScript for years, ever since I helped to build websites for well known entertainment sites, but I’m glad I can pick it up again. What does it let you do?

  • Use JavaScript visualization libraries at the R console, just like plots
  • Embed widgets in R Markdown documents and Shiny web applications
  • Develop new widgets using a framework that seamlessly bridges R and JavaScript

Why don’t you head on over to the site and have a look? I’ll be playing with it, and finally finishing my R blog series, as soon as I can. I’ve been busy helping to organize the PASS BA Conference, which is taking place in April 2015 in Santa Clara. Therefore, I hope you’ll excuse the radio silence. It’s on my to-do list, which never goes down, ever!

Day 6: The Data Analysts Toolkit: Why are Excel and R useful together, and how do we connect them?



https://www.flickr.com/photos/38953955@N07/14764024879/player/
Why is analytics interesting? Well, companies are starting to view it as profitable. For example, McKinsey showed analytics was worth 100Bn today, and estimated to be over 320Bn by 2020.
When I speak to customers, this is the ‘end goal’ – they want to use their data in order to analyse and predict what their customers are saying to them. However, it seems that folks can be a bit vague on what predictive modelling actually is.



I think that this is why Power BI and Excel are a good mix together. It makes concepts like Predictive Modelling accessible, after a bit of a learning curve. Excel is accessible and user-friendly, and we can enhance our stats delivery using R as well as Excel.

One area of interest is Predictive Modelling. This is the process of using a statistical or model to predict the value of a target variable. What does this actually mean?  Predictive modelling is where we work to the predict values in new data, rather than trying to explain an existing data set. To do this, we work with variables. By their nature, these vary; if they didn’t, they would be called a constant.

One pioneer was Francis Galton, who was a bit of an Indiana Jones in his day.  Although he wrote in the 19th century, his work is considered good and clear enough to read today. Therefore, this research has a long lineage, although it seems to be a new thing. We will start with the simplest: linear regression.

Linear regression compares two variables x and y to answer the question, “How does y change with x?” For predictive modelling, we start out with what are known as ‘predictor variables’; in terms of this question, this would be x. The result is called the target variable. In this question, this would be y. Why would we do this?

  • Machine Learning
  • Statistics
  • Programming with Software
  • Programming with Data 
  • Fun!

Why would businesses work with it at all?

  • to discover new knowledge and patterns in the data
  • to improve business results 
  • to deliver better customised services

If we have only one predictor variable and the response and the predictor variable have a linear relationship, the data can be analyzed with a simple linear model. When there is more than one predictor variable, we would use multiple regression. In this case, our question would be: , “How does y change with multiple x?” 

In fitting statistical models in which some variables are used to predict others, we want to find is that the x and y variables do not vary independently of each other, but that they tend to vary together. We hope to find that y is varying as a straight-line function of x.

If we were to visualise the data, we would hope to find a pleasing line chart which shows y and x  relating to each other in a straight line, with a minimal amount of ‘noise’ in the chart. Visualising the data means that the relationship is very clear; analysing the data means that the data itself is robust and it has been checked.

I think that’s why, in practice, Power BI, Excel and R work well together. R has got some great visualisations, but people are very comfortable with Excel for visualisations. All that loading packages stuff you have to do in R… it doesn’t work for everyone. So we use R and Excel, at a high level, as follows:

https://www.flickr.com/photos/38953955@N07/14764024879/player/

  • We cleanse and prepare data with Excel or Power Query
  • We use RODBC to load data into R
  • We analyse and verify the data in R
  • We build models in R
  • We load the data back into Excel using RODBC
  • We visualise the data for results


Excel is, after all, one of the world’s most successful software applications ever, with reputedly over one billion users. Using them both together means that you get the best of both words: R for analysis and model building: Excel is the ‘default’ for munging data around, and visualising it. I’m sure that one of the most popular buttons on software such as Tableau, QlikView et al is the ‘Export to Excel’ or ‘Export to CSV’ functionality. I’d be interested to know in what people think about that!

Building linear regression models in R is very simple; in our next session, we will look at how to do that, and then how to visualise it in Excel. Doing all this is easier than you think, and I will show you how.