Day 6: The Data Analyst’s Toolkit: Why are Excel and R useful together, and how do we connect them?



Why is analytics interesting? Well, companies are starting to view it as profitable. For example, McKinsey showed analytics was worth 100Bn today, and estimated that it would be worth over 320Bn by 2020.
When I speak to customers, this is the ‘end goal’ – they want to use their data in order to analyse and predict what their customers are saying to them. However, it seems that folks can be a bit vague on what predictive modelling actually is.



I think that this is why Power BI and Excel are a good mix. Together, they make concepts like Predictive Modelling accessible, after a bit of a learning curve. Excel is accessible and user-friendly, and we can enhance our stats delivery using R as well as Excel.

One area of interest is Predictive Modelling. This is the process of using a statistical model to predict the value of a target variable. What does this actually mean? Predictive modelling is where we work to predict values in new data, rather than trying to explain an existing data set. To do this, we work with variables. By their nature, these vary; if they didn’t, they would be called constants.

One pioneer was Francis Galton, who was a bit of an Indiana Jones in his day.  Although he wrote in the 19th century, his work is considered good and clear enough to read today. Therefore, this research has a long lineage, although it seems to be a new thing. We will start with the simplest: linear regression.

Linear regression compares two variables x and y to answer the question, “How does y change with x?” For predictive modelling, we start out with what are known as ‘predictor variables’; in terms of this question, this would be x. The result is called the target variable. In this question, this would be y. Why would we do this?

  • Machine Learning
  • Statistics
  • Programming with Software
  • Programming with Data 
  • Fun!

Why would businesses work with it at all?

  • to discover new knowledge and patterns in the data
  • to improve business results 
  • to deliver better customised services

If we have only one predictor variable, and the response and the predictor variable have a linear relationship, the data can be analyzed with a simple linear model. When there is more than one predictor variable, we would use multiple regression. In this case, our question would be: “How does y change with multiple x?”
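
As a rough illustration of the two cases, here is a minimal R sketch. It uses the built-in `cars` and `mtcars` data sets purely as stand-ins (they are my choice for illustration, not data from this post), and the object names `simple_fit` and `multiple_fit` are hypothetical.

```r
# Simple linear regression: one predictor (speed), one target (stopping distance).
# 'cars' ships with base R; it is only a stand-in data set for illustration.
simple_fit <- lm(dist ~ speed, data = cars)
print(coef(simple_fit))    # intercept and slope of the fitted straight line

# Multiple regression: the same kind of question, but with more than one predictor.
# 'mtcars' also ships with base R; wt and hp are the predictors, mpg is the target.
multiple_fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(multiple_fit))  # intercept plus one coefficient per predictor
```

The only difference between the two calls is the right-hand side of the formula: `y ~ x` for a single predictor, and `y ~ x1 + x2` when there are several.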

In fitting statistical models in which some variables are used to predict others, what we want to find is that the x and y variables do not vary independently of each other, but that they tend to vary together. We hope to find that y is varying as a straight-line function of x.

If we were to visualise the data, we would hope to find a pleasing line chart which shows y and x  relating to each other in a straight line, with a minimal amount of ‘noise’ in the chart. Visualising the data means that the relationship is very clear; analysing the data means that the data itself is robust and it has been checked.
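
One hedged way to check both sides of this in R is to draw the chart and also put a number on the ‘noise’. The sketch below again assumes the built-in `cars` data set as a stand-in; R-squared and the residual standard error are two common summaries, not the only ones.

```r
# Fit a straight line to a stand-in data set that ships with base R.
fit <- lm(dist ~ speed, data = cars)
fit_summary <- summary(fit)

# R-squared: the share of the variation in y that the straight line accounts for.
r_squared <- fit_summary$r.squared
# Residual standard error: the typical size of the scatter around the line.
residual_sd <- fit_summary$sigma
print(c(r_squared = r_squared, residual_sd = residual_sd))

# Visual check: the points should cluster around the fitted line.
plot(dist ~ speed, data = cars, main = "Stopping distance against speed")
abline(fit)
```

A value of R-squared close to 1, together with points hugging the line, suggests the relationship is close to a straight line; a low value suggests a lot of noise.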

I think that’s why, in practice, Power BI, Excel and R work well together. R has got some great visualisations, but people are very comfortable with Excel for visualisations. All that loading packages stuff you have to do in R… it doesn’t work for everyone. So we use R and Excel, at a high level, as follows:

  • We cleanse and prepare data with Excel or Power Query
  • We use RODBC to load data into R
  • We analyse and verify the data in R
  • We build models in R
  • We load the data back into Excel using RODBC
  • We visualise the data for results
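
The steps above can be sketched end to end in R. To keep the sketch runnable anywhere, it swaps in CSV files (`read.csv` and `write.csv`) in place of the RODBC connection described in the list, so it shows the shape of the workflow rather than the RODBC calls themselves. The file paths, the `cars` stand-in data and the column names `fitted_dist` and `residual` are all assumptions for illustration.

```r
# Step 1 (stand-in): pretend this file was cleansed in Excel or Power Query and saved.
input_path <- tempfile(fileext = ".csv")
write.csv(cars, input_path, row.names = FALSE)

# Step 2: load the data into R. The post uses RODBC here; read.csv is a substitute.
speed_data <- read.csv(input_path)

# Step 3: analyse and verify the data before modelling.
stopifnot(nrow(speed_data) > 0, !anyNA(speed_data))
print(summary(speed_data))

# Step 4: build the model.
fit <- lm(dist ~ speed, data = speed_data)

# Step 5: write the data, plus the model output, back out for Excel to pick up.
results <- data.frame(speed_data,
                      fitted_dist = fitted(fit),
                      residual = resid(fit))
output_path <- tempfile(fileext = ".csv")
write.csv(results, output_path, row.names = FALSE)

# Step 6 happens in Excel: open the output file and chart dist and fitted_dist.
```

Writing the fitted values and residuals next to the original columns means the Excel chart can show the actual and predicted values side by side.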


Excel is, after all, one of the world’s most successful software applications ever, with reputedly over one billion users. Using them both together means that you get the best of both worlds: R for analysis and model building; Excel as the ‘default’ for munging data around, and visualising it. I’m sure that one of the most popular buttons on software such as Tableau, QlikView et al is the ‘Export to Excel’ or ‘Export to CSV’ functionality. I’d be interested to know what people think about that!

Building linear regression models in R is very simple; in our next session, we will look at how to do that, and then how to visualise it in Excel. Doing all this is easier than you think, and I will show you how.

What do you need for my SQLBits R and PowerBI session? Nothing!

I am not expecting people to have installed anything in advance, hence I have made no requests for the event. I will cover it during the session. People rarely do it in advance because they are hassled at work, trying to clear their desks before getting training.
When I’ve held this course before, only about a third of people joined in as I went along. The others were happy to take notes and watch demos. Sometimes people are not given laptops out of the office. I run the course with this in mind. People learn in different ways.
R has its own inbuilt data sets so you will not require any preparation in advance, in terms of the data or the software. We will be looking at a wide range of things. I will make the R scripts available to you on the day as well.
The SQLBits wifi should be good; and if it is not, then I have everything on USBs.
I hope that helps and that you enjoy your day, if you are attending.

Democratization of Data: From Ideas to Decisions with Power BI

“Don’t worry about people stealing an idea. If it’s original, you will have to ram it down their throats.” Howard Aiken, Founder of Harvard’s Computing Science Program.

Data is moving so fast these days, and there is a shift whereby people are paying for value, not technology. This is where cloud computing comes in: it is very empowering, because anyone with an internet connection can access it. With Power BI in the cloud, small businesses are liberated with the ability to use the same tools and techniques to explore ideas as larger organisations.

In this session, we will look at understanding the Power BI components and tools available in the cloud, including the Power BI Admin Center, Power Query, Power Pivot, Power View and Power Map. We will look at how using them will accelerate ideas and help to clarify decisions, and related to this, discuss the roles within IT and the business in relation to these tools. We will also look at business puzzles versus business mysteries, a definition evoked by Malcolm Gladwell (Blink, Outliers) in relation to Power BI.

“Out there in some garage is an entrepreneur who’s forging a bullet with your company’s name on it,” said Gary Hamel, a management guru. With Power BI, let’s see how you can translate your ideas into a message that people can see, using cloud as an empowerment tool.

 

Business Intelligence Barista: Mixing your choice of BI Coffee with Tableau, Power BI or Qlikview?

** Update at 11th March: 
This is not an advert for PivotStream and I am not endorsing their services or solution. 
To show that this blog post is not an advert for PivotStream, I will explore alternatives, which I will follow up in a future post.
In the meantime, you could look at PowerBI and Office 365 with Azure, for example. 
Any questions, please leave at the bottom of this post and I will pick them up. 
I strongly advise people to make up their own mind when choosing a solution, but this blog post discusses some of the factors that you may want to take into consideration, along with other items such as technical support and so on which I do not cover here.
**

Choosing a Business Intelligence solution is a bit like making coffee for the whole company. Everybody likes it their way, and they want it right now. Plus, everybody wants it differently. Some want a latte, a cappuccino, or a dainty little espresso so strong that you can stand your spoon in it. Some want it hot, some want it with ice, or poured over ice-cream. Some are allergic to milk and nuts, so they have to have special treatment because of the constraints on them. Plus, if that wasn’t hard enough, everyone wants the sprinkles, right? They want the syrups and they might want brown sugar, white sugar, sweetener, Stevia or just plain. They might even want the pretty little picture on the frothy coffee to make it look nice.
You get the idea, right?

So, given that everyone has different requirements, how do you go about keeping everybody happy? If you think about how hard it is to keep everyone happy when you’re just making coffee, think how hard it is to select a Business Intelligence solution. Not just any solution… the *right* solution. The one that will keep everyone happy and give them what they want. The solution that will keep the ambulance away from the door, where constraints must be met or there will be serious trouble. The solution that will keep everyone out of danger whilst making sure that the sprinkle lovers get their sprinkles, and the folks who like a chocolate covered spoon in their coffee get a little chocolate covered spoon – in milk, dark or white…

This blog is a guidepost for people who ask me the following question a lot: Which BI tool should I choose? Truthfully, chucking a large chequebook at a technology is not going to solve all your Business Intelligence problems. To be honest, that’s like buying a great big coffee machine that might look shiny, but doesn’t actually put the sprinkles on top for you. Additionally, asking consultants and teams of sales people to do a ‘beauty parade’ of BI tools may not be the answer either. Sometimes that is just a way to stave off the decision; we are doing something so we must be doing something right because we look busy.  No-no.

So, here I have taken two extremely well known technologies, Tableau and Qlikview, and put them against a Microsoft Partner, PivotStream, whom you may not have come across. You may have seen recently that Microsoft are offering a Power BI solution which has the Power BI tools as a standalone option. I haven’t included it here because this technology is still in Preview, and I felt it wasn’t fair to include Preview technology. Instead, I chose PivotStream because they currently offer Microsoft BI Tools hosted in the cloud right now.

Note: These are just my insights. I didn’t tell any of the vendors that I was going to write this, and ultimately, the opinions are mine. However, I keep getting asked the questions, and if you are looking for a BI tool and are bewildered by the options, this post is for you. I compared on a few metrics: Business Criteria, Data Visualisation Criteria and Technical Criteria.

CAVEAT: Here are my thoughts below. I expect that the people who love any of these technologies with a passion will comment, and they are free to do so. I am not saying that this is sanctioned by any of the aforementioned organisations. Instead, it is just the world as I see it, and I may be wrong! If I am, please do feel free to correct me.

Ok, to business. My assessment is at the end of the tables.

Business Criteria

| Business Criteria | Tableau | Qlikview | Microsoft / Pivotstream | Comment |
| --- | --- | --- | --- | --- |
| Time to implement | Fast | Longer | Longer | |
| Scalability | Good | RAM Limited | Excellent | Tableau: virtual RAM |
| Enterprise Ready | Good for small organisations who can use the cloud option. On its own, Tableau Desktop is not expensive. | Good for medium businesses who might find it more cost-effective to take more licenses. Requirements for very small teams may not be suitable or cost-effective. There is a myriad of product options and it’s not clear how the one or two person team could have a cost-effective option. | Good for SMB | Qlikview is more mature but Microsoft has a much clearer vision than previously. Tableau is demonstrably used in many large organisations, as their customer list shows. |
| Long-term viability | Fastest growth | Public company | Dependable | Excel is widely used in the organisation so no adoption is required |
| Getting free online help? | Tableau forums | Qlikview LinkedIn group | Microsoft online communities | Tableau and Microsoft provide great free online help. Qlikview has its own forum which you sign up for. Tableau has the best free training videos I’ve ever seen. |
| Getting paid training | Yes | Yes | Yes | The costs vary depending on the courses. |
| Big Data Support | Above Average | Average | Yes, ODBC connectivity to HDInsight | It is on all of their roadmaps. Tableau offers a bewildering number of ways to connect to lots of data sources. However, they don’t connect to PDW very well so Microsoft wins for PDW support. It isn’t clear if QlikView supports PDW or not. |
| Partner Network | Average | Qlikview: 1000+ partners | Small. More direct approach. | Since PivotStream are a younger organisation, their partner network is emerging. Mainly deal direct, not via partners. |

Visualisation Criteria

| Visualisation Criteria | Tableau | Qlikview | Microsoft / Pivotstream | Comment |
| --- | --- | --- | --- | --- |
| ‘Eye Candy’ Appeal | Yes | Medium | Medium | Tableau blows users away with its beautiful data visualisation. I’ve seen it – I know. It has ‘wow!’ factor. |
| Data Interactivity | Excellent | Excellent | Excellent | Tableau’s interactivity has improved a lot and hits the mark for a lot of requirements. The scripting requirement in QlikView makes me a bit wary for users. Microsoft’s various reporting tools need to be aligned more, but this isn’t news to them – I am sure it will become easier to get PerformancePoint to talk to PowerView etc. in the future. Small steps in a huge task, which is a side-effect of the sheer range of reporting offerings that Microsoft have in place today. |
| Visual Drilldown | Excellent | Excellent | Very Good | I’d like to see drilldown in PowerView, for example. Excel has a neat drill down feature. |
| Offline Viewer | Free Tableau Reader | Personal Edition | Excel spreadsheets downloaded | Tableau and Pivotstream offer Excel downloads for offline viewing. |
| Analyst’s Desktop | Tableau Pro | Qlikview Desktop | Excel | Excel is familiar within the organisation. |
| Dashboard Support | Good | Excellent | Excellent | Dashboarding methodologies can be implemented in QlikView and Tableau. Tableau has basic default KPIs but these can be manufactured easily enough. QlikView seem to be popular with finance departments and seem to talk well with them. |
| Web Client | Very Good | Very Good | Very Good | No real distinguishing factor here. Microsoft has Excel services and we know that business users love Excel! |
| Mobile Clients | Excellent | Excellent | Good | Tableau and Qlikview have an edge on Microsoft for now, but the release of PowerBI for O365 is visibly getting traction and interest in the Preview. |
| Visual Controls | Very Good | Very Good | Very Good | |

Technical Criteria

| Technical Criteria | Tableau | Qlikview | Microsoft / Pivotstream | Comment |
| --- | --- | --- | --- | --- |
| Data Integration | Excellent | Very Good | Very Good | Tableau integrates easily with Google Analytics for further analysis, but this is not required at the early stages of a BI strategy. You can get extra connectors for QlikView from DataRoket. |
| Development | Tableau Pro | Qlikview Developer | SQL Server Business Intelligence or Excel skills | QlikView has scripting, which the organisation will need to learn. This may incur training costs. If the organisation already has strong SQL Server BI Developer skills in-house, it would not require further training. |
| 64-bit in-RAM DB | Good | Excellent | Excellent | SQL Server ‘talks’ to other systems and will output data easily to QlikView and other formats. This is not reciprocated i.e. once the data is in QlikView, it stays in QlikView. |
| Mapping support | Excellent | Average | Excellent | Tableau has Mapping. Excel 2013 has 3D Power Map as a feature within it, and this is interesting for further, future analyses. |
| Local data files (text, spreadsheet etc.) | Yes | Yes | Yes | |
| Relational databases (SQLServer, Oracle etc.) | Yes | Yes | Yes | |
| OLAP cubes (SSAS, Essbase etc.) | Yes | No | No | |
| Online data sources | Yes | Yes | Yes | Microsoft’s new Power Query allows you to search online and scrape datasets straight into Excel. |
| Multi-source access | Yes | Yes | Yes | |
| Multi-table access | Yes | Yes | Yes | |
| Extracted data storage | Optional (proprietary) | Proprietary | Data remains where it is. | |
| Maximum capacity | Unlimited | Billions of rows | | |
| In-memory engine | Desktop or Server | Desktop or Server | | Tableau reads SSAS and PowerPivot Cubes, but not very well. Tableau and QlikView want to suck the data into their own data models; Microsoft keeps the data where it is, where it is easily accessible by Microsoft and other vendors. |
| Modeling, Analytics | Below Average | Below Average | Data mining and other capabilities | Microsoft is the winner here for providing a range of modelling and analytics tools such as Tabular model, SSAS. Again, the organisation has experience in these tools so you are leveraging in-house skill sets. |
| Data Mining | Limited | Limited | Yes | By ‘Data Mining’, I mean true data mining e.g. building neural nets with thought put into it about avoiding jitter, bootstrapping and so on. This is not ‘what if’ scenarios but data science. |
| Multidimensional | Very Good | Limited | Excellent | Microsoft is the winner for multidimensional modelling |
| xVelocity Support | Good | None | Excellent | Microsoft is a pioneer in xVelocity support |
| PowerPivot Support | Good | None | Excellent | Microsoft is a pioneer in xVelocity support |
| API | Excellent | Yes, documentation over at the QlikView Community site. | Excellent | Microsoft open their software up and have APIs available. |

The detail is in the tables above, and the take-away summary of my thoughts is below:


From conversations with customers, they often want the simplest solution possible. Office365 has previously been pooh-poohed by organisations looking at BI because it introduced a lot of gubbins that people did not want or need i.e. Lync, Exchange and so on. This created a concern amongst the technical people and you can almost read their minds: if the bosses just want a reporting solution, why are they buying something with Exchange? Am I going to be out of a job? This meant that the technical folks were against it, and this reduced the likelihood of adoption or even getting Office365 solutions in the door. Plus, as a guiding principle, people do not buy stuff that they do not need. Fact. With the advent of PivotStream – who simply offer Microsoft BI in the cloud – the antagonism over some of the Office365 story went away.


What will happen to PivotStream now that Power BI in the Cloud is in the offing? I think that they will be fine because they offer one thing which is often key: control over your own cloud deployment. If your PivotStream SharePoint site goes down, then you can wade in yourself and fix it. That might be good for some but not so good for others. For me, that’s the USP.

What about Tableau? I’ve been a Tableau fan for years. It’s a little known fact that I spoke at their Tableau European Conference in Amsterdam in 2011, and even had a Heineken with Stephen Few and some of the Tableau team and dataviz gurus (if you haven’t already, read Stephen’s blog. Now.). Where I think Tableau succeed is that they are superb at what they do: data visualisation. For me, they are the hallmark and the touchstone of where dataviz wants to be, and they are constantly breaking boundaries. However, what they succeed at is something that many organisations aren’t quite ready for: breath-taking visualisations. For some, getting data in and out of Excel is a nightmare and they are simply not ready for it. For these people, Office365 or PivotStream are perfect.

What about QlikView? They are beloved of financial departments but I am not sure that they talk so well to IT departments. For me, the fact that there is scripting involved is an issue. If it is self-service Business Intelligence, it should be as easy as possible. In my opinion, they are a bit Marmite as we say in the UK; you either love them or you don’t. There is no half-way house. From my perspective, I hear both sides but I don’t really see QlikView fanboys the way that we see Tableau fanboys.
To summarise, sorting out user requirements is key. There is no wrong choice, if it is the right choice for your organisation. However, any technology is a bad choice if it isn’t the right choice for your organisation. Look past the flash, and see what is really right for you.
 /*** Late Breaking News ***/
My apologies: I put Tableau API as None but that isn’t correct.
I’ve fixed the table, and here is a pointer to the Tableau documentation on the JavaScript API. Thanks to Andy Cotgreave for the spot!
By way of apology, here is the Tableau API in action, done with the panache and fun you’d expect from a Tableau video.

/** Even Later breaking news *//

Sorry QlikView team! My humblest apologies. Apparently QlikView *DO* offer an API and you can find information on it here.  Quote from the site (I’m not repeating a sales pitch or commentary here, this is just straight from the community site)
The QlikView Software Development Kit is the home of the QlikView Software Integration toolbox. It includes sample code and the Application Programming Interfaces  (APIs).

Note: More content will be added in an ongoing basis.
In my defence, the comments at the foot of the QlikView community page are riddled with remarks from people who can’t find this or that, or don’t know the status of stuff, and then get supplied with links by other people. I’m guessing that I am not the only one who doesn’t find the information very easy to find on the community site then.
So here is the link http://community.qlikview.com/docs/DOC-2639 and I have updated the table.

Analysing Data with Hive and Power BI Slides from SQLRally Amsterdam

Here are my slides from SQLRally Amsterdam. A major thanks to the SQLRally Amsterdam crew, led by Andre Kamman, for all their hard work in putting together this great community event!

I hope that helps!
Kind Regards,
Jen

Power Business Intelligence for Office365 – resolving the Scylla and Charybdis dilemma.

People have often complained about the fragmented methodology that seemed to accompany Microsoft technology releases. I see this as a manifestation of the Scylla and Charybdis problem – people complain if they don’t release everything as a unified whole, or complain that things take a long time to deliver. In other words, Microsoft never seem to win and seemed to be required to take one choice or another, with neither outcome pleasing everyone. I’ve commented before that it often seems that Microsoft can’t win whatever they decide to do, and here is an example blog with commentary.

From today’s announcement, it seems as if things are coming together and Microsoft are indeed winning. This news includes a mobile business intelligence deliverable. As you know, business users love Excel and it made sense for Microsoft to put Excel at the heart of their business intelligence strategy. For those who skim read, the main point is that the Microsoft Business Intelligence strategy is coming together. Now that Microsoft are delivering a user-oriented Business Intelligence solution with mobile functionality (yes, even iPad!), hopefully a lot of the heat will go away: Microsoft have resolved the Scylla and Charybdis problem by producing something very special and useful for business users.

So what has been delivered? Today at the World Partner Conference, exciting news was released about this very topic: Power BI for Office 365 has been announced.

Power BI for Office 365 is a self-service business intelligence solution surfaced to users through Excel and Office 365. Essentially, business users can get Excel-happy with data analysis and visualization capabilities to identify deeper business insights.  The data can be held on-premise or within a trusted cloud environment. Excel, in other words, becomes the front end for allowing business users to have fun with their data.

What does Power BI for Office 365 mean for ordinary users? With Power BI for Office 365, customers can connect to data in the cloud or extend their existing on premise data sources and systems to quickly build and deploy self-service BI solutions hosted in Microsoft’s enterprise cloud.

Power BI for Office 365 enables customers to do more with their data:

  • Analyze and present insights from data in compelling visual formats either on premises or in the cloud from Excel. 
  • Share reports and data sets online with data that is always kept up to date. 
  • Ask questions of your data using natural language search and get immediate answers through interactive tables, charts and graphs. 
  • Access and stay connected to data and reports from your mobile devices wherever you are.

I’m very happy about this announcement because it’s what customers have been waiting for. My customers have been asking for this for years, and now that it is almost here, they will see things fall into place. Excel is the way forward for Microsoft Business Intelligence: hence, PowerView, Data Explorer (now Power Query) and Geoflow (now Power Map) are now part of the Office 365 story. And what a story!

The truth is that the majority of BI work is done in Excel, and I think Microsoft are just bringing everything home to where the Business Users do most of their work. It’s about working with the Excel users, and giving people opportunities to work with other data that are outside of their internal business walls.

You can visit http://www.office.com/powerbi on Monday, July 8 to sign up to be notified when the preview of Power BI for Office 365 is available later this summer. I’m heading over to do it now, and finally I can give my customers the answers they’ve been looking for.

Goodbye Scylla and Charybdis 🙂

Odysseus vor Scilla und Charybdis by Johann Heinrich Füssli