Learning pathway for SQL Server 2016 and R Part 2: Divided by a Common Language

http://whatculture.com/film/the-office-uk-vs-the-office-us.php

Britain has “really everything in common with America nowadays, except, of course, language,” said Oscar Wilde in The Canterville Ghost (1887), whilst George Bernard Shaw is quoted as saying that “The United States and Great Britain are two countries separated by a common language.”

There are similarities and differences between SQL and R, which might be confusing. However, I think it can be illuminating to understand these similarities and differences, since they tell you something about each language. I got this idea from one of the attendees at PASS Summit 2015, and my kudos and thanks go to her. I’m sorry I didn’t get her name, but if you see this you will know who you are, so please feel free to leave a comment so that I can give you a proper shout out.

If you are looking for an intro to R from the Excel perspective, see this brilliant blog here. Here’s a list to get us started. If you can think of any more, please give me a shout and I will update it. It’s just an overview, intended to help the novice get started on a path of self-guided research into both of these fascinating topics.

R vs SQL / BI background
R: A factor has special properties; it can represent a categorical variable, which is used in linear regression, ANOVA and so on. It can also be used for grouping.
SQL / BI background: A dimension is a way of describing categorical variables. We see this in the Microsoft Business Intelligence stack.
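To make the R side concrete, here is a minimal sketch (the category values are made up for illustration):

# A categorical variable stored as a factor; the levels are the categories
region <- factor(c("North", "South", "South", "East"))
levels(region)   # "East" "North" "South"
table(region)    # counts per level, i.e. grouping by the categorical variable
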
R: dim means that we can give a chunk of data dimensions or, in other words, give it a size. You could use dim to turn a vector into a matrix, for example.
SQL / BI background: Following the Kimball methodology, we tend to prefix tables with ‘dim’ if they are dimension tables. Here, we mean ‘dimensions’ in the Kimball sense, where a ‘dimension’ is a way of describing data. If you take a report title, such as ‘Sales by geography’, then ‘geography’ would be your dimension.
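For example, a quick sketch of dim giving a plain vector a size:

# dim turns a vector into a matrix by assigning it dimensions
x <- 1:6
dim(x) <- c(2, 3)   # x is now a 2-row, 3-column matrix
x
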
R: R memory management can be confusing; read Matthew Keller’s excellent post here. If you use R to look at large data sets, you’ll need to know:
– how much memory an object is taking (see the sketch after this list);
– the difference between 32-bit and 64-bit R;
– packages designed to store objects on disk rather than in RAM;
– gc() for garbage collection;
– how to reduce memory fragmentation.
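As a small illustration of the first and fourth points (the object name is made up), here is a sketch of checking an object’s footprint and then asking the garbage collector to reclaim the memory:

# Check how much memory an object is taking, then release it
big_vector <- runif(1e6)
print(object.size(big_vector), units = "MB")   # size of the object in RAM
rm(big_vector)   # remove the reference from the workspace
gc()             # run the garbage collector and report memory usage
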
SQL / BI background: SQL Server 2016 CTP3 brings native in-database support for the open source R language. You can call both R and RevoScaleR functions and scripts directly from within a SQL query. This circumvents the R memory issue because SQL Server brings multi-threaded, multi-core in-database computations to the user.
R: A data frame is a way of storing data in tables. It is a tightly coupled collection of variables arranged in rows and columns, and it is a fundamental data structure in R.
SQL / BI background: In SSRS, we would call this a dataset; in T-SQL, it’s just a table. The data is formatted into rows and columns, with mixed data types.
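A minimal sketch of a data frame, with made-up column names:

# A data frame: columns of different types, arranged like a table
sales <- data.frame(
  region  = c("North", "South", "East"),
  revenue = c(100.5, 250.0, 175.25),
  units   = c(10L, 25L, 17L)
)
str(sales)   # one line per column, showing its type
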
R: All columns in a matrix must have the same mode (numeric, character, and so on) and the same length.
SQL / BI background: A matrix in SSRS is a way of displaying, grouping and summarizing data. It acts like a pivot table in Excel.
R: <tablename>$<columnname> is one way you can refer to a specific column of a table (a data frame).
SQL / BI background: <tablename>.<columnname> is how we do it in SQL, or you could just refer to the column name on its own.
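For example (the table and column names are made up):

# Referring to a single column of a data frame with $
sales <- data.frame(region = c("North", "South"), revenue = c(100, 250))
sales$revenue         # the revenue column as a vector
mean(sales$revenue)   # 175
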
R: To print something, type the variable name at the command prompt. Note that you can only print items one at a time, so use cat to combine multiple items into one printed line. Alternatively, use the print function. One magic feature of R is that it knows how to format any R value for printing, e.g.

print(matrix(c(1,2,3,5),2,2))
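
And a quick sketch of cat combining several items into one line of output:

cat("The answer is", 42, "\n")   # prints: The answer is 42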

SQL / BI background: PRINT returns a user-defined message to the client. See the BOL entry here. https://msdn.microsoft.com/en-us/library/ms176047.aspx

CONCAT returns a string that is the result of concatenating two or more string values. https://msdn.microsoft.com/en-GB/library/hh231515.aspx

R: Variables allow you to store data temporarily during the execution of code. If you define a variable at the command prompt, it is contained in your workspace. It is held in memory, but it can be saved to disk. In R, variables are dynamically typed, so you can chop and change the type as you see fit.
SQL / BI background: Variables are declared in the body of a batch or procedure with the DECLARE statement and are assigned values using either a SET or a SELECT statement. T-SQL variables are not dynamically typed, unlike R. For an in-depth look at variables, see Itzik Ben-Gan’s article here.
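A small sketch of R’s dynamic typing:

# The same variable name can hold values of different types over time
x <- 42            # x is numeric
class(x)           # "numeric"
x <- "forty-two"   # reassigning changes the type
class(x)           # "character"
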
R: ls allows you to list the variables and functions in your workspace, and you can use ls.str to list some additional information about each variable.
SQL / BI background: SQL Server has tables, not arrays, and it works differently; you can find a great explanation over at Erland Sommarskog’s blog. For SQL Server 2016 specific information, please visit the Microsoft site.
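For example (the object names are made up):

# Listing what is currently in the workspace
a <- 1:10
f <- function(n) n * 2
ls()       # "a" "f"
ls.str()   # the same objects, each with a short summary of its structure
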
R: A vector is a key data structure in R, with tons of flexibility and extras. Vectors can’t have a mix of data types, and they are created using the c(…) operator. If you combine vectors with c(…), R flattens them into a single vector.
SQL / BI background: Batch-mode execution is sometimes known as vector-based or vectorized execution. It is a query processing method in which queries process multiple rows together. A popular feature in SQL Server 2016 is columnstore indexes, which use batch-mode execution. To dig into more detail, I’d recommend Niko Neugebauer’s excellent blog series here, or the Microsoft summary.
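A minimal sketch of building vectors with c(…):

# c(...) builds a vector; nested vectors are flattened into one
v1 <- c(1, 2, 3)
v2 <- c(v1, c(4, 5))
v2            # 1 2 3 4 5
c(1, "two")   # mixed types are coerced - here everything becomes character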

There will be plenty of other examples, but I hope that helps for now.

Business Intelligence Barista: Mixing your choice of BI Coffee with Tableau, Power BI or Qlikview?

** Update at 11th March: 
This is not an advert for PivotStream and I am not endorsing their services or solution. 
To show that this blog post is not an advert for PivotStream, I will explore alternatives, which I will follow up in a future post.
In the meantime, you could look at PowerBI and Office 365 with Azure, for example. 
Any questions, please leave at the bottom of this post and I will pick them up. 
I strongly advise people to make up their own minds when choosing a solution, but this blog post discusses some of the factors that you may want to take into consideration; there are other items, such as technical support, which I do not cover here.
**

Choosing a Business Intelligence solution is a bit like making coffee for the whole company. Everybody likes it their own way, and they want it right now. Some want a latte, a cappuccino, or a dainty little espresso so strong that you can stand your spoon up in it. Some want it hot, some want it with ice, or poured over ice cream. Some are allergic to milk and nuts, so they have to have special treatment because of the constraints on them. Plus, if that wasn’t hard enough, everyone wants the sprinkles, right? They want the syrups, and they might want brown sugar, white sugar, sweetener, Stevia or just plain. They might even want the pretty little picture on the frothy coffee to make it look nice.
You get the idea, right?

So, given that everyone has different requirements, how do you go about keeping everybody happy? If you think about how hard it is to keep everyone happy when you’re just making coffee, think how hard it is to select a business Intelligence solution. Not just any solution…. the *right* solution. The one that will keep everyone happy and give them what they want. The solution that will keep the ambulance away from the door, where constraints must be met or there will be serious trouble. The solution that will keep everyone out of danger whilst making sure that the sprinkle lovers get their sprinkles, and the folks who like a chocolate covered spoon in their coffee get a little chocolate covered spoon – in milk, dark or white…

This blog is a guidepost for people who ask me the following question a lot: Which BI tool should I choose? Truthfully, chucking a large chequebook at a technology is not going to solve all your Business Intelligence problems. To be honest, that’s like buying a great big coffee machine that might look shiny, but doesn’t actually put the sprinkles on top for you. Additionally, asking consultants and teams of sales people to do a ‘beauty parade’ of BI tools may not be the answer either. Sometimes that is just a way to stave off the decision; we are doing something so we must be doing something right because we look busy.  No-no.

So, here I have taken two extremely well-known technologies, Tableau and Qlikview, and put them up against PivotStream, a Microsoft Partner whom you may not have come across. You may have seen recently that Microsoft are offering a Power BI solution which has the Power BI tools as a standalone option. I haven’t included it here because this technology is still in Preview, and I felt it wasn’t fair to include Preview technology. Instead, I chose PivotStream because they offer Microsoft BI tools hosted in the cloud right now.

Note: these are just my insights. I didn’t tell either vendor that I was going to write this, and ultimately, the opinions are mine. However, I keep getting asked these questions, and if you are looking for a BI tool and are bewildered by the options, this post is for you. I compared the tools on a few sets of criteria: Business Criteria, Data Visualisation Criteria and Technical Criteria.

CAVEAT: Here are my thoughts below. I expect that the people who love any of these technologies with a passion will comment, and they are free to do so. I am not saying that this is sanctioned by any of the aforementioned organisations. Instead, it is just the world how I see it, and I may be wrong! If I am, please do feel free to correct me.

Ok, to business. My assessment is at the end of the tables.

Business Criteria

Time to implement: Tableau – Fast; Qlikview – Longer; Microsoft/Pivotstream – Longer.
Scalability: Tableau – Good; Qlikview – RAM Limited; Microsoft/Pivotstream – Excellent. Comment: Tableau – virtual RAM.
Enterprise Ready: Tableau – Good for small organisations who can use the cloud option; on its own, Tableau Desktop is not expensive. Qlikview – Good for medium businesses who might find it more cost-effective to take more licenses; requirements for very small teams may not be suitable or cost-effective, and there is a myriad of product options, so it’s not clear how a one- or two-person team could have a cost-effective option. Microsoft/Pivotstream – Good for SMB. Comment: Qlikview is more mature, but Microsoft has a much clearer vision than previously. Tableau is demonstrably used in many large organisations, as their customer list shows.
Long-term viability: Tableau – Fastest growth; Qlikview – Public company; Microsoft/Pivotstream – Dependable. Comment: Excel is widely used in the organisation, so no adoption is required.
Getting free online help: Tableau – Tableau forums; Qlikview – Qlikview LinkedIn group; Microsoft/Pivotstream – Microsoft online communities. Comment: Tableau and Microsoft provide great free online help. Qlikview has its own forum which you sign up for. Tableau has the best free training videos I’ve ever seen.
Getting paid training: Tableau – Yes; Qlikview – Yes; Microsoft/Pivotstream – Yes. Comment: The costs vary depending on the courses.
Big Data Support: Tableau – Above Average; Qlikview – Average; Microsoft/Pivotstream – Yes, ODBC connectivity to HDInsight. Comment: It is on all of their roadmaps. Tableau offers a bewildering number of ways to connect to lots of data sources; however, they don’t connect to PDW very well, so Microsoft wins for PDW support. It isn’t clear whether QlikView supports PDW or not.
Partner Network: Tableau – Average; Qlikview – 1000+ partners; Microsoft/Pivotstream – Small, more direct approach. Comment: Since PivotStream are a younger organisation, their partner network is emerging. They mainly deal direct, not via partners.

Visualisation Criteria

‘Eye Candy’ Appeal: Tableau – Yes; Qlikview – Medium; Microsoft/Pivotstream – Medium. Comment: Tableau blows users away with its beautiful data visualisation. I’ve seen it – I know. It has ‘wow!’ factor.
Data Interactivity: Tableau – Excellent; Qlikview – Excellent; Microsoft/Pivotstream – Excellent. Comment: Tableau’s interactivity has improved a lot and hits the mark for a lot of requirements. The scripting requirement in QlikView makes me a bit wary for users. Microsoft’s various reporting tools need to be aligned more, but this isn’t news to them – I am sure it will become easier to get PerformancePoint to talk to Power View etc. in the future. Small steps in a huge task, which is a side-effect of the sheer range of reporting offerings that Microsoft have in place today.
Visual Drilldown: Tableau – Excellent; Qlikview – Excellent; Microsoft/Pivotstream – Very Good. Comment: I’d like to see drilldown in Power View, for example. Excel has a neat drill-down feature.
Offline Viewer: Tableau – Free Tableau Reader; Qlikview – Personal Edition; Microsoft/Pivotstream – Excel spreadsheets downloaded. Comment: Tableau and Pivotstream offer Excel downloads for offline viewing.
Analyst’s Desktop: Tableau – Tableau Pro; Qlikview – Qlikview Desktop; Microsoft/Pivotstream – Excel. Comment: Excel is familiar within the organisation.
Dashboard Support: Tableau – Good; Qlikview – Excellent; Microsoft/Pivotstream – Excellent. Comment: Dashboarding methodologies can be implemented in QlikView and Tableau. Tableau has basic default KPIs, but these can be manufactured easily enough. QlikView seems to be popular with finance departments and seems to talk well with them.
Web Client: Tableau – Very Good; Qlikview – Very Good; Microsoft/Pivotstream – Very Good. Comment: No real distinguishing factor here. Microsoft has Excel Services, and we know that business users love Excel!
Mobile Clients: Tableau – Excellent; Qlikview – Excellent; Microsoft/Pivotstream – Good. Comment: Tableau and Qlikview have an edge on Microsoft for now, but the release of Power BI for O365 is visibly getting traction and interest in the Preview.
Visual Controls: Tableau – Very Good; Qlikview – Very Good; Microsoft/Pivotstream – Very Good.

Technical Criteria

Data Integration: Tableau – Excellent; Qlikview – Very Good; Microsoft/Pivotstream – Very Good. Comment: Tableau integrates easily with Google Analytics for further analysis, but this is not required at the early stages of a BI strategy. You can get extra connectors for QlikView from DataRoket.
Development: Tableau – Tableau Pro; Qlikview – Qlikview Developer; Microsoft/Pivotstream – SQL Server Business Intelligence or Excel skills. Comment: QlikView has scripting, which the organisation will need to learn; this may incur training costs. If the organisation already has strong SQL Server BI developer skills in-house, it would not require further training.
64-bit in-RAM DB: Tableau – Good; Qlikview – Excellent; Microsoft/Pivotstream – Excellent. Comment: SQL Server ‘talks’ to other systems and will output data easily to QlikView and other formats. This is not reciprocated, i.e. once the data is in QlikView, it stays in QlikView.
Mapping support: Tableau – Excellent; Qlikview – Average; Microsoft/Pivotstream – Excellent. Comment: Tableau has mapping. Excel 2013 has 3D Power Map as a feature within it, and this is interesting for further, future analyses.
Local data files (text, spreadsheet etc.): Tableau – Yes; Qlikview – Yes; Microsoft/Pivotstream – Yes.
Relational databases (SQL Server, Oracle etc.): Tableau – Yes; Qlikview – Yes; Microsoft/Pivotstream – Yes.
OLAP cubes (SSAS, Essbase etc.): Tableau – Yes; Qlikview – No; Microsoft/Pivotstream – No.
Online data sources: Tableau – Yes; Qlikview – Yes; Microsoft/Pivotstream – Yes. Comment: Microsoft’s new Power Query allows you to search online and scrape datasets straight into Excel.
Multi-source access: Tableau – Yes; Qlikview – Yes; Microsoft/Pivotstream – Yes.
Multi-table access: Tableau – Yes; Qlikview – Yes; Microsoft/Pivotstream – Yes.
Extracted data storage: Tableau – Optional (proprietary); Qlikview – Proprietary; Microsoft/Pivotstream – Data remains where it is.
Maximum capacity: Tableau – Unlimited; Qlikview – Billions of rows.
In-memory engine: Tableau – Desktop or Server; Qlikview – Desktop or Server. Comment: Tableau reads SSAS and PowerPivot cubes, but not very well. Tableau and QlikView want to suck the data into their own data models; Microsoft keeps the data where it is, where it is easily accessible by Microsoft and other vendors.
Modelling, Analytics: Tableau – Below Average; Qlikview – Below Average; Microsoft/Pivotstream – Data mining and other capabilities. Comment: Microsoft is the winner here for providing a range of modelling and analytics tools, such as the Tabular model and SSAS. Again, the organisation has experience in these tools, so you are leveraging in-house skill sets.
Data Mining: Tableau – Limited; Qlikview – Limited; Microsoft/Pivotstream – Yes. Comment: By ‘data mining’, I mean true data mining, e.g. building neural nets with thought put into avoiding jitter, bootstrapping and so on. This is not ‘what if’ scenarios but data science.
Multidimensional: Tableau – Very Good; Qlikview – Limited; Microsoft/Pivotstream – Excellent. Comment: Microsoft is the winner for multidimensional modelling.
xVelocity Support: Tableau – Good; Qlikview – None; Microsoft/Pivotstream – Excellent. Comment: Microsoft is a pioneer in xVelocity support.
PowerPivot Support: Tableau – Good; Qlikview – None; Microsoft/Pivotstream – Excellent. Comment: Microsoft is a pioneer in PowerPivot support.
API: Tableau – Excellent; Qlikview – Yes, documentation over at the QlikView Community site; Microsoft/Pivotstream – Excellent. Comment: Microsoft open their software up and have APIs available.

The detail is in the tables above, but here is the take-away summary of my thoughts:


From conversations with customers, they often want the simplest solution possible. Office 365 has previously been pooh-poohed by organisations looking at BI because it introduced a lot of gubbins that people did not want or need, e.g. Lync, Exchange and so on. This created concern amongst the technical people, and you can almost read their minds: if the bosses just want a reporting solution, why are they buying something with Exchange? Am I going to be out of a job? This meant that the technical folks were against it, and this reduced the likelihood of adoption or even getting Office 365 solutions in the door. Plus, as a guiding principle, people do not buy stuff that they do not need. Fact. With the advent of PivotStream – who simply offer Microsoft BI in the cloud – the antagonism over some of the Office 365 story went away.


What will happen to PivotStream now that Power BI in the cloud is in the offing? I think that they will be fine, because they offer one thing which is often key: control over your own cloud deployment. If your PivotStream SharePoint site goes down, then you can wade in yourself and fix it. That might be good for some but not so good for others. For me, that’s the USP.

What about Tableau? I’ve been a Tableau fan for years. It’s a little-known fact that I spoke at the Tableau European Conference in Amsterdam in 2011, and even had a Heineken with Stephen Few and some of the Tableau team and dataviz gurus (if you haven’t already, read Stephen’s blog. Now.). Where I think Tableau succeed is that they are superb at what they do: data visualisation. For me, they are the hallmark and the touchstone of where dataviz wants to be, and they are constantly breaking boundaries. However, what they succeed at is something that many organisations aren’t quite ready for: breath-taking visualisations. For some, getting data in and out of Excel is a nightmare, and they are simply not ready for it. For these people, Office 365 or PivotStream are perfect.

What about QlikView? They are beloved of financial departments but I am not sure that they talk so well to IT departments. For me, the fact that there is scripting involved is an issue. If it is self-service Business Intelligence, it should be as easy as possible. In my opinion, they are a bit Marmite as we say in the UK; you either love them or you don’t. There is no half-way house. From my perspective, I hear both sides but I don’t really see QlikView fanboys the way that we see Tableau fanboys.
To summarise, sorting out user requirements is key. There is no wrong choice, if it is the right choice for your organisation. However, any technology is a bad choice if it isn’t the right choice for your organisation. Look past the flash, and see what is really right for you.
 /*** Late Breaking News ***/
My apologies: I put Tableau API as None but that isn’t correct.
I’ve fixed the table, and here is a pointer to the Tableau documentation on the JavaScript API. Thanks to Andy Cotgreave for the spot!
By way of apology, here is the Tableau API in action, done with the panache and fun you’d expect from a Tableau video.

/*** Even later breaking news ***/

Sorry, QlikView team! My humblest apologies. Apparently QlikView *DO* offer an API, and you can find information on it here. Quote from the site (I’m not repeating a sales pitch or commentary here, this is just straight from the community site):
The QlikView Software Development Kit is the home of the QlikView Software Integration toolbox. It includes sample code and the Application Programming Interfaces  (APIs).

Note: More content will be added in an ongoing basis.
In my defence, the comments at the foot of the QlikView community page are riddled with people who can’t find this or that, or don’t know the status of stuff, and then get supplied with links from other people. I’m guessing that I am not the only one who doesn’t find the information very easy to find on the community site, then.
So here is the link http://community.qlikview.com/docs/DOC-2639 and I have updated the table.

Analysis Services Demystified video for the ‘need to learn it quickly’ Business Intelligence professionals

My Analysis Services Demystified video for the ‘need to learn it quickly!’ Business Intelligence professional is 60% off – a bargain from Packt Publishing.


The video itself is for people who are suddenly landed with an Analysis Services project and need to get an overview of the technology very quickly.

Once they’ve had a look at this video, they could then move forward to one of my favourite Analysis Services books by Chris Webb, and the link is here. I pre-ordered this book before it came out, and it’s a great book.

I am very proud because it took a long time and many, many hours to produce. It’s not like writing a book, where you can hide away for a bit. The recordings demand total silence and even if there is nobody around, the world is quite a noisy place! Birdsong, which normally I love, suddenly plagued my recordings so it was hard to find really ‘quiet time’. My computer fan kicking off….. My neighbour’s television…. And the list goes on and on 🙂

I’d like to thank the team at Packt for their patience: Hyacintha, Shradda and the others. They know who they are!

Notes to Self: useful links for Power View and Multidimensional models

Here is a collation of some useful Power View and Multidimensional Models literature from TechNet and Office.com.
Quote from the above article: This does not support connecting to and creating reports for multidimensional models. Power View for multidimensional models supports browser based Power View clients only.
In light of this information, it makes sense that you will need to create RSDS files in SharePoint, and here is the link on how to do this: Create a Shared Report Data Source Connection to a Multidimensional Model
Once you have set up your connection, you can create your Power View report: Create a Power View Report with a Multidimensional Model Data Source
Client
To get an overview of how Power View in SharePoint works with Analysis Services multidimensional models, take a look here: Understanding Multidimensional Model Objects in Power View

Explore the Adventure Works Multidimensional Model by using Power View – this article walks users through creating a Power View report and exploring the sample Adventure Works multidimensional model.

Want to learn Analysis Services and not sure where to start? Here’s a helping hand!

Recently, someone asked me for a list of good books, sites and other resources for learning SSAS. Here are my thoughts!
  • You’re an absolute beginner. What do you do?
  • What in-person events can you attend?
  • So you’ve done all these, and want more Analysis Services. What next?  
  • My favourite SSAS sites! 

You’re an absolute beginner. What do you do?

Start with the Technet Tutorial. Follow it step-by-step. When you’re done, then do it again – but this time, don’t look at the site to do it! This will help you to see what areas didn’t ‘stick’ the first time.

Take a look at the Microsoft Free e-book collection. Yes, free! Go and take a look! See if anything there will help you.

Try the TechEd webinars. Here is a range of the top ones, from Dandy Weyn  

I’d recommend the Analysis Services team blog for a regular read. 

I’d also spend time on the Professional Association for SQL Server (SQLPASS) website. If you’re not a member, it’s free to join, and you can access a wealth of material, from beginner to advanced, delivered by SQL Server experts around the globe – for FREE! How awesome is that?
For example, take a look at their Business Intelligence Virtual Chapter for webinars, downloads and more. Well worth a look!
If you can, I’d think about purchasing the DVDs from the SQLPass Summit. I have had so much value from these DVDs – in fact, I still listen to 2010 Summit sessions, as well as 2011 – making them a great investment.

What in-person events can you attend?

Attend a local SQL Server User Group. These are usually organised by SQLPass volunteers, and you can probably find one near you!   

If you are in Europe, attend SQLBits. No excuses. It has to be done! The UK has a great SQL Server community and this is the longest-standing, largest free European SQL Server community conference. I love this conference – great people, great learning, great community spirit!

If you are in the US – or can travel to the US – go to SQLPass Summit. This is the pinnacle of SQL Server community events in the US, and I regard it as a huge honour to be picked to speak. Like SQLBits,  I love this conference – great people, great learning, great community spirit! Once you attend for your first time, you’ll go home and book up for your second visit next year. Seriously. It is *that* good. I’m travelling 5000 miles to attend – there’s dedication for you!

If you can’t afford to attend SQLPass Summit, but still want free (or nearly free!) in-person training, take a look at the SQLSaturday events that occur across the globe.  Most events are free, but some ask for a contribution towards lunch. In my opinion, this is a great investment of your time.

If you can make Orlando in December, think about attending SQL Live 360. This looks set to be an awesome conference. There are different streams, including SharePoint and Visual Studio. You can register here! 

So you’ve done all these, and want more Analysis Services. What next? 

Expert Cube Development with Microsoft SQL Server 2008 Analysis Services


This book is by Chris Webb, Marco Russo and Alberto Ferrari. It’s very well written, and it is a great reference and guide from very experienced authors, who will help you to take your SSAS to a deeper level. It is full of practical advice, and will help you to deliver your SSAS cube more professionally.

This book is very detailed, and I’d put it at more the ‘expert’ level rather than the beginner. It’s a great reference for nuggets of information.
My favourite SSAS sites:

Chris Webb‘s blog

Marco Russo and Alberto Ferrari blog

 I hope that helps!

Jen

 

 

 

MSDN Analysis Services 2012 Tutorial and AdventureWorks Codeplex files Re-Pointer

I believe that the MSDN Analysis Services tutorial and AdventureWorks files on Codeplex aren’t quite matching up.
For the tutorial ‘Using a Modified Version of the Analysis Services Tutorial Project’ (http://msdn.microsoft.com/en-us/library/ms166582.aspx), the reader is instructed to download the file ‘SSAS Multidimensional Model Projects SQL Server 2012.zip’

The reader is directed to ‘Browse to the Lesson 4 Start folder’, but this isn’t contained in ‘SSAS Multidimensional Model Projects SQL Server 2012.zip’.

Instead, you find the folder ‘Lesson 4 Start’ in ‘Analysis Services Tutorial SQL Server 2012.zip’.

I hope that helps someone to get five minutes of their life back 🙂

-Jen

Some tips on doing the Analysis Services 2012 Tutorials

I’ve had some questions from people using the AdventureWorks tutorials on Analysis Services 2012, so I thought I would answer them here.
When deploying the Analysis Services project for the first time, you might get the following error message:
‘A connection cannot be made. Ensure that the server is running. To verify or update the name of the target server, right-click on the project in Solution Explorer, select Project Properties, click on the Deployment tab, and then enter the name of the server.’
The first thing to check is that the login has access to the data source. Also, check that you’ve put in the full server\instance name if required: if you’re not sure how to check, open up SSMS and see if you can connect that way, and the full path will be in the login box.
If you’ve checked this, then it is worthwhile just checking that your Analysis Services service is actually started. This is simple: Go to Control Panel -> Administrative Tools -> Services and look for your Analysis Services service. If you don’t know which service to look for, try ‘SQL Server Analysis Services (MSSQLSERVER)’ and see if it is started.
When you try to deploy the cube, you might get an error message about an impersonation issue. If this is the case, then you will need to look at the impersonation information that you’ve used in the data source. To do this, double-click on the data source and choose ‘Impersonation Information’. As per the tutorial, the setting says to use the option ‘Use the service account’.
I hope that this answers some questions I’ve received – good luck, and please feel free to get in touch if anything isn’t clear.
-Jen