Five reasons to be excited about Microsoft Data Insights Summit!

ms-datainsights_linkedin-1200x627-1

I’m delighted to be speaking at Microsoft Data Summit! I’m pumped about my session, which focuses on Power BI for the CEO. I’m also super happy to be attending the Microsoft Data Summit for five top reasons (and others, but five is a nice number!). I’m excited about all of the Excel, Power BI, DAX and Data Science goodies. Here are some sample session titles:

Live Data Streaming in Power BI

Data Science for Analysts

What’s new in Excel

Embed R in Power BI

Spreadsheet Management and Compliance (It is a topic that keeps me up at night!)

Book an in-person appointment with a Microsoft expert with the online Schedule Builder. Bring your hard – or easy – questions! In itself, this is a real chance to speak to Microsoft directly and get expert, indepth  help from the team who make the software that you love.

Steven Levitt of Freakonomics is speaking and I’m delighted to hear him again. I’ve heard him present recently and he was very funny whilst also being insightful. I think you’ll enjoy his session. You’ll know him from Freakonomics.

freakonomics

I’m excited that James Phillips is delivering a keynote! I have had the pleasure of meeting him a few times and I am really excited about where James and the Power BI team have taken Power BI. I’m sure that there will be good things as they steam ahead, so James’ keynote is unmissable!

Alberto Cairo is presenting a keynote! Someone who always makes me sit up a bit straighter when they tweet is Alberto Cairo, and I’m delighted he’s attending. I hope I can get to meet him in person. Whether Alberto is tweeting about data visualisation, design or the world in general, it’s always insightful. I have his latest book and I hope I can ask him to sign it.

003b6f66

Tons of other great speakers! Now someone I haven’t seen for ages – too long in fact – is Rob Collie. Rob is President of PowerPivotPro and you simply have to hear him speak on the topic. He’s direct in explaining how things work, and you will learn from him. I’m glad to see Marco Russo is speaking and I love his sessions. In fact, at TechEd North America, I only got to see one session because I was so busy with presenting, booth duty etc… but I managed to get to see a session and I made sure it was Marco Russo and Alberto Ferrari’s session.  Chris Webb is also presenting and his sessions are always amazing. I have to credit Chris in part for where I am today, because his blog kept me sane and his generosity during sessions meant that I never felt stupid asking him questions. I’m learning too – always.

Ok, that’s five things but there are plenty more. Why not see for yourself?

Join me at the conference, June 12–13, 2017 in Seattle, WA — and be sure to sign up for your 1:1 session with a Microsoft expert.

PASS Business Analytics Day, Jan 11, Chicago

pass-ba-day

PASS’ first Business Analytics Day, which will be held in Chicago on January 11, 2017. You can choose one of two full-day, in-depth sessions for $595: In-Database Analytics with R and SQL Server 2016 and Mastering Power BI Solutions.

These are unique learning opportunities to get more advanced in R or data visualization with Power BI. And as with other PASS events, the goal is to allow you to walk away with real-world analytics knowledge that you can use immediately!

PASS Business Analytics Day

You have two great choices: In-Database Analytics with R and SQL Server 2016 and Mastering Power BI Solutions.

In-Database Analytics with R and SQL Server 2016

With Microsoft SQL Server 2016, data scientists can run in-database analytics using R. This is a “best of both worlds” scenario: delegate database management to SQL Server whilst you create analytics and visualisations in R and Power BI. In this session, we will cover the overall architecture of SQL R Services and go over some best practices. We will look at best practices in analytics and visualisations with a focus on R, and then we delve more in-depth into some practical common use-cases.

Speakers:
David Smith, R Community Lead at Revolution Analytics, a Microsoft Company
Seth Mottaghinejad, Data Scientist, Microsoft

Mastering Power BI Solutions

In this Power BI hands-on Workshop, you will master the “power” of Power BI. Learn to use self-service and enterprise-scale Power BI capabilities; gain valuable skills to integrate, wrangle, shape and visualize data for analysis. Beginning and intermediate level users will learn to address data and reporting challenges with advanced design techniques.

Speaker:
Paul Turley, Mentor with SolidQ, BI Architect, and Microsoft Data Platform MVP

Date: January 11, 2017

Location: Microsoft Technology Center, #200 – 200 East Randolph Drive, Chicago, IL.

We hope you’ll join us!

Learning pathway for SQL Server 2016 and R Part 2: Divided by a Common Language

http://whatculture.com/film/the-office-uk-vs-the-office-us.php

Britain has “really everything in common with America nowadays, except, of course, language.” Said Oscar Wilde, in the Centerville Ghost (1887) whilst George Bernard Shaw is quoted as saying that the “The United States and Great Britain are two countries separated by a common language.”

There are similarities and differences between SQL and R, which might be confusing. However, I think it can be illuminating to understand these similarities and differences since it tells you something about each language. I got this idea from one of the attendees at PASS Summit 2015 and my kudos and thanks go to her. I’m sorry I didn’t get  her name, but if you see this you will know who you are, so please feel free to leave a comment so that I can give you a proper shout out.

If you are looking for an intro to R from the Excel perspective, see this brilliant blog here. Here’s a list onto get us started. If you can think of any more, please give me a shout and I will update it. It’s just an overview and it’s to help the novice get started on a path of self-guided research into both of these fascinating topics.

R SQL / BI background
A Factor has special properties; it can represent a categorical variable, which are used in linear regression, ANOVA etc. It can also be used for grouping. A Dimension is a way of describing categorical variables. We see this in the Microsoft Business Intelligence stack.
in R, dim means that we can give a chunk of data dimensions, or, in other words, give it a size. You could use dim to turn a list into a matrix, for example Following Kimball methodology, we tend to prefix tables as dim if they are dimension tables. Here, we mean ‘dimensions’ in the Kimball sense, where a ‘dimension’ is a way of describing data. If you take a report title, such as Sales by geography, then ‘geography’ would be your dimension.
R memory management can be confusing. Read Matthew Keller’s excellent post here. If you use R to look at large data sets, you’ll need to know
– how much memory an object is taking;
– 32-bit R vs 64-bit R;
– packages designed to store objects on disk, not RAM;
– gc() for memory garbage collection
– reduce memory fragmentation.
SQL Server 2016 CTP3 brings native In-database support for the open source R language. You can call both R, RevoScaleR functions and scripts directly from within a SQL query. This circumvents the R memory issue because SQL Server benefits the user, by introducing multi-threaded and multi-core in-DB computations
Data frame is a way of storing data in tables. It is a tightly coupled collections of variables arranged in rows and columns. It is a fundamental data structure in R. In SQL SSRS, we would call this a data set. In T-SQL, it’s just a table. The data is formatted into rows and columns, with mixed data types.
All columns in a matrix must have the same mode(numeric, character, and so on) and the same length. A matrix in SSRS is a way of displaying, grouping and summarizing data. It acts like a pivot table in Excel.
 <tablename>$<columnname> is one way you can call a table with specific reference to a column name.  <tablename>.<columname> is how we do it in SQL, or you could just call the column name on its own.
To print something, type in the variable name at the command prompt. Note, you can only print items one at a time, so use cat to combine multiple items to print out. Alternatively, use the print function. One magic feature of R is that it knows magically how to format any R value for printing e.g.

print(matrix(c(1,2,3,5),2,2))

PRINT returns a user-defined message to the client. See the BOL entry here. https://msdn.microsoft.com/en-us/library/ms176047.aspx

CONCAT returns a string that is the result of concatenating two or more string values. https://msdn.microsoft.com/en-GB/library/hh231515.aspx

Variables allow you to store data temporarily during the execution of code. If you define it at the command prompt, the variable is contained in your workspace. It is held in memory, but it can be saved to disk. In R, variables are dynamically typed so you can chop and change the type as you see fit. Variables are declared in the body of a batch or procedure with the DECLARE statement and are assigned values by using either a SET or SELECT statement. Variables are not dynamically typed, unlike R. For in-depth look at variables, see Itzik Ben-Gan’s article here.
ls allows you to list the variables and functions in your workspace. you can use ls.str to list out some additional information about each variable. SQL Server has tables, not arrays. It works differently, and you can find a great explanation over at Erland Sommarskog’s blog. For SQL Server 2016 specific information, please visit the Microsoft site.
A Vector is a key data structure in R, which has tons of flexibility and extras. Vectors can’t have a mix of data types, and they are created using the c(…) operator. If it is a vector of vectors, R makes them into a single vector. Batch-mode execution is sometimes known as vector-based or vectorized execution. It is a query processing method in which queries process multiple rows together. A popular item in SQL Server 2016 is Columnstore Indexes, which uses batch-mode execution. To dig into more detail, I’d recommend Niko Neugebauer’s excellent blog series here, or the Microsoft summary.

There will be plenty of other examples, but I hope that helps for now.

Learning pathway for SQL Server 2016 and R Part 1: Installation and configuration

Jargogled is an archaic word for getting confused or mixed up. I’m aware that there are lots of SQL Server folks out there, who are desperate to learn R, but might be jaRgogled by R. Now that R is in SQL Server, it seems like the perfect opportunity to start a new blog series to help people to splice the two great technologies together. So here you are!

First up, what do you need to know about SQL Server installation with R? The installation sequence is well documented here. However, if you want to make sure that the R piece is installed, then you will need to make sure that you do one thing: tick the Advanced Analytics Extension box.

SQL Server 2016 R Feature Selection

You need to select ‘Advanced Analytics Extensions’, which you will find under ‘Instance Features’. Once you’ve done that, you are good to proceed with the rest of your installation.

Once SQL Server is installed, let’s get some data into a SQL Server database. Firstly, you’ll need to create a test database, if you don’t have one already. You can find some information on database creation in SQL Server 2016 over at this Microsoft blog. You can import some data very quickly and there are different ways of importing data. If you need more information on this, please read this Microsoft blog.

If you fancy taking some sample data, try out the UCI Machine Learning data repository. You can download some data from there, following the instructions on that site, and then pop it into SQL Server.

If you have Office x64 installed on your machine, you might run into an issue:

Microsoft.ACE.OLEDB.15.0′ provider is not registered on the local machine

I ran into this issue when I tried to import some data into SQL Server using the quick and dirty ‘import data’ menu item in SSMS. After some fishing around, I got rid of it by doing the following:

There are other ways of importing data, of course, but I wanted to play with R and SQL Server, and not spend a whole chunk of time importing data.

In our next tutorial, we will look at some of the vocabulary for R and SQL Server which can look confusing for people from both disciplines. Once you learn the terminology, then you’ll see that you already know a lot of the concepts in R from your SQL Server and Business Intelligence expertise. That expertise will help you to springboard to R expertise, which is great for your career.

Convincing your HiPPO at EARL Conference in London!

ID-100257254I’m delighted to be speaking at the EARL Conference to be held in London on the 14th – 16th September. What’s my topic?

Convince your HiPPO with Real world Data Storytelling in R and Machine Learning

In a world where the HiPPO’s (Highest Paid Person’s Opinion) is final, how can we use technology to drive the organisation towards data-driven decision making as part of their organizational DNA? R provides a range of functionality in machine learning, but we need to expose its richness in a world where it is made accessible to decision makers. Using Data Storytelling with R, we can imprint data in the culture of the organization by making it easily accessible to everyone, including decision makers. Together, the insights and process of machine learning are combined with data visualisation to help organisations derive value and insights from big and little data.

In this session, we will use R and cloud-based technologies in order to explore and analyse data using machine learning and statistical packages functionality, and we will look at our results. Then, we will look at how we disseminate the results to the HiPPO audience, using best practices in data visualisation and R, informed by gurus such as Stephen Few and Edward Tufte.

If you want to know how to demystify R and the insights you’ve found during your analyses, join this session in order to learn about machine learning as a technology and a discipline, and how to make the most of your insights using best practice data visualisation. Using real-life scenarios, this session will help you to communicate the insights of your data to your HiPPO, thereby helping to move your organisation towards a data-driven culture.

R now ranks as the sixth most popular programming language  – its move from last year’s 9th place reflecting the growing importance of data analytics to an increasing number of industries and sectors.  EARL offers a unique opportunity to discover how R is being used commercially to provide a wealth of business solutions.

EARL London will feature :

  • Presentations from over 40 R gurus and Business leaders
  • Speakers represent a broad range of industry sectors  – including:  insurance, manufacturing, customer analytics, life sciences, finance etc
  • Sessions include:  Data Visualisation, Business Challenges, Big Data Technologies, Modelling, Workflow and Commercial Applications
  • Keynote speakers:  Alex Bellow, Dirk Eddelbuettel, Joe Cheng and Hannah Fry
  • Speakers representing Companies such as Shell, KPMG, AstraZeneca, Lloyd’s of London, UBS and Hewlett Packard
  • Pre Conference workshops on:  Interactive Reporting with R Markdown and Shiny (now SOLD OUT), An Introduction to Rcpp,  Integrating R and Python, and Current Best Practices in Formal Package Development
  • Sponsors and Exhibitors including Revolution Analytics, RStudio, Hewlett Packard, Teradata, Oracle, Harnham UK, Plotly, Tessella, Information Builders and O’Reilly
  • Sensational central London venue
  • 3 Conference Networking events including the Main Conference Reception in the amazing walkways of London’s iconic Tower Bridge

If you have yet to purchase your ticket please don’t delay to avoid disappointment. Tickets can be purchased online via credit card or can be invoiced if required. Group discounts are available to companies sending >5 attendees – please email earl-team@mango-solutions.com for more information.

Jen’s Diary: What does Microsoft’s recent acquisitions of Revolution Analytics mean for PASS?

Caveat: This blog does not represent the views of PASS or the PASS Board. These opinions are solely mine.

The world of data and analytics keeps heating up. Tableau, for example, keeps growing and winning. In fact, Tableau continues to grow total and licence revenue 75% year over year, with its total revenue grew to $142.9 million in the FY4 of 2014.There’s a huge shift in the market towards analytics, and it shows in the numbers. Lets take a look at some of the interesting things Microsoft have done recently, and see how it relates to PASS:

  • Acquired Revolution Analytics, an R-language-focused advanced analytics firm, will bring customers tools for prediction and big-data analytics.
  • Acquired Datazen, a provider of data visualization and key performance indicator data on Windows, iOS and Android devices. This is great from the cross-platform perspective, and we’ll look at this in a later blog. For now, let’s discuss Revolution and Microsoft.

Why it was good for Microsoft to acquire Revolution Analytics

The acquisition shows that Microsoft is bolstering its portfolio of advanced analytics tools. R is becoming increasingly common as a skill set, and businesses are more comfortable about using open source technology such as R. It is also accessible software, and a great tool for doing analytics. I’m hoping that this will help organisations to recognise and conduct advanced analytics, and it will improve the analytics capability in HDInsight.

Microsoft has got pockets of advanced analytics capabilities built into Microsoft SQL Server, and in particular, SQL Server Analysis Services, and also in the SQL Server Parallel Data Warehouse (PDW). Microsoft also has the Azure Machine Learning Service (Azure ML) which uses R in MLStudio. However, it does not have an advanced analytics studio, and the approach can come across as piecemeal for those who are new to it. The acquisition of Revolution Analytics will give Microsoft on-premises tools for data scientists, data miners, and analysts, and cloud and big data analytics for the same crowd.

Here’s what I’d like Microsoft to do with R:

  • Please give some love to SSRS by infusing it with R. There is a codeplex download that will help you to produce R visualisations in SSRS. I’d like to see more and easier integration, which doesn’t require a lot of hacking about.
  • Power Query has limited statistical capability at the moment. It could be expanded to include R. I am not keen for Microsoft to develop yet another programming language and R could be a part of the Power Query story.
  • Self-service analytics. We’ve all seen the self-service business intelligence communications. What about helping people to self-serve analytics as well, once they’ve cracked self-service BI? I’d like to see R made easier to use for everyone. I sense that will be a long way off, but it is an opportunity.
  • Please change the R facility in MLStudio. It’s better to use RStudio to create your R script, then upload it.

What issues do I see in the Revolution Analytics acquisition?

Microsoft is a huge organisation. Where will it sit within the organisation? Any acquisition involves a change management process. Change management is always hard. R touches different parts of the technology stack. This could be further impacted by the open source model that R has been developed under. Fortunately Revolution seem to have thought of some of these issues already: how does it scale, for example? This acquisition will need to be carefully envisioned, communicated and implemented, and I really do wish them every success with it.

What does this mean for PASS?

I hold the PASS Business Analytics Portfolio, and our PASS Business Analytics Conference is being held next week. Please use code BFFJS to get the conference for a discount rate, if you are interested in going.

I think the PASS strategy of becoming more data platform focused is the right one. PASS exist to provide technical community education to data professionals, and I think PASS are well placed to move on the analytics journey that we see in the industry. I already held a series on R for the Data Science Virtual Chapter, and I’m confident you’ll see more material on this and related topics. There are sessions on R at the PASS BA Conference as well. The addition of Revolution Analytics and Datazen is great for Microsoft, and it means that the need for learning in these areas is more urgent, not less. That does not mean that i think that everyone should learn analytics. I don’t. However, I do think PASS can help those who are part of the journey, if they want (or need) to be.

I’m personally glad PASS are doing the PASS Business Analytics Conference because I believe it is a step in the right direction, in the analytics journey we see for the people who want to learn analytics, the businesses who want to use it, and the burgeoning technology. I agree with Brent Ozar ( b / t ) in that I don’t think that the role of the DBA is going away. I do think that, for small / medium businesses, some folks might find that they become the ‘data’ person rather than the DBA being a skill on its own. I envisage that PASS will continue to serve the DBA-specialist-guru as well as the BI-to-analytics people, as well as those who become the ‘one-stop-shop’ for everything data in their small organisation (DBA / BA / Analytics), as well as the DBA-and-Cloud person. It’s about giving people opportunity to learn what they want and need to learn, in order to keep up with the rate of change we see in the industry.

Please feel free to comment below.

Your friend,

Jen Stirrup

x

Visualising Data with R: HTMLWidgets for R

I’m interested in analyzing and visualizing data with R, as you know. Thanks to Jen Underwood, she pointed me at this cool series of widgets, called HTMLWidgets, that allows you to do some interesting visualisations with R. I must admit I haven’t touched JavaScript for years, ever since I helped to build websites for well known entertainment sites, but I’m glad I can pick it up again. What does it let you do?

  • Use JavaScript visualization libraries at the R console, just like plots
  • Embed widgets in R Markdown documents and Shiny web applications
  • Develop new widgets using a framework that seamlessly bridges R and JavaScript

Why don’t you head on over to the site and have a look? I’ll be playing with it, and finally finishing my R blog series, as soon as I can. I’ve been busy helping to organize the PASS BA Conference, which is taking place in April 2015 in Santa Clara. Therefore, I hope you’ll excuse the radio silence. It’s on my to-do list, which never goes down, ever!