Fun DataDive with DataKind UK

This weekend, I volunteered with DataKind UK on their Summer DataDive, which took place on the weekend of 28th and 29th July 2018 in the Pivotal London offices in Shoreditch. I had a fantastic, memorable weekend, mixing with around 200 other data scientists.

I’d like to thank the DataKind team for being so inspirational, giving, and kind with their time and skills. I’d like to emphasise my absolute admiration for the Data Ambassadors and the work that they do to lift everyone up.

Why did I do this? DataKind appealed to me since it meant that I could sharpen my data science skills  by pitching in with experts. New learners to Data Science are welcome, and there were also newbies who had some experience of data and wanted to know more. There was room for everyone to contribute, so if you are a newbie, it would be a great way to join in the conversations and learn from experts who love what they can achieve with data. Plus, it’s a great opportunity to mix with real data scientists. This isn’t Poundland data science, and this is not pseudo Data Science. This is the real thing; and I spent two days immersed in real problems using Data Science as a solution. I learned a lot, and I contributed as well. There is a saying that you are the average of your friends, and I needed to get close to more Data Scientists so that I could build on my earlier experience on AI and bring it up-to-date.

I wanted to help a charity, by dedicating my time and skills, to support women and girls who need it. I understand that there are vulnerable men too; but this isn’t about whataboutism. Women and girls are disproportionally affected by issues such as domestic violence and being the victims of sexual crimes, and I wanted to do something practical to help.

Lancashire Women's Centres LogoFor my specific contribution, I was working with a team of 25 other data scientists, we worked on finding insights in data belonging to Lancashire Women’s Centre. The vision of Lancashire Women’s Centre is that all women and girls in Lancashire are valued and treated as equals. Their aim is to empower women and girls to be able to transform their lives by bringing them together to find their voice, share experiences and understanding, develop their knowledge and skills, challenge stereotypes and misconceptions about them so that they can have choices in becoming the individuals they want to be. I share this conviction deeply and I wanted to help.

You may well be thinking that the charity help a small number of women, but that’s not the case at all. They have a real impact in their community. The Lancashire Women’s Centre has helped over 3000 women in the last year. This includes 5807 hours of therapeutic support were accessed by 1154 women and 78 men.  Following therapy: 25% were no longer taking medication, 8% felt the support had helped them find and keep a job, 12% continued to access LWC services to support their recovery.

So what did I do? I can’t share specific details because the data is confidential, and it obviously impacts some of the UK’s most vulnerable women and girls. I will say that the tools used were CoCalc, R, Python, Excel and Tableau and Power BI to work with the data.

DataKind™ brings high-impact organizations dedicated to solving the world’s biggest challenges together with leading data scientists to improve the quality of, access to and understanding of data in the social sector. This leads to better decision-making and greater social impact. Launched in 2011, DataKind leads a community of passionate data scientists, visionary partners and mission-driven organizations with the talent, commitment and energy to use data science in the service of humanity. DataKind is headquartered in New York City and has Chapters in Bangalore, Dublin, San Francisco, Singapore, the UK and Washington DC. More information on DataKind, our programs and our partners can be found on their website: www.datakind.org

Lancashire Women’s Centre

DataKind JenStirrup and Team

I’m the one on the right, wearing orange!

I’m looking forward to the next one!

Issues and Resolutions in starting R and R Server on SQL Server 2017

I am helping some people learn Data Science and we are having a ton of fun! There are lots of things to remember. So I am noting things here, in case I forget!

We noted the following error message, when we saw that R was not running on our SQL Server 2017 install:

‘sp_execute_external_script’ is disabled on this instance of SQL Server. Use sp_configure ‘external scripts enabled’ to enable it.

Here is the longer version:

Msg 39023, Level 16, State 1, Procedure sp_execute_external_script, Line 1 [Batch Start Line 3]

‘sp_execute_external_script’ is disabled on this instance of SQL Server. Use sp_configure ‘external scripts enabled’ to enable it.

Msg 11536, Level 16, State 1, Line 4

EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.

Grr! What’s happened here? We had installed R as part of the SQL installation, and we had run the command to enable it, too. In case you are wondering, here is the command:

EXEC sp_configure ‘external scripts enabled’, 1
RECONFIGURE WITH OVERRIDE

So what happens next? Initial things to check:

Is R Server installed properly along with SQL Server? Here are some guidelines to help you.

Is the Launchpad service running? One of my colleagues and friends Tomaž Kaštrun  wrote a nice article on SQL Server Central. If not, this could be due to a lack of permissions in being able to start the service.

Did you restart the MSSQL Service on the machine? This will also restart the Launchpad service as well. If you didn’t restart the service, you will need to do that so it can pick up the results.

Once R is running properly, you can check it by using the following command, borrowed from the official installation guide over at Microsoft:

EXEC sp_execute_external_script @language =N’R’,
@script=N’
OutputDataSet <- InputDataSet;
‘,
@input_data_1 =N’SELECT 1 AS RIsWorkingFine’
WITH RESULT SETS (([RIsWorkingFine] int not null));
GO

If that returns a 1, then you are all set! To prove it works properly, you can retrieve the world famous Iris dataset using the following command, borrowed from the official documentation on sp_execute_external_script:

DROP PROC IF EXISTS get_iris_dataset;

go

CREATE PROC get_iris_dataset

AS BEGIN

EXEC sp_execute_external_script @language = N‘R’ , @script = N‘iris_data <- iris;’ , @input_data_1 = N , @output_data_1_name = N‘iris_data’ WITH RESULT SETS ((“Sepal.Length” float not null, “Sepal.Width” float not null, “Petal.Length” float not null, “Petal.Width” float not null, “Species” varchar(100)));

END;

GO

Once you’ve created the command, execute the following SQL command and you will see the iris dataset:

exec get_iris_dataset

You’re all set! Enjoy R!

Five reasons to be excited about Microsoft Data Insights Summit!

ms-datainsights_linkedin-1200x627-1

I’m delighted to be speaking at Microsoft Data Summit! I’m pumped about my session, which focuses on Power BI for the CEO. I’m also super happy to be attending the Microsoft Data Summit for five top reasons (and others, but five is a nice number!). I’m excited about all of the Excel, Power BI, DAX and Data Science goodies. Here are some sample session titles:

Live Data Streaming in Power BI

Data Science for Analysts

What’s new in Excel

Embed R in Power BI

Spreadsheet Management and Compliance (It is a topic that keeps me up at night!)

Book an in-person appointment with a Microsoft expert with the online Schedule Builder. Bring your hard – or easy – questions! In itself, this is a real chance to speak to Microsoft directly and get expert, indepth  help from the team who make the software that you love.

Steven Levitt of Freakonomics is speaking and I’m delighted to hear him again. I’ve heard him present recently and he was very funny whilst also being insightful. I think you’ll enjoy his session. You’ll know him from Freakonomics.

freakonomics

I’m excited that James Phillips is delivering a keynote! I have had the pleasure of meeting him a few times and I am really excited about where James and the Power BI team have taken Power BI. I’m sure that there will be good things as they steam ahead, so James’ keynote is unmissable!

Alberto Cairo is presenting a keynote! Someone who always makes me sit up a bit straighter when they tweet is Alberto Cairo, and I’m delighted he’s attending. I hope I can get to meet him in person. Whether Alberto is tweeting about data visualisation, design or the world in general, it’s always insightful. I have his latest book and I hope I can ask him to sign it.

003b6f66

Tons of other great speakers! Now someone I haven’t seen for ages – too long in fact – is Rob Collie. Rob is President of PowerPivotPro and you simply have to hear him speak on the topic. He’s direct in explaining how things work, and you will learn from him. I’m glad to see Marco Russo is speaking and I love his sessions. In fact, at TechEd North America, I only got to see one session because I was so busy with presenting, booth duty etc… but I managed to get to see a session and I made sure it was Marco Russo and Alberto Ferrari’s session.  Chris Webb is also presenting and his sessions are always amazing. I have to credit Chris in part for where I am today, because his blog kept me sane and his generosity during sessions meant that I never felt stupid asking him questions. I’m learning too – always.

Ok, that’s five things but there are plenty more. Why not see for yourself?

Join me at the conference, June 12–13, 2017 in Seattle, WA — and be sure to sign up for your 1:1 session with a Microsoft expert.

Learning pathway for SQL Server 2016 and R Part 1: Installation and configuration

Jargogled is an archaic word for getting confused or mixed up. I’m aware that there are lots of SQL Server folks out there, who are desperate to learn R, but might be jaRgogled by R. Now that R is in SQL Server, it seems like the perfect opportunity to start a new blog series to help people to splice the two great technologies together. So here you are!

First up, what do you need to know about SQL Server installation with R? The installation sequence is well documented here. However, if you want to make sure that the R piece is installed, then you will need to make sure that you do one thing: tick the Advanced Analytics Extension box.

SQL Server 2016 R Feature Selection

You need to select ‘Advanced Analytics Extensions’, which you will find under ‘Instance Features’. Once you’ve done that, you are good to proceed with the rest of your installation.

Once SQL Server is installed, let’s get some data into a SQL Server database. Firstly, you’ll need to create a test database, if you don’t have one already. You can find some information on database creation in SQL Server 2016 over at this Microsoft blog. You can import some data very quickly and there are different ways of importing data. If you need more information on this, please read this Microsoft blog.

If you fancy taking some sample data, try out the UCI Machine Learning data repository. You can download some data from there, following the instructions on that site, and then pop it into SQL Server.

If you have Office x64 installed on your machine, you might run into an issue:

Microsoft.ACE.OLEDB.15.0′ provider is not registered on the local machine

I ran into this issue when I tried to import some data into SQL Server using the quick and dirty ‘import data’ menu item in SSMS. After some fishing around, I got rid of it by doing the following:

There are other ways of importing data, of course, but I wanted to play with R and SQL Server, and not spend a whole chunk of time importing data.

In our next tutorial, we will look at some of the vocabulary for R and SQL Server which can look confusing for people from both disciplines. Once you learn the terminology, then you’ll see that you already know a lot of the concepts in R from your SQL Server and Business Intelligence expertise. That expertise will help you to springboard to R expertise, which is great for your career.

Jen’s Diary: What does Microsoft’s recent acquisitions of Revolution Analytics mean for PASS?

Caveat: This blog does not represent the views of PASS or the PASS Board. These opinions are solely mine.

The world of data and analytics keeps heating up. Tableau, for example, keeps growing and winning. In fact, Tableau continues to grow total and licence revenue 75% year over year, with its total revenue grew to $142.9 million in the FY4 of 2014.There’s a huge shift in the market towards analytics, and it shows in the numbers. Lets take a look at some of the interesting things Microsoft have done recently, and see how it relates to PASS:

  • Acquired Revolution Analytics, an R-language-focused advanced analytics firm, will bring customers tools for prediction and big-data analytics.
  • Acquired Datazen, a provider of data visualization and key performance indicator data on Windows, iOS and Android devices. This is great from the cross-platform perspective, and we’ll look at this in a later blog. For now, let’s discuss Revolution and Microsoft.

Why it was good for Microsoft to acquire Revolution Analytics

The acquisition shows that Microsoft is bolstering its portfolio of advanced analytics tools. R is becoming increasingly common as a skill set, and businesses are more comfortable about using open source technology such as R. It is also accessible software, and a great tool for doing analytics. I’m hoping that this will help organisations to recognise and conduct advanced analytics, and it will improve the analytics capability in HDInsight.

Microsoft has got pockets of advanced analytics capabilities built into Microsoft SQL Server, and in particular, SQL Server Analysis Services, and also in the SQL Server Parallel Data Warehouse (PDW). Microsoft also has the Azure Machine Learning Service (Azure ML) which uses R in MLStudio. However, it does not have an advanced analytics studio, and the approach can come across as piecemeal for those who are new to it. The acquisition of Revolution Analytics will give Microsoft on-premises tools for data scientists, data miners, and analysts, and cloud and big data analytics for the same crowd.

Here’s what I’d like Microsoft to do with R:

  • Please give some love to SSRS by infusing it with R. There is a codeplex download that will help you to produce R visualisations in SSRS. I’d like to see more and easier integration, which doesn’t require a lot of hacking about.
  • Power Query has limited statistical capability at the moment. It could be expanded to include R. I am not keen for Microsoft to develop yet another programming language and R could be a part of the Power Query story.
  • Self-service analytics. We’ve all seen the self-service business intelligence communications. What about helping people to self-serve analytics as well, once they’ve cracked self-service BI? I’d like to see R made easier to use for everyone. I sense that will be a long way off, but it is an opportunity.
  • Please change the R facility in MLStudio. It’s better to use RStudio to create your R script, then upload it.

What issues do I see in the Revolution Analytics acquisition?

Microsoft is a huge organisation. Where will it sit within the organisation? Any acquisition involves a change management process. Change management is always hard. R touches different parts of the technology stack. This could be further impacted by the open source model that R has been developed under. Fortunately Revolution seem to have thought of some of these issues already: how does it scale, for example? This acquisition will need to be carefully envisioned, communicated and implemented, and I really do wish them every success with it.

What does this mean for PASS?

I hold the PASS Business Analytics Portfolio, and our PASS Business Analytics Conference is being held next week. Please use code BFFJS to get the conference for a discount rate, if you are interested in going.

I think the PASS strategy of becoming more data platform focused is the right one. PASS exist to provide technical community education to data professionals, and I think PASS are well placed to move on the analytics journey that we see in the industry. I already held a series on R for the Data Science Virtual Chapter, and I’m confident you’ll see more material on this and related topics. There are sessions on R at the PASS BA Conference as well. The addition of Revolution Analytics and Datazen is great for Microsoft, and it means that the need for learning in these areas is more urgent, not less. That does not mean that i think that everyone should learn analytics. I don’t. However, I do think PASS can help those who are part of the journey, if they want (or need) to be.

I’m personally glad PASS are doing the PASS Business Analytics Conference because I believe it is a step in the right direction, in the analytics journey we see for the people who want to learn analytics, the businesses who want to use it, and the burgeoning technology. I agree with Brent Ozar ( b / t ) in that I don’t think that the role of the DBA is going away. I do think that, for small / medium businesses, some folks might find that they become the ‘data’ person rather than the DBA being a skill on its own. I envisage that PASS will continue to serve the DBA-specialist-guru as well as the BI-to-analytics people, as well as those who become the ‘one-stop-shop’ for everything data in their small organisation (DBA / BA / Analytics), as well as the DBA-and-Cloud person. It’s about giving people opportunity to learn what they want and need to learn, in order to keep up with the rate of change we see in the industry.

Please feel free to comment below.

Your friend,

Jen Stirrup

x

The Data Analysts Toolkit Day 4: Getting to know R a bit better

So what actually is R? Let’s take a look at running some commands, and we can talk about it as we go along.

R is a powerful environment for statistical computing
It is like a calculator and its power is: 
… it lets you save results in variables. Let’s take a look:
> x
> y
> z = 4
> x + y + z
Ok, so we can run little commands and it gives us numbers back. What’s all the fuss about?
We can save data in files. They are saved as .RData files. R lets us import and export data in lots of different ways. If it sounds fun, it is because it is! Let’s take a look.
> someData
> save(someData, file = “D:/DataAnalystsToolkit/someData.Rdata”) 
> rm(someData) 
> load(“D:/DataAnalystsToolkit/someData.Rdata”) 
> print(someData)
Well, that is great if our data is in RData format. But normally we need to import data from different sources.
Import from CSV File
Read.csv()
MyDataFrame

print(MyDataFrame)
Alternatively we can load data in directly from RStudio. 
Go to Tools in RStudio, and select Import Dataset. 
Select the file CountryCodes.csv and select the Import button.
In RStudio, you will now see the data in the data pane.
The console window will show the following:
> #import dataset
> CountryCodes <- directory="" files="" header="F)
>   View(CountryCodes)
Once the data is imported, we can check the data.
dim(CountryCodes)
head(CountryCodes)
tail(CountryCodes) 
So how do we connect R to SQL Server?
The Package RODBC provides R with a connection to ODBC databases:
> library(RODBC) 
> myodbcConnect
myQuery

# or read query from file 
# myQuery <- ataanalyststoolkit="" myquery.sql="" nchars="99999) 
myData myodbcConnect

, myQuery, errors=TRUE) 

odbcCloseAll()
I hope that the rather bizarre colour coding will help you to see how the variables add up together to create a query.
RODBC also works for importing data from Excel files:
library(RODBC) 
filename
myxlsFile <- filename="" odbcconnectexcel="" readonly="FALSE) 
sqlSave(myxlsFile, a, rownames = FALSE) 
b
odbcCloseAll()
Next, we will look in more detail how R and PowerBI complement one another.