Five reasons to be excited about Microsoft Data Insights Summit!

I’m delighted to be speaking at the Microsoft Data Insights Summit! I’m pumped about my session, which focuses on Power BI for the CEO. I’m also super happy to be attending for five top reasons (and others, but five is a nice number!). I’m excited about all of the Excel, Power BI, DAX and Data Science goodies. Here are some sample session titles:

Live Data Streaming in Power BI

Data Science for Analysts

What’s new in Excel

Embed R in Power BI

Spreadsheet Management and Compliance (It is a topic that keeps me up at night!)

Book an in-person appointment with a Microsoft expert using the online Schedule Builder. Bring your hard – or easy – questions! This is a real chance to speak to Microsoft directly and get expert, in-depth help from the team who make the software that you love.

Steven Levitt of Freakonomics is speaking and I’m delighted to hear him again. I heard him present recently and he was very funny whilst also being insightful. I think you’ll enjoy his session.

I’m excited that James Phillips is delivering a keynote! I have had the pleasure of meeting him a few times and I am really excited about where James and the Power BI team have taken Power BI. I’m sure that there will be good things as they steam ahead, so James’ keynote is unmissable!

Alberto Cairo is presenting a keynote! Someone who always makes me sit up a bit straighter when they tweet is Alberto Cairo, and I’m delighted he’s attending. I hope I can get to meet him in person. Whether Alberto is tweeting about data visualisation, design or the world in general, it’s always insightful. I have his latest book and I hope I can ask him to sign it.

Tons of other great speakers! Now someone I haven’t seen for ages – too long in fact – is Rob Collie. Rob is President of PowerPivotPro and you simply have to hear him speak on the topic. He’s direct in explaining how things work, and you will learn from him. I’m glad to see Marco Russo is speaking and I love his sessions. In fact, at TechEd North America, I was so busy with presenting, booth duty and so on that I only managed to see one session, and I made sure it was Marco Russo and Alberto Ferrari’s. Chris Webb is also presenting and his sessions are always amazing. I have to credit Chris in part for where I am today, because his blog kept me sane and his generosity during sessions meant that I never felt stupid asking him questions. I’m learning too – always.

Ok, that’s five things but there are plenty more. Why not see for yourself?

Join me at the conference, June 12–13, 2017 in Seattle, WA, and be sure to sign up for your 1:1 session with a Microsoft expert.

Data Preparation in AzureML – Where and how?

One question that keeps popping up in my customer AzureML projects is ‘How do I conduct data preparation on my data?’ For example, how can we join the data, clean it, and shape it so that it is ready for analytics? Messy data is a problem for every organisation. If you don’t think it is an issue for your organisation, perhaps you haven’t looked hard enough.

To answer the question properly, we need to stand back a little, and see the problem as a part of a larger technology canvas. From the enterprise architecture perspective, it is best to do data preparation as close to the source as possible. The reason is that the cleaned data acts as a good, consistent source for other systems, and you only have to do the work once. You have cleaned data that you can re-use, rather than re-do for every place where you need to use the data.

Let’s say you have a data source, and you want to expose the data in different technologies, such as Power BI, Excel and Tableau. Many organisations have a ‘cottage industry’ style of enterprise architecture, where they have different departments using different technologies. It is difficult to align data and analytics across the business, since the interpretation of the data may be implemented in a manner that is technology-specific rather than business-focused. If you take a ‘cottage industry’ approach, you would have to repeat your data preparation steps across different technologies.

When we come to AzureML, data preparation isn’t forgotten, but AzureML isn’t a strong data preparation tool like Paxata or Datameer, for example. It’s the democratization of data for the masses, yes, and I see the value it brings to businesses. It’s meant for machine learning and data science, so you should expect to use it for those purposes. It’s not a standalone data preparation tool, although it does help you partway.

The data preparation facilities in AzureML can be found here. If you have to clean up the data in AzureML, my futurology ‘dream’ scenario is that Microsoft would offer weighty data preparation as a task, like the other tasks in AzureML. You could click on the task and have roll-your-own data preparation pop up in the browser, provided by Microsoft, or perhaps have Paxata or Datameer pop out as a service, hosted in Azure as part of your Azure portal services. Then you would go back to AzureML, all in the browser. In the meantime, you would be better off following the principle of cleaning the data up close to the source.

Don’t be downhearted if AzureML isn’t giving you the data preparation that you need. Look back to the underlying data, and see what you can do. The answer might be as simple as writing a view in SQL Server. AzureML is for operations and machine learning further downstream. If you are having serious data preparation issues, then perhaps you are not ready for the modelling phase of CRISP-DM so you may want to take some time to think about those issues.
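
To make the ‘view close to the source’ idea a little more concrete, here is a minimal sketch in Python. It is an illustration only: it uses the built-in sqlite3 module as a stand-in for SQL Server, and the table names, column names and sample rows (customers, orders, clean_orders) are all made up for the example. The view joins the two tables, trims and standardises the text, and drops rows where the amount does not start with a digit, so every downstream tool reads the same cleaned data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with messy sample data and a cleaning view.

    SQLite is only a stand-in here; the tables and rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount TEXT);

        -- Deliberately messy rows: stray spaces, mixed case, NULLs, bad numbers.
        INSERT INTO customers VALUES
            (1, '  Alice ', 'uk'),
            (2, 'Bob', 'UK'),
            (3, 'Carol', NULL);
        INSERT INTO orders VALUES
            (10, 1, '100.50'),
            (11, 1, '20'),
            (12, 2, 'n/a'),
            (13, 3, '75');

        -- The preparation lives in one place, next to the data.
        CREATE VIEW clean_orders AS
        SELECT o.order_id,
               TRIM(c.name)                          AS customer_name,
               UPPER(COALESCE(c.country, 'unknown')) AS country,
               CAST(o.amount AS REAL)                AS amount
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE o.amount GLOB '[0-9]*';  -- crude check: keep amounts starting with a digit
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    query = "SELECT order_id, customer_name, country, amount FROM clean_orders ORDER BY order_id"
    for row in conn.execute(query):
        print(row)
```

In SQL Server itself the same idea would be a CREATE VIEW statement written in T-SQL, and the exact cleaning functions may differ. The point is the design choice rather than the syntax: Power BI, Excel, Tableau and AzureML would all read from the one view, instead of each repeating the clean-up.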