Data Preparation in AzureML – Where and how?

One question that keeps popping up in my customer AzureML projects is ‘How do I conduct data preparation on my data?’ For example, how can we join the data, clean it, and shape it so that it is ready for analytics? Messy data is a problem for every organisation. If you don’t think it is an issue for your organisation, perhaps you haven’t looked hard enough.

To answer the question properly, we need to stand back a little, and see the problem as a part of a larger technology canvas. From the enterprise architecture perspective, it is best to do data preparation as close to the source as possible. The reason for this is that the cleaned data then acts as a good, consistent source for other systems, and you only have to do the work once. You have cleaned data that you can re-use, rather than re-do for every place where you need to use the data.

Let’s say you have a data source, and you want to expose the data in different technologies, such as Power BI, Excel and Tableau. Many organisations have a ‘cottage industry’ style of enterprise architecture, where they have different departments using different technologies. It is difficult to align data and analytics across the business, since the interpretation of the data may be implemented in a manner that is technology-specific rather than business-focused. If you take a ‘cottage industry’ approach, you would have to repeat your data preparation steps across different technologies.

When we come to AzureML, the data preparation perspective isn’t forgotten, but it isn’t a strong data preparation tool like Paxata or Datameer, for example. It’s the democratization of data for the masses, yes, and I see the value it brings to businesses. It’s meant for machine learning and data science, so you should expect to use it for those purposes. It’s not a standalone data preparation tool, although it does help you partway.

The data preparation facilities in AzureML can be found here. If you have to clean up the data in AzureML, my futurology ‘dream’ scenario for AzureML is that Microsoft have weighty data preparation as a task, like other tasks in AzureML. You could click on the task, and then have roll-your-own data preparation pop up in the browser (all browser based) provided by Microsoft or perhaps have Paxata or Datameer pop out as a service, hosted in Azure as part of your Azure portal services. Then, you would go back to AzureML, all in the browser. In the meantime, you would be better trying to follow the principle of cleaning it up close to the source.

Don’t be downhearted if AzureML isn’t giving you the data preparation that you need. Look back to the underlying data, and see what you can do. The answer might be as simple as writing a view in SQL Server. AzureML is for operations and machine learning further downstream. If you are having serious data preparation issues, then perhaps you are not ready for the modelling phase of CRISP-DM so you may want to take some time to think about those issues.
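
To make the ‘write a view’ idea concrete, here is a minimal sketch of cleaning data once, close to the source, so that every downstream tool reads the same cleaned result. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for SQL Server, and the table, column and view names (raw_customers, clean_customers) are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server.
# Table, column and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_customers (id INTEGER, name TEXT, country TEXT);
    INSERT INTO raw_customers VALUES
        (1, '  Alice ', 'uk'),
        (2, 'Bob', 'UK'),
        (2, 'Bob', 'UK'),
        (3, NULL, 'us');

    -- The view trims names, normalises country codes, drops rows with no
    -- name, and removes exact duplicates. Every consumer reads this view.
    CREATE VIEW clean_customers AS
    SELECT DISTINCT id, TRIM(name) AS name, UPPER(country) AS country
    FROM raw_customers
    WHERE name IS NOT NULL;
""")

clean_rows = conn.execute(
    "SELECT id, name, country FROM clean_customers ORDER BY id"
).fetchall()
print(clean_rows)
```

Power BI, Excel, Tableau or AzureML would then all point at the view rather than at the raw table, so the cleaning rules live in one place.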

Why UK Power BI Summit? Derive business value from your data

I’ve created UK Power BI Summit in response to an industry need for Power BI to have its own event, and I hope to produce a repeatable model for other Power BI groups globally. I am working with Microsoft in Redmond at the moment, in the hope that I can spread the word globally about the power of enabling businesses through data, via easily-accessible tools.

What’s the rationale? Personally, the next step in my career is to continue my trajectory from the data center towards boardroom level leadership and consultancy, in order to help organisations become 21st century, data-driven organisations. Data is at the foundation of businesses. Data, in turn, leads to insights and better decisions that improve the business. Ideally, businesses should have data as part of their DNA. This does not mean that there is not a place for context or for ‘gut instinct’. Data gives businesses new insights, and, in turn, it gives them new options.

My favourite bookshop in the world: The Strand Bookstore, New York, on the corner of Broadway and E 12th Street.

With my business and technical skills in mind, I am doing my MBA at this stage in my career to focus on building businesses as data-driven organisations. The MBA will help me to combine my technical and business expertise within an established framework that will help me to be more effective in a leadership role. I believe that the MBA will help me to articulate and achieve a strategic viewpoint, which, in turn, will help businesses to use their data more effectively.

I am not alone in this data-driven journey. My industry experience tells me that many organisations suffer from one thing: they hear the hype about the possibilities and opportunities in data, and particularly Big Data, but they don’t know how to get started in terms of technology, people, and enabling business processes that would consume these services.

Organisations can find it difficult to know where to start, or even how to start. Very often, businesses simply store all of their data, rather than think proactively about the data that they have, and how they could use it. As businesses continue to get excited about the opportunities of Big Data, they will also need Data Thought Leadership in order to guide them effectively towards success.

Digital Transformation is a much bandied about term. It isn’t simply whacking a few Virtual Machines in Azure, moving data to the cloud and – yay – digital transformation. It’s about transforming the business through the use of technology, and it has the business at the front-and-center of the activity.

Now is the time for businesses to bring their data and their strategy together, using the latest technologies – but they can’t do that until they see their data. This is where Power BI comes in.

The Power BI event is aimed at those people in the organisation who are aware of business needs, user needs, and have winning ideas and who are willing to learn about user-oriented technology to make that happen. The event is aimed at helping these people to learn about the technology from beginner to advanced, according to their needs.

Although the event is about technology, it’s also about the business, and deriving business value from your data. It’s not a straightforward technology event. It’s about the business as well as the technology, and how it’s used. It’s about bringing you along the journey, further.

I thought that the difference between UK Power BI Summit and other events, such as PASS SQLSaturday events and SQLBits, was fairly clear, but it would seem from my email traffic that my assumption wasn’t correct.

Just to be clear:

  • I am not part of the SQLBits committee and I have nothing to do with their leadership. I don’t represent them and I’m not featured on their promotional video. I’ve been speaking there since SQLBits 7 through to SQLBits XV. You can look for my SQLBits 7 – 15 sessions here.
  • I am part of PASS and a non executive Director, and I sit on the PASS Board as an elected Director. I don’t represent PASS here. If you want a PASS-validated blog, then please head over to their site. This isn’t a PASS event.

Let’s look at the SQLBits mission statement, taken from their site:

SQL Bits was started by a group of individuals that are passionate about the SQL Server product suite. There is a breadth of knowledge in the SQL Community that will benefit everyone in the community. We want to spread that knowledge. We all work with the SQL community, some of us for many years and have all been given the MVP award by Microsoft.

Let’s look at the PASS Mission Statement, taken from their site:

PASS is an independent, not-for-profit organization run by and for the community. With a growing membership of more than 100K, PASS supports data professionals throughout the world who use the Microsoft data platform.

PASS strives to fulfill its mission by:

  • Facilitating member networking and the exchange of information through our local and virtual chapters, online events, local and regional events, and international conferences
  • Delivering high-quality, timely, technical content for in-depth learning and professional development

PASS was co-founded by CA Technologies and Microsoft Corporation in 1999 to promote and educate SQL Server users around the world. Since its founding, PASS has expanded globally and diversified its membership to embrace professionals using any Microsoft data technology.

So, the UK Power BI Summit is ultimately looking at using Power BI to transform businesses, through expertise in the technology, embedded in business-oriented discussions. The technology should support the business in its mission to adapt to the new world of data.

If you’d like to register, click below:

Eventbrite - UK Power BI Summit

How Microsoft can help join the Open Data dots

In one of the industry’s best known volte-face, Microsoft have warmly embraced Open Source. As announced at Microsoft’s Connect() conference earlier this week, Microsoft is pleased to become a Linux Foundation Platinum Member. We also saw a quieter announcement; the Azure DataMarket is being decommissioned.

Now that Microsoft are ecstatically adopting Open Source Software, I’d love to see Microsoft adopt Open Data too. I’d love to see an Open Data platform on Azure, which is easy-to-use, aimed at business users, data scientists and even consumers in the form of data citizens.

If you look up Open Data, you’ll see that there are open data ‘puddles’ everywhere. So we have the London Data Store, SF Open Data, and the Azure DataMarket already has some Open Data, for example, the UK Met Office Weather Open Data. Why not have all of these data puddles joined up in a new, Azure-based, Open Data store?

There is no joined up thinking in the world that would constitute an Open Data Lake. I’d love Microsoft to adopt this on behalf of, and for, the data community worldwide.

I’d like to see the Azure DataMarket rebooted to be a home for an Open Data platform. Perhaps it could be called Azure Open Data, or even simply Azure Open, or something simple like that.

Microsoft can join the Open Data dots for the community, and that’s a real democratization of data for us all.

Guess who is appearing in Joseph Sirosh’s PASS Keynote?

This girl! I am super excited and please allow me to have one little SQUUEEEEEEE! before I tell you what’s happening. Now, this is a lifetime achievement for me, and I cannot begin to tell you how absolutely and deeply honoured I am. I am still in shock!

I am working really hard on my demo and….. I am not going to tell you what it is. You’ll have to watch it. Ok, enough about me and all I’ll say is two things: it’s something that’s never been done at PASS Summit before and secondly, watch the keynote because there may be some discussion about….. I can’t tell you what… only that, it’s a must-watch, must-see, must do keynote event.

We are in a new world of Data and Joseph Sirosh and the team are leading the way. Watching the keynote will mean that you get the news as it happens, and it will help you to keep up with the changes. I do have some news about Dr David DeWitt’s Day Two keynote… so keep watching this space. Today I’d like to talk about the Day One keynote with the brilliant Joseph Sirosh, CVP of Microsoft’s Data Group.

Now, if you haven’t seen Joseph Sirosh present before, then you should. I’ve put some of his earlier sessions here and I recommend that you watch them.

Ignite Conference Session

MLDS Atlanta 2016 Keynote

I hear you asking… what am I doing in it? I’m keeping it a surprise! Well, if you read my earlier blog, you’ll know I transitioned from Artificial Intelligence into Business Intelligence and now I do a hybrid of AI and BI. As a Business Intelligence professional, my customers will ask me for advice when they can’t get the data that they want. Over the past few years, the ‘answer’ to their question has gone far, far beyond the usual on-premise SQL Server, Analysis Services, SSRS combo.

We are now in a new world of data. Join in the fun!

Customers sense that there is a new world of data. The ‘answer’ to the question ‘Can you please help me with my data?’ is complex and varied, and it’s very much aimed at cost sensitivities, too. Often, customers struggle with data because they now have a Big Data problem, or a storage problem, or a data visualisation access problem. Azure is very neat because it can cope with all of these issues. Now, my projects are Business Intelligence and Business Analytics projects… but they are also ‘move data to the cloud’ projects in disguise, and that’s in response to the customer need. So if you are a Business Intelligence professional, get enthusiastic about the cloud because it really empowers you with a new generation of exciting things you can do to please your users and data consumers.

As a BI or an analytics professional, cloud makes data more interesting and exciting. It means you can have a lot more data, in more shapes and sizes, and access it in different ways. It also means that you can focus on what you are good at, and make your data estate even more interesting by augmenting it with cool features in Azure. For example, you could add in more exciting things such as the Apache Tika library as a worker role in Azure to crack through PDFs and do interesting things with the data in there. If you bring it into SSIS, then you can spin it up and tear it down again when you don’t need it.

I’d go as far as to say that, if you are in Business Intelligence at the moment, you will need to learn about cloud sooner or later. Eventually, you’re going to run into Big Data issues. Alternatively, your end consumers are going to want their data on a mobile device, and you will want easy solutions to deliver it to them. Customers are interested in analytics and the new world of data and you will need to hop on the Azure bus to be a part of it.

The truth is, Joseph Sirosh’s keynotes always contain amazing demos. (No pressure, Jen, no pressure….. ) Now, it’s important to note that these demos are not ‘smoke and mirrors’….

The future is here, now. You can have this technology too.

It doesn’t take much to get started, and it’s not too far removed from what you have in your organisation. AzureML and Power BI have literally hundreds of examples. I learned AzureML looking at the following book by Wee-Hyong Tok and others, so why not download a free book sample?

https://read.amazon.co.uk/kp/card?asin=B00MBL261W&preview=inline&linkCode=kpe&ref_=cm_sw_r_kb_dp_c54ayb2VHWST4

How do you proceed? Well, why not try a little homespun POC with some of your own data to learn about it, and then show your boss? I don’t know about you but I learn by breaking things, and I break things all the time when I’m learning. You could download some Power BI workbooks, use the sample data and then try to recreate them, for example. Or, why not look at the community R Gallery and try to play with the scripts? You broke something? No problem! Just download a fresh copy and try again. You’ll get further next time.

I hope to see you at the PASS keynote! To register, click here: http://www.sqlpass.org/summit/2016/Sessions/Keynotes.aspx 

WPC Day One: Translating Digital Transformation into Solutions

I blogged over at my ‘official’ company blog about strategic considerations regarding Digital Transformation. There is a lot of messaging directed at sales, partners and CEO level conversations. For the techies, however, how does the strategy translate into a technical implementation that you can actually deliver, to facilitate Digital Transformation within the organisation? In other words, how do you make solutions that are sustainable and relevant?

Microsoft can help with modern, cloud-based tools and a cloud platform. Partners have the ability to use tools such as Office365, Power BI, Microsoft Flow and AzureML to reduce the integration cost and friction to deliver technical solutions. These partners can speak directly to the digital transformation, and lead it. These tools can form composable units or modules, which can be fitted together to meet business needs directly, thereby facilitating digital transformation.

What are these tools? During the WPC keynote, Ecolabs showed off their solution, which involved Power BI and Microsoft Flow. Here is the example Microsoft Power BI Solution below:
WPC Day 1 Slides
Microsoft Flow is a new tool, which was used to create some of the workflows to align the productivity processes with the resulting dashboard.

What is Microsoft Flow? Well, it’s a great little app and I think you should take a look. Microsoft Flow allows you to create automated workflows between your business or consumer applications and services. It connects them so that you get some action, such as sending notifications, synchronizing files, collecting data, and other actions that might be useful to your business.

Why is that useful for a Business Intelligence implementation? Well, it can help to track where your data is going. As someone who often goes into organisations where people have ‘lost’ data or it is hiding somewhere that the business people can’t get it, I see Microsoft Flow as a way forward for Digital Transformation in the business by facilitating the flow of data around the organisation.

You can even create workflows on your mobile device. Here is the Ecolabs example from WPC:
WPC Day 1 Slides
Basically, a Flow connects your web services, files, and cloud-based data to save time and effort for everyone, every day.

It’s good to see that Microsoft are a much more open organisation these days; I think that Microsoft Flow is evidence of the open attitude towards other companies, organisations and methodologies that are outside of the Microsoft corporate boundary. In particular, I am a huge fan of Wunderlist and they mentioned it yesterday during the Day One keynote. I know that Wunderlist have been bought by Microsoft and I hope that Wunderlist will appear in Office365 soon, such as in Outlook.

How does Flow work? Well, you start with a template, which gives you a great head start. Why not give it a blast? If it means you get to use Wunderlist as well for all of your lists, and start to love it, then you can thank me!

 

You could even use Microsoft Flow for new GitHub issues, and send a notification to Slack. Or perhaps you could use Flow so that you retain Dropbox as your file storage system, integrated with Office365. The examples are endless, I think.
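
If it helps to picture what a workflow like this boils down to, here is a tiny sketch of the ‘when this happens, do that’ pattern. To be clear, this is not the Microsoft Flow API: the Workflow class, the trigger name issue_created and the notify_channel function are all hypothetical, and the ‘notification’ is just a string rather than a real call to a chat service.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout: this is NOT the Microsoft Flow API.
Action = Callable[[dict], str]


class Workflow:
    """Maps a trigger name to the actions that run when it fires."""

    def __init__(self) -> None:
        self._actions: Dict[str, List[Action]] = {}

    def when(self, trigger: str, action: Action) -> None:
        # Register an action against a trigger name.
        self._actions.setdefault(trigger, []).append(action)

    def fire(self, trigger: str, event: dict) -> List[str]:
        # Run every action registered for this trigger, in order.
        return [action(event) for action in self._actions.get(trigger, [])]


def notify_channel(event: dict) -> str:
    # A real connector would call the chat service; here we only build the message.
    return f"New issue: {event['title']}"


flow = Workflow()
flow.when("issue_created", notify_channel)
messages = flow.fire("issue_created", {"title": "Dashboard totals look wrong"})
print(messages)
```

A template in Flow gives you the trigger and the action already wired together, which is why it is such a good head start.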

All this shows that the cloud is a great enabler, and a platform, which partners and companies can use in order to make their organisations more productive and collaborative. These are simple examples, and I’m sure that you can think of more! The integrations all happen in the cloud, and it is one way that the cloud can be used as a tool for Digital Transformation.

Any questions, send me an email at hello@datarelish.com.

Kind Regards,

Jen Stirrup


AzureCon round up: Intelligent Cloud, Applications, Data, Infrastructure, Business Agility and Cloud Ability

I organised an AzureCon viewing party tonight in Hertfordshire, with great team support from Team Awesome over at Cloudamour.

We watched a total of four keynotes, running back to back for almost four hours. The keynotes were all awesome and I’ve blogged some learnings here.

First up was Scott Guthrie, igniting the keynotes and kicking off the event with the journey to the intelligent cloud. I missed some of this piece because I was welcoming guests as they arrived, making introductions and so on. If you want to see the video, you can catch Scott Guthrie here on Channel 9. The thrust of Scott’s session was about how cloud is energising business and technical leaders worldwide to turn digital disruption to their advantage. Scott led with customers who used cloud to enable their business to break new ground, and who shared their best practices in using some of the latest Microsoft innovations in enabling their journey to the cloud.

My personal favourite part of this piece was seeing the inspirational Lara Rubbelke up on stage. Lara is inspirational and she’s generous with her time, supporting SQLFamily members. Lara explained the SQL Data Warehouse very clearly in terms of its simplicity to set up, and its relevance to the business. I liked her piece because she talked tech and business equally and that’s hard. It’s something I find that I have to do in my role every day; basically, wearing different hats, and it’s not easy to accomplish. Lara achieves this with ease and I recommend that you watch her segment, which is about 32 minutes into the video. She also makes you think about how this could be relevant in your environment and that is an important takeaway.

In Lara’s words, using the technology is a ‘zero risk’ decision which allows you to scale up, scale down as you need. We don’t need to move our data, it just works, thereby offering immediate ROI, visualised in PowerBI.

Next up was Bill Staples, the CVP for the Azure App Platform, and the focus here was on growing and expanding businesses using Azure as a base for apps.

Since apps are so personal and based around customers’ experience, they can help businesses to accelerate their transformation and drive rapid results which are customer-centric.

Bill had some pretty interesting case studies and you can find them over on his keynote session, which is over at Channel 9.

Next up, the session I’d looked forward to the most: T.K. Rengarajan, CVP Data Platform. Ranga talked about IoT – the Internet of Things with *your* things. As with IoT generally, there was a focus on Stream Processing and Predictive Analytics. How can we use that data properly? How can we use it for prescriptive analytics, i.e. what can I do? What should I do? We should be able to drive intent on it, to derive intelligent action. Here are some use cases:

  • Rockwell use it to manage gas dispensers.
  • Ford are embedding IoT sensors in their cars, going forward.
  • ThyssenKrupp – a leading elevator manufacturer. They track the health of their elevators around the globe, and optimise the service experience before an elevator breaks down.
Here is the ThyssenKrupp elevator video from Ranga’s talk:

They have the ability to optimise their service experience in predicting failure before the elevator breaks down. Now, that’s predictive analytics in action, using Azure as a base!

The session then moved to IoT in a box!

Investment principles for IoT
  • IoT Starts with your things
  • Provide connectivity to both existing and new devices
  • Facilitate new insights by harnessing the power of untapped data
Azure IoT Suite, Summarised:
  • Preconfigured Solutions
  • Analytics
  • Workflow Automation
  • Device Connectivity
  • Command and control
  • Dashboards
Azure IoT Suite announced a Remote Monitoring Solution, with a Predictive Monitoring Solution onboarding in a few weeks. Now if that wasn’t enough excitement for you, the Azure Data Lake announcement was made and here is the summary:
  • Fully managed system for analytics. Analyse Data of any size, shape and speed
  • Productive day one
  • Build on open standards – YARN
Data Lake – the great tape record in the sky
What type of customers are looking at it, and what do they need?
  • the ones with unstructured data
  • U-SQL
  • U-SQL ETL script
  • Unstructured TSV in Data Lake store to structured tables in data lake store
  • including JSON expansion and filtering
  • Data lake can support both structured and unstructured data
  • It’s easy to submit a job, and there is even a slider for parallelism! We can slide up to 1000 levels of parallelism. Ranga asked people to submit a name. I like ‘Pixie Dust Slider’ because it’s sprinkling magic on your data, but I don’t think Microsoft marketing would ever go for that!
  • We can see that U-SQL looks very similar to standard SQL
  • We can make references in .NET
  • One of our columns is a JSON object, but with data lake, we can take a function to extract out that column and work with it.
  • The different jobs are broken down.
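
To give a feel for the ‘unstructured TSV to structured tables, with JSON expansion and filtering’ step described in the list above, here is a small sketch of the same idea. It is written in plain Python rather than U-SQL, so it is a stand-in for the concept and not what was shown on stage. The sample rows, the field names (device, day, temp, status) and the to_structured function are all hypothetical.

```python
import csv
import io
import json

# Hypothetical sample: tab-separated rows whose third column is a JSON object.
raw_tsv = (
    'd1\t2015-09-29\t{"temp": 21.5, "status": "ok"}\n'
    'd2\t2015-09-29\t{"temp": 48.0, "status": "alert"}\n'
)


def to_structured(tsv_text):
    """Parse TSV, expand the JSON column into fields, return a list of dicts."""
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for device, day, payload in reader:
        fields = json.loads(payload)  # expand the JSON column
        rows.append({"device": device, "day": day, **fields})
    return rows


structured = to_structured(raw_tsv)
# Filtering step: keep only the rows that raised an alert.
alerts = [row for row in structured if row["status"] == "alert"]
print(alerts)
```

In the Data Lake version, the extract, the JSON expansion and the filter would be expressed in a U-SQL script and the service would handle the parallelism for you.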

Finally, we moved on to Jason Zander to talk about cloud infrastructure. More pixie dust to make it happen! Here’s a summary:

  • 24 Azure regions, more than Google and AWS combined. Welcome India #Azure data centers!
  • Enough fibre to wrap around the globe 56 times.
  • 1.4 million miles of fibre in the DCs
  • ExpressRoute – for Azure. Speeds of up to 10 Gigabits per second. 21 ExpressRoute locations worldwide, including London.
Then, it was time for home. It was agreed that the party guests would love to hear more Azure information and they are really keen for another group meeting. I’ll be looking to the community to support our growing group with speakers, so watch this space as we grow more #AzureFamily fans here in the UK.
And here is a picture of the youngest Azure fan, who likes it because Halo runs on it…..