Azure Tools and Technologies Cheat Sheet

Don’t you find the number of Big Data technologies in Azure confusing? I’ve distilled some information from the official Microsoft Azure blog so it’s easier to read. Before we begin, here is a potted Hadoop history:

 

This cheat sheet contains high-level descriptions of the tools, APIs, SDKs, and technologies that you’ll see in Azure. They are used in tandem with big data solutions, and they include both proprietary Azure and open source technologies.

I hope that this cheat sheet will help you to more easily identify the tools and technologies you should investigate, depending on the function.

Each entry below gives the function, a description, and the relevant tools.

  • Data consumption: Extracting and consuming the results from Hadoop-based solutions. Tools: Azure Intelligent Systems Service (ISS), Azure SQL Database, LINQ to Hive, Power BI, SQL Server Analysis Services (SSAS), SQL Server Database Engine, SQL Server Reporting Services (SSRS)
  • Data ingestion: Extracting data from data sources and loading it into Hadoop-based solutions. Tools: Aspera, Avro, AzCopy, Azure Intelligent Systems Service (ISS), Azure Storage Client Libraries, Azure Storage Explorer, Casablanca, CloudBerry Explorer, CloudXplorer, Cross-platform Command Line Interface (X-plat CLI), FileCatalyst, Flume, Hadoop Command Line, HDInsight SDK and Microsoft .NET SDK for Hadoop, Kafka, PowerShell, Reactive Extensions (Rx), Signiant, SQL Server Data Quality Services (DQS), SQL Server Integration Services (SSIS), Sqoop, Storm, StreamInsight, Visual Studio Server Explorer
  • Data processing: Processing, querying, and transforming data in Hadoop-based solutions. Tools: Azure Intelligent Systems Service (ISS), HCatalog, Hive, LINQ to Hive, Mahout, MapReduce, Phoenix, Pig, Reactive Extensions (Rx), Samza, Solr, SQL Server Data Quality Services (DQS), Storm, StreamInsight
  • Data transfer: Transferring data between Hadoop and other data stores such as databases and cloud storage. Tools: Falcon, SQL Server Integration Services (SSIS)
  • Data visualization: Visualizing and analyzing the results from Hadoop-based solutions. Tools: Azure Intelligent Systems Service (ISS), D3.js, Microsoft Excel, Power BI, Power Map, Power Query, Power View, PowerPivot
  • Job submission: Submitting processing jobs to Hadoop-based solutions. Tools: HDInsight SDK and Microsoft .NET SDK for Hadoop
  • Management: Managing and monitoring Hadoop-based solutions. Tools: Ambari, Azure Storage Client Libraries, Azure Storage Explorer, Cerebrata Azure Management Studio, Chef, Chukwa, CloudXplorer, Ganglia, Hadoop command line, Knox, Azure Management Portal, Azure SDK for Node.js, Puppet, Remote Desktop Connection, REST APIs, System Center management pack for HDInsight, Visual Studio Server Explorer
  • Workflow: Creating workflows and managing multi-step processing in Hadoop-based solutions. Tools: Azkaban, Cascading, Hamake, Oozie, SQL Server Integration Services (SSIS)

Any questions, please get in touch at hello@datarelish.com

Want to learn how to light up Big Data Analytics using Apache Spark in Azure?

Businesses struggle with many different aspects of data and technology. It can be difficult to know what technology to choose, and it can be hard to know where to turn when there are so many buzzwords in the mix: analytics, big data and open source. My session at PASS Summit tackles these things, using Azure and Apache Spark as a backdrop.

Vendors tend to tell their own version of events, as you might expect, so it becomes really hard to get advice on a proper blueprint to get you up and running. In this session, I will examine strategies for using open source technologies to improve common existing Business Intelligence issues, using Apache Spark as our backdrop for delivering open source Big Data analytics.

Once we have looked at the strategies, we will look at your choices on how to make the most of the open source technology. For example, how can we make the most of the investment? How can we speed things up? How can we manipulate data?


These business questions are translated into technical terms. We will explore how you can parallelize your computations across the nodes of a Hadoop cluster, once your clusters are set up. We will look at combining SparkR for data manipulation with ScaleR for model development on Spark in Hadoop. At the time of writing, this scenario requires that you maintain separate Spark sessions, running only one session at a time, and exchange data via CSV files. Hopefully, in the near future, we’ll see an R Server release in which SparkR and ScaleR can share a Spark session and so share Spark DataFrames. Hopefully that’s out prior to the session so we can see it but, nevertheless, we will still look at how ScaleR works with Spark and how we can use sparklyr and SparkR within a ScaleR workflow.
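The CSV hand-off between separate sessions can be sketched language-agnostically as two independent stages that communicate only through a file. This toy Python sketch illustrates the pattern only; it is not SparkR or ScaleR itself, and the stage names and file path are invented:

```python
import csv
import os
import tempfile

def manipulation_stage(rows, path):
    # Stage 1 (think: the SparkR session) writes its output to CSV and exits.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["feature", "label"])
        writer.writerows(rows)

def modelling_stage(path):
    # Stage 2 (think: the ScaleR session) starts afresh, with no shared
    # in-memory state, and reads the CSV back in.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        data = [(float(x), float(y)) for x, y in reader]
    # Trivial "model" (mean of the labels), standing in for model development.
    return sum(y for _, y in data) / len(data)

path = os.path.join(tempfile.gettempdir(), "handoff.csv")
manipulation_stage([(1.0, 2.0), (2.0, 4.0)], path)
print(modelling_stage(path))  # 3.0
```

The point is that the only contract between the two stages is the file format, which is exactly why shared Spark DataFrames would be an improvement.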

Join my session at PASS Summit 2017 to learn more about open source with Azure for Business Intelligence, with a focus on Azure Spark.

Guess who is appearing in Joseph Sirosh’s PASS Keynote?

This girl! I am super excited, so please allow me one little SQUUEEEEEEE! before I tell you what’s happening. Now, this is a lifetime achievement for me, and I cannot begin to tell you how absolutely and deeply honoured I am. I am still in shock!

I am working really hard on my demo and….. I am not going to tell you what it is. You’ll have to watch it. Ok, enough about me; I’ll say just two things: first, it’s something that’s never been done at PASS Summit before, and secondly, watch the keynote because there may be some discussion about….. I can’t tell you what… only that it’s a must-watch, must-see, must-do keynote event.

We are in a new world of Data and Joseph Sirosh and the team are leading the way. Watching the keynote will mean that you get the news as it happens, and it will help you to keep up with the changes. I do have some news about Dr David DeWitt’s Day Two keynote… so keep watching this space. Today I’d like to talk about the Day One keynote with the brilliant Joseph Sirosh, CVP of Microsoft’s Data Group.

Now, if you haven’t seen Joseph Sirosh present before, then you should. I’ve put some of his earlier sessions here and I recommend that you watch them.

Ignite Conference Session

MLDS Atlanta 2016 Keynote

I hear you asking… what am I doing in it? I’m keeping it a surprise! Well, if you read my earlier blog, you’ll know I transitioned from Artificial Intelligence into Business Intelligence, and now I do a hybrid of AI and BI. As a Business Intelligence professional, my customers ask me for advice when they can’t get the data that they want. Over the past few years, the ‘answer’ to their question has gone far, far beyond the usual on-premises SQL Server, Analysis Services and SSRS combo.

We are now in a new world of data. Join in the fun!

Customers sense that there is a new world of data. The ‘answer’ to the question ‘Can you please help me with my data?’ is complex and varied, and it’s very much shaped by cost sensitivities, too. Often, customers struggle with data because they now have a Big Data problem, or a storage problem, or a data visualisation access problem. Azure is very neat because it can cope with all of these issues. Now, my projects are Business Intelligence and Business Analytics projects… but they are also ‘move data to the cloud’ projects in disguise, and that’s in response to the customer need. So if you are a Business Intelligence professional, get enthusiastic about the cloud, because it empowers you with a new generation of exciting things you can do to please your users and data consumers.

As a BI or an analytics professional, the cloud makes data more interesting and exciting. It means you can have a lot more data, in more shapes and sizes, and access it in different ways. It also means that you can focus on what you are good at, and make your data estate even more interesting by augmenting it with cool features in Azure. For example, you could add the Apache Tika library as a worker role in Azure to crack through PDFs and do interesting things with the data in there. If you bring it into SSIS, then you can spin it up and tear it down again when you don’t need it.
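The worker-role idea reduces to a loop that pulls documents off a queue and emits extracted text downstream. This Python sketch shows only the shape of that loop; the `extract_text` function is a deliberately fake stand-in for a real Apache Tika call, not Tika’s actual API:

```python
from queue import Queue

def extract_text(path):
    # Stand-in for Apache Tika: in a real worker role you would invoke the
    # Tika library or CLI here to crack the PDF. This stub is illustrative only.
    return f"text of {path}"

def worker(jobs: Queue, results: list):
    # Drain the job queue, extracting text from each queued document and
    # collecting (path, text) pairs for a downstream consumer such as SSIS.
    while not jobs.empty():
        path = jobs.get()
        results.append((path, extract_text(path)))

jobs = Queue()
for p in ["report.pdf", "invoice.pdf"]:
    jobs.put(p)

results = []
worker(jobs, results)
print(len(results))  # 2
```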

I’d go as far as to say that, if you are in Business Intelligence at the moment, you will need to learn about cloud sooner or later. Eventually, you’re going to run into Big Data issues. Alternatively, your end consumers are going to want their data on a mobile device, and you will want easy solutions to deliver it to them. Customers are interested in analytics and the new world of data and you will need to hop on the Azure bus to be a part of it.

The truth is, Joseph Sirosh’s keynotes always contain amazing demos. (No pressure, Jen, no pressure….. ) Now, it’s important to note that these demos are not ‘smoke and mirrors’….

The future is here, now. You can have this technology too.

It doesn’t take much to get started, and it’s not too far removed from what you have in your organisation. AzureML and Power BI have literally hundreds of examples. I learned AzureML by looking at the following book by Wee-Hyong Tok and others, so why not download a free book sample?

https://read.amazon.co.uk/kp/card?asin=B00MBL261W&preview=inline&linkCode=kpe&ref_=cm_sw_r_kb_dp_c54ayb2VHWST4

How do you proceed? Well, why not try a little homespun POC with some of your own data to learn about it, and then show your boss. I don’t know about you, but I learn by breaking things, and I break things all the time when I’m learning. You could download some Power BI workbooks, use the sample data and then try to recreate them, for example. Or why not look at the community R Gallery and try to play with the scripts? You broke something? No problem! Just download a fresh copy and try again. You’ll get further next time.

I hope to see you at the PASS keynote! To register, click here: http://www.sqlpass.org/summit/2016/Sessions/Keynotes.aspx 

5 Factors to Consider before Pouring Data in your Data Lake


As organizations move into a Big Data world, many projects will include a Data Lake component. What is a data lake, and how do we get our data into the Data Lake?

There are many different interpretations of a Data Lake, but the one given in the Data Lake Adoption and Maturity Survey Findings Report, jointly produced by Attunity and Hortonworks, is a good explanation, since it refers to the Data Lake as a strategy as well as an architecture.

The report defines the Data Lake as ‘an architectural strategy and an architectural destination, thus addressing both the end state architecture and establishing an adoption and transformation strategy for data architecture related decisions on the journey to the data lake’.

In order to make the investment in the Data Lake worthwhile, the data must get into the Data Lake somehow. If you believe the hype, organizations should simply be able to pour the data into the Lake without being concerned with joining it together. Things are not that simple, however. One component that’s difficult for analysts to grasp is that the data storage is different, and this will impact them at the point of query. And this is without the complexities of combining Data Lake data with data from in-memory databases, for example. Data is stored in raw flat files in Hadoop’s Distributed File System (HDFS). Moving from rectangular data to the non-rectangular data held in Hadoop is a real mind-shift for business users. People are going from a relational world to a batch processing world, and having to mix both; this is not going to be straightforward for many organizations, who already struggle with the rectangular data.
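The rectangular-to-non-rectangular shift can be seen in miniature below: the same order data as flat table rows versus a nested JSON document of the kind that might land raw in HDFS. The field names are invented for illustration:

```python
import json

# Rectangular: one row per fact, fixed columns -- what analysts are used to.
rectangular = [
    {"order_id": 1, "item": "widget", "qty": 2},
    {"order_id": 1, "item": "gadget", "qty": 1},
]

# Non-rectangular: one nested document per order, as it might land raw in HDFS.
raw = json.loads(
    '{"order_id": 1, "items": ['
    '{"item": "widget", "qty": 2}, {"item": "gadget", "qty": 1}]}'
)

# To query it relationally, the analyst must first flatten the nesting --
# extra work that simply does not exist in the rectangular world.
flattened = [{"order_id": raw["order_id"], **line} for line in raw["items"]]
print(flattened == rectangular)  # True
```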

The complexity lies in getting the data into the Data Lake in the first place. A data lake is primarily a collection of data services. Raw data takes up more space, however, and it is much more difficult for analysts to navigate. In turn, this means that it is more difficult to query, and the querying will not happen at the speed of the business. This makes it difficult to take machine-scale data and make it human-scale, so that it can be summarized, sliced and diced, and compressed for business decision making. In approaching these issues, the cloud offers experimentation and exploration, and is well suited to Data Lake implementations. However, choosing the cloud as the location for a Data Lake can add an additional point of complexity when getting the data into it.

Who actually owns the data going into the Data Lake? IT retains overall responsibility for the guardianship of the data in the Data Lake, plus cataloguing the data for retrieval. The business analysts and data scientists are responsible for the usage of the Data Lake, surfing this unstructured data repository in order to answer business questions by progressively adding structure to the data. The data lifecycle of the data lake alters how the data pipeline works to get data into the Data Lake as a first step.

One key opportunity offered by Data Lakes is the ability to break down silos in the organization. The information can be amassed from various sources from different departments and from external sources, and it can be co-located in order to create a wider organizational Big Data program, with a focus on Analytics.  Many organizations have a series of dirty data puddles rather than a data lake, and this isn’t conducive to business insights driven by data.

Any strategy will need to reflect that the landscape of data is fluid and changing. These data puddles will need to be incorporated into the Data Lake as part of the data lake creation process, and organizations will need to review their in-house legacy systems. For larger organizations, the in-house data warehouse is a key concern with more immediate short-term relevance, and this work happens before the business has the opportunity to create queries on its data lake. The Data Lake will help to move the organization forward by making it easier to ask horizontal questions of the business, rather than just vertical questions which focus on the siloed departments within the business. The Data Lake is an attractive strategy for circumventing these disconnections within the organization, so it is not just a technical concept but a mover for deeper organizational change.

Anyone who has worked with data will know the pain of meshing it together in spreadsheets and, in order to move at the speed of the business, Business Analysts spend a lot of time restructuring, reformatting, blending, meshing and consolidating data. There is real pressure to find answers and insights in an increasing number of data sources, which themselves increase in size, whilst doing so in a decreasing space of time. However, in order to get that far, the data has to get into the Data Lake in the first place. This involves a lot of moving parts, but it’s clear from the study that many organizations, whilst remaining skeptical of the hype, are continuing to move forward. How is it possible to move forward? What needs to be considered before putting data into the Data Lake?

1. Organizational maturity in adopting the Data Lake

Organizations need to understand better where they are in terms of their maturity and commitment to Data Lake. In their report, Data Lake Adoption and Maturity Survey Findings Report, Radiant Advisors lay out a number of key stages in determining the organizational maturity in their Data Lake approach. These stages are listed here, but the reader is referred to the Data Lake Adoption and Maturity Survey Findings Report for a full description.

  • Evaluate
  • Reactionary
  • Proactive
  • Core Competency

The organization will need to take a step back to understand their existing status better. Are they just starting out? Are there other departments doing the same thing, perhaps in the local organization or somewhere else in the world? Once the organization understands their state better, they can start to work out, broadly, the strategy that the Data Lake is intended to support.

As part of this understanding, the objective of the Data Lake will need to be identified. Is it for data science? Or, for example, is the Data Lake simply to store data in a holding pattern for data discovery? Identifying the objective will help align the vision and the goals, and set the scene for communication to move forward.

2. Executive Sponsorship

If an organization has only started out on the journey towards a Data Lake, then they will need to involve an executive sponsor in order to help provide the right vision and strategy for the Data Lake. This means that there will be executive sponsorship and support to lead the organization in the execution of the data-driven strategy. Understanding the ROI and business goals will help to keep the project on track, and the goals in focus.

3. Governance

Data Governance is critical for enterprise data, and particularly for data which concerns people.  Governance will help the organization to find a common framework to consider important points when putting data in the Lake, such as discovering what data should go in the Data Lake, how it gets there, and how it should be protected.

4. Engaging the Users

As a consequence of obtaining executive sponsorship, a key item is to ensure that the organization has the right skills in-house, both technical and professional, to make execution easier. Eventual adoption of the Data Lake will be easier if the users are involved as the Data Lake develops and progresses. This will also involve learning new skill sets, and users will feel that this is a valuable exercise. It is also a way of getting over Shiny New Object syndrome, whereby people get dazzled by bright new things, which can mean that the technology is emphasized at the expense of other relevant factors, such as the business processes. User engagement facilitates long-term adoption and usage of many technologies, and this is also applicable to the Data Lake.

5. Prepare – Call to Action

As a next step, it’s recommended to read the Data Lake Adoption and Maturity Survey Findings Report from the Attunity website for more data and advice in order to devise a strategy to facilitate the Data Lake adoption process.

The PASS Business Analytics Conference features a number of Big Data experts, with a focus on the Data Lake. The emphasis is on practical information, as well as a ‘Communicate and Lead’ track to help developing leaders take their organizations towards a data-driven enterprise strategy.

Conclusion

To summarise, there are a number of strands which need to be considered when adopting a Data Lake, and when considering the factors involved in populating it with data. It’s clear that there are many issues involved and it’s not as simple as pouring the data in. These issues are wide-ranging and touch many parts of the organization, and enterprises will need to take these issues into account when approaching the Data Lake opportunity.

References

Best Practices for Data Lake – Who’s Using It and How Can You Get the Most Value From It? by Carole Gunst. See more at: http://attunity.com/blog/best-practices-for-data-lake-whos-using-it-and-how-can-you-get-the-most-value-from-it#sthash.DUefFh7b.dpuf

Data Lake Adoption and Maturity Survey Findings Report – http://learn.attunity.com/data-lake-adoption-and-maturity-survey-findings-register-0

Jen’s Diary: Why are PASS doing Business Analytics at all?

As always, I don’t speak for PASS. This is a braindump from the heart. I realise that we haven’t communicated about BA as much as some members might like. It’s a hard balance – I don’t want to spam people, and I don’t want to make it too light, either. If you want to sign up for PASS BA news, here’s the link. So I have to apologise here, and hold my hands up for that one. I’ll endeavour to ensure we have a better BA communications plan in place, and I’m meeting the team on Friday to discuss how we can make that happen.

In the meantime, I’d like to blog about BA today. How did we get here, and where are we going? Why are PASS interested in Business Analytics at all? To answer this question, let’s look at the history of Business Intelligence, what Business Analytics means, and how PASS can be part of the story. Let’s start with the history lesson. What are the stages of Business Intelligence?

First generation Business Intelligence – this was the world of corporate Business Intelligence. You’ll know this by the phrase ‘the single source of truth’. It was a very technical discipline, focused on the data warehouse, and dominated by Kimball or Inmon methodology, depending on the business requirement. However, the business got lost in all this somewhere, and reverted to the default position of using Excel as a tool to work with Excel exports, subverting the IT departments by storing data in email. Microsoft did – and still does – cater for the first generation of business intelligence. It has diversified into new cloud products, of course, but SQL Server still rocks. You’ll have seen that Gartner identified SQL Server as the number one RDBMS for 2015. Kudos to the team! For an overview, the Computer Weekly article is interesting.

Second generation Business Intelligence – the industry pivoted to bring the Business back into Business Intelligence. You’ll know this by the phrase ‘self-service business intelligence’. Here, the business user was serviced with clean data sources that they could mash and merge together, and they were empowered to connect to these sources. In the Microsoft sphere, this involved a proliferation of tabular models, PowerPivot as well as continued use of analysis services multidimensional models. As before, Excel remained the default position for working with data. PASS Summit 2015 has a lot of content in both of these areas.

So far, so good. PASS serves a community need by offering high quality, community education on all of these technologies. Sorted, right?

Wrong. The world of data keeps moving. Let’s look at the projected growth of Big Data by Forbes.

Well, the world of business intelligence isn’t over yet; we now have business analytics on the horizon, and the world of data is changing fast. We need to keep up! But what do we do with all this data? This is the realm of Business Analytics. Why is it different from BI? The value of business analytics lies in its ability to deliver better outcomes; it’s a different perspective. Note that in our first generation and second generation BI eras, technology was at the forefront of the discussion. In business analytics, we talk about organizational change, enabled by technology. In this sphere, we have to quantify and communicate value as the outcome, not the technology as a means to get there. So what comes next?

Third generation Business Intelligence – self-service analytics. Data visualisation software was at the forefront of second generation Business Intelligence, and it took priority. Here, the position is that businesses will understand that they need data visualisation technologies as well as analytical tools, in order to use the data for different purposes.

How is Business Analytics an extension of Business Intelligence? Let’s look at some basic business questions, and see whether each falls under BI or BA. The images belong to Gartner, so all kudos and copyright to the team over there.

What happened?

If the promise of business intelligence is to be believed, then we have our clean data sources, and we can describe the current state of the business. Gartner call this descriptive analytics, and it answers the question: What happened? This level is our bread-and-butter business intelligence, with an emphasis on the time frame until this current point in time.

Why did it happen?

We can also understand, to a degree, why we are where we are. This is called diagnostic analytics, and it can help pinpoint issues in the organisation. Business Intelligence is a great domain for understanding the organisation up until this point in time. However, it’s a rear-view impression of the data. What happens next? Now we start to get into the remit of Business Analytics:

What will happen?

Businesses want to know what will happen next. Gartner call this predictive analytics, and it comes into play when we want to look for predictive patterns in the data. Once we understand what will happen next, what is the next question?

How can we make this happen?

This is the power of prescriptive analytics; it tells us what we should do, and it is the holy grail of analytics. It uses business intelligence data in order to understand the right path to take, and it builds on the other types of analytics.
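The four levels can be made concrete with a toy monthly-sales series. This is a deliberately simplified sketch: the numbers, the naive linear-trend “model” and the action uplifts are all invented for illustration:

```python
sales = [100, 110, 125, 118, 140, 155]  # toy monthly sales figures

# Descriptive analytics: what happened? Summarise the past.
total = sum(sales)

# Diagnostic analytics: why did it happen? e.g. find the worst
# month-on-month change and go digging there.
deltas = [b - a for a, b in zip(sales, sales[1:])]
worst_month = deltas.index(min(deltas)) + 1

# Predictive analytics: what will happen? Fit a naive linear trend
# (least squares by hand) and extrapolate one month ahead.
n = len(sales)
mean_x = (n - 1) / 2
mean_y = total / n
slope = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(sales)) \
    / sum((i - mean_x) ** 2 for i in range(n))
forecast = mean_y + slope * (n - mean_x)

# Prescriptive analytics: how can we make it happen? Choose the action
# whose modelled uplift on the forecast is highest (uplifts are made up).
actions = {"discount": 0.95, "do nothing": 1.0, "campaign": 1.08}
best = max(actions, key=lambda a: forecast * actions[a])
print(best)  # campaign
```

Each level builds on the one before it, which is exactly the continuum described above: the prescriptive step is only as good as the descriptive data and predictive model underneath it.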

Business Intelligence and Business Analytics are a continuum. Analytics is focused more on a forward motion of the data, and a focus on value. People talk about ROI, TCO, making good business decisions based on strong data. First generation and second generation are not going away. A cursory look around a lot of organisations will tell you that. The Third Generation, however, is where organisations start to struggle a bit. PASS can help folks navigate their way towards this new generation of data in the 21st century.

How do we measure value? It is not just about storing the data, protecting it and securing it. These DBA functions are extremely valuable and the business would not function without them – full stop. So how do we take this data and use it as a way of moving the organisation forward? We can work with the existing data to improve it, and understand and produce the right measures of return, profiling, or other benefits such as teamwork. Further, analytics is multi-disciplinary. It straddles the organisation, and it has side effects that you can’t see immediately. This is ‘long term vision’, not ‘operational, reactive, here-and-now’. Analytics can effect change within the organisation, as the process of doing analytics itself means that the organization solves a business problem, which it then seeks to re-apply across different silos within the organization.

SQL Server, on the other hand, is a technology: an on-premises relational database technology aimed at a very specific task. This is a different, technologically based perspective. The perspectives in data are changing, as this Gartner illustration taken from here shows:

Why do we need a separate event? We need to meet different people’s attitudes towards data. DBAs have a great attitude; protect, cherish, secure data. BAs also have a great attitude: use, mix, apply learnings from data. You could see BA as a ‘special interest group’ which offers people a different choice. There may not be enough of this material for them at PASS Summit, so they get their own event. If someone wants to go ahead and have a PASS SQLSaturday event which is ‘special interest’ and focuses solely on, say, performance or disaster recovery, for example, then I don’t personally have a problem with that.  I’d let them rock on with it. It might bring in new members, and it offers a more niche offering to people who may or may not attend PASS because they don’t feel that there’s enough specialised, in depth, hard-core down-to-the-metal disaster recovery material in there for them. Business Analytics is the same, by analogy. Hundreds and hundreds of people attended my 3 hour session on R last year; so there is an interest. I see the BA event as a ‘little sister’ to the PASS ‘big brother’ – related, but not quite the same.

Why Analytics in particular? It’s about PASS growth. Growing can be painful, and you take a risk. However, I want to be sure that PASS is still growing to meet the future needs of its members, as well as attracting new members to the fold. The footfall we see at PASS BA, plus our industry-recognised expert speakers, tells us that we are growing in the right direction. Let’s take a look at our keynote speaker, Jer Thorp: he has done work with NASA and the MoMA in New York, he was data artist in residence at the New York Times, and he has now set up The Office for Creative Research and is an adjunct professor at ITP. Last year, we had Mico Yuk, who is the author of Dataviz for Dummies, as well as heading up her own consultancy team over at BI Brainz. They are industry experts in their own right, and I’m delighted to add them to our growing PASS family who love data.

The PASS BA event also addresses the issue of new and emerging data leaders. How do you help drive your organisation towards becoming a data-oriented organisation? This means that we talk a new language: new criteria for measuring value, working out return on investment, cross-department communication, and communicating ideas and conclusions to people throughout the organisation, even at C-level. PASS BA is also looking at the career trajectories of these people as well as DBA-oriented folks, and PASS BA is out there putting the ‘Professional’ aspect into the event. We have a separate track, Communicate and Lead, which is all about data leadership and professional development. A whole track – the little sister is smartly bringing the Professional back, folks, and it’s part of our hallmark.

PASS is part of this story of data in the 21st Century. The ‘little sister’ still adds value to the bigger PASS membership, and is an area of growth for the family of PASS.

Any questions, I’m at jen.stirrup@sqlpass.org, or please do come to the Board Q&A and ask questions there. If you can’t make it, tweet me at @jenstirrup and I’ll see if I can catch them during the Q&A.

AzureCon round up: Intelligent Cloud, Applications, Data, Infrastructure, Business Agility and Cloud Ability

I organised an AzureCon viewing party tonight in Hertfordshire, with great team support from Team Awesome over at Cloudamour.

We watched a total of four keynotes, running back to back for almost four hours. The keynotes were all awesome, and I’ve blogged some learnings here.

First up was Scott Guthrie (t), igniting the keynotes and kicking off the event with the journey to the intelligent cloud. I missed some of this piece because I was welcoming guests as they arrived, making introductions and so on. If you want to see the video, you can catch Scott Guthrie here on Channel 9. The thrust of Scott’s session was about the cloud energising business and technical leaders worldwide to turn digital disruption into their advantage. Scott introduced customers who used the cloud to enable their business to break new ground, and who shared their best practices in using some of the latest Microsoft innovations on their journey to the cloud.

My personal favourite part of this piece was seeing the inspirational Lara Rubbelke (t) up on stage. Lara is inspirational and she’s generous with her time, supporting SQLFamily members. Lara explained SQL Data Warehouse very clearly in terms of its simplicity to set up and its relevance to the business. I liked her piece because she talked tech and business equally, and that’s hard. It’s something I have to do in my role every day: basically, wearing different hats, and it’s not easy to accomplish. Lara achieves this with ease and I recommend that you watch her segment, which is about 32 minutes into the video. She also makes you think about how this could be relevant in your environment, and that is an important takeaway.

In Lara’s words, using the technology is a ‘zero risk’ decision which allows you to scale up and scale down as you need. We don’t need to move our data; it just works, thereby offering immediate ROI, visualised in Power BI.

Next up was Bill Staples (t), the CVP for the Azure App Platform, and the focus here was on growing and expanding businesses using Azure as a base for apps.

Since apps are so personal and based around customers’ experience, they can help accelerate business transformation and drive rapid results which are customer-centric.

Bill had some pretty interesting case studies, and you can find them in his keynote session over at Channel 9.

Next up was the session I’d looked forward to the most: T.K. Rengarajan (t), CVP Data Platform. Ranga talked about IoT – the Internet of Things with *your* things. As with IoT, there was a focus on stream processing and predictive analytics. How can we use that data properly? How can we use it for prescriptive analytics, i.e. what can I do? What should I do? We should be able to act on that intent, to derive intelligent action. Here are some use cases:

  • Rockwell use it to manage gas dispensers.
  • Ford are embedding IoT sensors in their cars, going forward.
  • ThyssenKrupp, a leading elevator manufacturer, track the health of their elevators around the globe, optimising the service experience before a breakdown occurs.
Here is the ThyssenKrupp elevator video from Ranga’s talk:

They have the ability to optimise their service experience by predicting failure before the elevator breaks down. Now, that’s predictive analytics in action, using Azure as a base!

The session then moved to IoT in a box!

Investment principles for IoT
  • IoT Starts with your things
  • Provide connectivity to both existing and new devices
  • Facilitate new insights by harnessing the power of untapped data
Azure IoT Suite, Summarised:
  • Preconfigured Solutions
  • Analytics
  • Workflow Automation
  • Device Connectivity
  • Command and control
  • Dashboards
The Azure IoT Suite Remote Monitoring solution was announced, with a Predictive Monitoring solution onboarding in a few weeks. Now if that wasn’t enough excitement for you, the Azure Data Lake announcement was made, and here is the summary:
  • Fully managed system for analytics: analyse data of any size, shape and speed
  • Productive from day one
  • Built on open standards – YARN
Data Lake – the great tape recorder in the sky
What types of customers are looking at it, and what do they need?
  • the ones with unstructured data
  • U-SQL
  • a U-SQL ETL script
  • unstructured TSV in the Data Lake store to structured tables in the Data Lake store
  • including JSON expansion and filtering
  • Data Lake can support both structured and unstructured data
  • It’s easy to submit a job, and there is even a slider for parallelism! We can slide up to 1,000 levels of parallelism. Ranga asked people to submit a name for it. I like ‘Pixie Dust Slider’ because it’s sprinkling magic on your data, but I don’t think Microsoft marketing would ever go for that!
  • We can see that U-SQL looks very similar to standard SQL
  • We can make references to .NET
  • One of our columns is a JSON object, but with Data Lake, we can use a function to extract that column and work with it.
  • The different jobs are broken down.
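The demo itself used U-SQL, which isn’t reproduced here. As a rough illustration of the pattern it showed – a TSV file where one column holds a JSON object, expanded into structured fields and filtered – here is a minimal Python sketch. The file layout, column names and threshold are all hypothetical stand-ins, not the actual demo data.

```python
import csv
import io
import json

# Hypothetical raw TSV, standing in for an unstructured file landed in a
# Data Lake store: id, date, then a JSON payload column.
raw_tsv = (
    '1\t2015-09-29\t{"duration": 120, "status": "ok"}\n'
    '2\t2015-09-29\t{"duration": 45, "status": "error"}\n'
)

def expand_and_filter(tsv_text, min_duration=60):
    """Expand the JSON column into real fields and keep long-running rows."""
    rows = []
    for event_id, date, payload in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        details = json.loads(payload)            # JSON expansion
        if details["duration"] >= min_duration:  # filtering
            rows.append({"id": int(event_id), "date": date, **details})
    return rows

structured = expand_and_filter(raw_tsv)
# structured → [{'id': 1, 'date': '2015-09-29', 'duration': 120, 'status': 'ok'}]
```

In U-SQL the same shape is a single declarative script (EXTRACT from the TSV, apply a .NET function to the JSON column, OUTPUT to a structured table), with the parallelism slider deciding how many nodes run it.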

Finally, we moved on to Jason Zander (t) to talk about cloud infrastructure. More pixie dust to make it happen! Here’s a summary:

  • 24 Azure regions, more than Google and AWS combined. Welcome, India #Azure data centres!
  • Enough fibre to wrap around the globe 56 times.
  • 1.4 million miles of fibre in the data centres
  • ExpressRoute for Azure: speeds of up to 10 gigabits per second, with 21 ExpressRoute locations worldwide, including London.
Then, it was time for home. It was agreed that the party guests would love to hear more Azure information and they are really keen for another group meeting. I’ll be looking to the community to support our growing group with speakers, so watch this space as we grow more #AzureFamily fans here in the UK.
And here is a picture of the youngest Azure fan, who likes it because Halo runs on it…..

Note to Self: A roundup of the latest Azure blog posts and whitepapers on polybase, network security, cloud services, Hadoop and Virtual Machines

Here is a roundup of Azure blogs and whitepapers which I will be reading this month.

This is the latest as of June 2014, and there is a focus on cloud security in the latest whitepapers, which you can find below.

  • PolyBase in APS – Yet another SQL over Hadoop solution?
  • Desktop virtualization deployment overview
  • Microsoft updates its Hadoop cloud solution
  • LG CNS build a B2B virtual computer service in the cloud
  • Deploying desktop virtualization
  • Accessing desktop virtualization
  • The visualization that changed the world of data
  • Access and Information Protection: Setting up the environment
  • Access and Information Protection: Making resources available to users
  • Access and Information Protection: Simple registration for BYOD devices
  • Success with Hybrid Cloud webinar series
  • Power BI May round-up
  • Access and Information Protection: Syncing and protecting corporate information

Here are the latest whitepapers, which focus on security:

 
  • Windows Azure Security: Technical Insights. Update to the Security Overview whitepaper which provides a detailed description of security features and controls.
  • Security Best Practices for Windows Azure Solutions. Updated guidance on designing and developing secure solutions.
  • Windows Azure Network Security. Recommendations for securing network communications for applications deployed in Windows Azure.
  • Microsoft Antimalware for Azure Cloud Services and Virtual Machines. This paper details how to use Microsoft Antimalware to help identify and remove viruses, spyware, and other malicious software in Azure Cloud Services and Virtual Machines.