What’s wrong with CRISP-DM, and is there an alternative?

Many people, including myself, have discussed CRISP-DM in detail. However, I didn’t feel totally comfortable with it, for a number of reasons which I list below. Now that I had raised a problem, I needed to find a solution, and that’s where the Microsoft Team Data Science Process comes in. Read on for more detail!

  • What is CRISP-DM?
  • What’s wrong with CRISP-DM?
  • How does technology impinge on CRISP-DM?
  • What comes after CRISP-DM? Enter the Team Data Science Process
  • What is the Team Data Science Process?

 

What is CRISP-DM?

One common methodology is the CRISP-DM methodology (The Modeling Agency). The Cross Industry Standard Process for Data Mining (CRISP-DM), as it is known, is a process framework for designing, creating, building, testing, and deploying machine learning solutions. The process is arranged into six phases, which can be seen in the following diagram:

[Image: CRISP-DM lifecycle diagram]

The phases are described below:

  • Business Understanding / Data Understanding: The first phase looks at the machine learning solution from the business standpoint, rather than a technical standpoint. Once the business concept is defined, the Data Understanding phase focuses on becoming familiar with the data and collating it.
  • Data Preparation: In this phase, the data is cleansed, transformed, and shaped ready for the Modeling phase.
  • Modeling: In the Modeling phase, various techniques are applied to the data. The models are further tweaked and refined, and this may involve going back to the Data Preparation phase in order to correct any unexpected issues.
  • Evaluation: The models need to be tested and verified to ensure that they meet the business objectives that were defined initially in the Business Understanding phase. Otherwise, we may have built a model that does not answer the business question.
  • Deployment: The models are published so that the customer can make use of them. This is not the end of the story, however.

Then, the CRISP-DM process restarts. We live in a world of ever-changing data, business requirements, customer needs, and environments, and the process will be repeated.

CRISP-DM is possibly the best-known framework for implementing machine learning projects specifically. It has a good focus on the business understanding piece.
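The iterative nature of those phases can be sketched in code. The following is a minimal, illustrative Python sketch (the function names, data, and "model" are my own invention, not part of the CRISP-DM standard) showing how Evaluation can feed back into Data Preparation until the model meets the business objective:

```python
def prepare(raw, drop_negative):
    # Data Preparation: cleanse and shape the raw data.
    return [x for x in raw if x >= 0] if drop_negative else list(raw)

def build_model(data):
    # Modeling: a stand-in "model" -- here, simply the mean of the data.
    return sum(data) / len(data)

def evaluate(prediction, objective):
    # Evaluation: does the model meet the business objective?
    return prediction >= objective

raw_data = [5, -3, 8, 7]   # invented example data
objective = 6              # the target defined in Business Understanding
drop_negative = False

for attempt in range(3):
    data = prepare(raw_data, drop_negative)
    prediction = build_model(data)
    if evaluate(prediction, objective):
        break              # Deployment would follow here
    drop_negative = True   # loop back to Data Preparation and refine

print(round(prediction, 2))  # → 6.67
```

On the first pass the model misses the objective, so the loop returns to Data Preparation (here, by dropping bad records) before Modeling and Evaluation run again, which is exactly the feedback arrow in the diagram above.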

What’s wrong with CRISP-DM?

The model no longer seems to be actively maintained. At the time of writing, the official site, CRISP-DM.org, is no longer being maintained. Further, the framework itself has not been updated to address working with newer technologies, such as Big Data.

As a project leader, I want to keep up-to-date with the newest frameworks and the newest technology. It’s true what they say: you won’t get a change until you make a change.

The methodology itself was conceived in 1996, 21 years ago. I’m not the only one to come out and say so: industry veteran Gregory Piatetsky of KDnuggets had the following to say:

CRISP-DM remains the most popular methodology for analytics, data mining, and data science projects, with 43% share in latest KDnuggets Poll, but a replacement for unmaintained CRISP-DM is long overdue.

Yes, people. Just because something’s popular, it doesn’t mean that it is automatically right. Since the title ‘data scientist’ is the new sexy, lots of inexperienced data scientists are rushing to use this model because it is the obvious one. I don’t think I’d be serving my customers well if I didn’t keep up-to-date, and that’s why I’m moving away from CRISP-DM to the Microsoft Team Data Science Process.

CRISP-DM also neglects aspects of decision making. James Taylor, a veteran of the PASS Business Analytics events, explains this issue in great detail in his blog series over at KDnuggets. If you haven’t read his work, I recommend you read his article now and learn from his wisdom.

How does technology impinge on CRISP-DM?

Big Data technologies mean that there can be additional effort spent in the Data Understanding phase, for example, as the business grapples with the additional complexities involved in the shape of Big Data sources.

What comes after CRISP-DM? Enter the Team Data Science Process

The next framework, Microsoft’s Team Data Science Process (TDSP), is aimed at including Big Data as a data source. As previously stated, the Data Understanding phase can be more complex.

Big Data and the Five Vs

There are debates about the number of Vs that apply to Big Data, but let’s go with Ray Wang’s definitions here, in which our data is subject to the five Vs as follows:

[Image: Ray Wang’s Five Vs of Big Data]

This means that our data becomes more confusing for business users to understand and process. This issue can easily distract the business team from what they are trying to achieve. So, following the Microsoft Team Data Science Process can help us to ensure that we have taken our five Vs into account, whilst keeping things ticking along towards the business goal.

As we stated previously, CRISP-DM doesn’t seem to be actively maintained. With Microsoft dollars behind it, the Team Data Science Process isn’t going away anytime soon.

What is the Team Data Science Process?

The process is shown in this diagram, courtesy of Microsoft:

[Image: Team Data Science Process lifecycle diagram]

The Team Data Science Process is loosely divided into five main phases:

  • Business Understanding
  • Data Acquisition and Understanding
  • Modelling
  • Deployment
  • Customer Acceptance
The phases are described in more detail below:

  • Business Understanding: The Business Understanding process starts with a business idea, which is solved with a machine learning solution. A project plan is generated.
  • Data Acquisition and Understanding: This important phase focuses on fact-finding about the data.
  • Modelling: The model is created, built, and verified against the original business question, and its results are evaluated against the key metrics.
  • Deployment: The models are published to production, once they are proven to be a fit solution to the original business question.
  • Customer Acceptance: This is the customer sign-off point. It confirms that the pipeline, the model, and their deployment in a production environment satisfy the customer’s objectives.

 

The TDSP process itself is not linear; the output of the Data Acquisition and Understanding phase can feed back to the Business Understanding phase, for example. When the essential technical pieces start to appear, such as connecting to data and integrating multiple data sources, there may be actions arising from this effort.

The TDSP is a cycle rather than a linear process, and it does not finish, even once the model is deployed. Keep testing and evaluating that model!

TDSP Next Steps

There are a lot of how-to guides and downloads over at the TDSP website, so you should head over and take a look.

The Data Science ‘unicorn’ does not exist. Thanks to Hortonworks for their image below:

[Image: the Data Science unicorn, courtesy of Hortonworks]

To mitigate this lack of a Data Science unicorn, the Team Data Science Process is a team-oriented solution which emphasises teamwork and collaboration throughout. It recognises the importance of working as part of a team to deliver Data Science projects. It also offers useful information on the importance of having standardised source control and backups, and it can include open source technology as well as Big Data technologies.

To summarise, the TDSP provides a clear structure for you to follow throughout the Data Science process, and it facilitates teamwork and collaboration along the way.

Five reasons to be excited about Microsoft Data Insights Summit!

[Image: Microsoft Data Insights Summit banner]

I’m delighted to be speaking at the Microsoft Data Insights Summit! I’m pumped about my session, which focuses on Power BI for the CEO. I’m also super happy to be attending for five top reasons (and others, but five is a nice number!). I’m excited about all of the Excel, Power BI, DAX and Data Science goodies. Here are some sample session titles:

Live Data Streaming in Power BI

Data Science for Analysts

What’s new in Excel

Embed R in Power BI

Spreadsheet Management and Compliance (It is a topic that keeps me up at night!)

Book an in-person appointment with a Microsoft expert using the online Schedule Builder. Bring your hard – or easy – questions! In itself, this is a real chance to speak to Microsoft directly and get expert, in-depth help from the team who make the software that you love.

Steven Levitt of Freakonomics is speaking, and I’m delighted to hear him again. I’ve heard him present recently and he was very funny whilst also being insightful. I think you’ll enjoy his session.

[Image: Freakonomics]

I’m excited that James Phillips is delivering a keynote! I have had the pleasure of meeting him a few times and I am really excited about where James and the Power BI team have taken Power BI. I’m sure that there will be good things as they steam ahead, so James’ keynote is unmissable!

Alberto Cairo is presenting a keynote! Someone who always makes me sit up a bit straighter when they tweet is Alberto Cairo, and I’m delighted he’s attending. I hope I can get to meet him in person. Whether Alberto is tweeting about data visualisation, design or the world in general, it’s always insightful. I have his latest book and I hope I can ask him to sign it.


Tons of other great speakers! Now someone I haven’t seen for ages – too long in fact – is Rob Collie. Rob is President of PowerPivotPro and you simply have to hear him speak on the topic. He’s direct in explaining how things work, and you will learn from him. I’m glad to see Marco Russo is speaking and I love his sessions. In fact, at TechEd North America, I only got to see one session because I was so busy with presenting, booth duty etc… but I managed to get to see a session and I made sure it was Marco Russo and Alberto Ferrari’s session.  Chris Webb is also presenting and his sessions are always amazing. I have to credit Chris in part for where I am today, because his blog kept me sane and his generosity during sessions meant that I never felt stupid asking him questions. I’m learning too – always.

Ok, that’s five things but there are plenty more. Why not see for yourself?

Join me at the conference, June 12–13, 2017 in Seattle, WA — and be sure to sign up for your 1:1 session with a Microsoft expert.

Upcoming Events

UK Azure Group, SQL Midlands edition

  • Thursday, February 9, 2017
  • 6:30pm – 7:30pm
  • Aston Manor Academy

 

To Register: https://www.eventbrite.co.uk/o/sql-midlands-6264475503

Implementing NHS Azure Hybrid Architectures

The NHS is undergoing a time of unprecedented change, as well as increasing financial pressure under a public microscope. In order to meet these challenging requirements, NHS South London and Maudsley is undergoing a Digital Transformation programme which is fundamentally altering the delivery of its healthcare services. The Digital Transformation is crucial to the success of the Trust, and it affects everything from the physical layer right up to self-service reporting. It is also an important balancing act between highly sensitive patient privacy and a world that expects data on-demand in mobile and external environments.

In this technical session, join us to learn from the expert team who architected, designed, and delivered the hybrid Azure cloud and SQL Server solution for NHS South London and Maudsley Trust. Learn about the technical constraints and challenges and how we overcame those challenges, particularly through a healthcare lens of highly-sensitive patient privacy issues in a world of data. You will also learn about the technical benefits that were gleaned from this hybrid implementation. In order to bring the achievements to life, you will see real-life insights into healthcare in a Power BI demo, in use by hospital team members.

Using Azure, SQL Server and Power BI means that the NHS is empowered to create enriched opportunities for research to improve patient outcomes, both now and in the future, as well as directly improved patient outcomes now. Join us for this technically-oriented session to see how Azure, SQL Server and Power BI joined forces to fundamentally deliver improved patient healthcare, research and insights in London.

To Register, visit http://www.sqlmidlands.com/events/48-9th-feb-2017-nhs-in-azure-data-factory-custom-activity.html

 

Power BI for the CEO

  • Thu, Apr 6, 2017 9:00am – Sun, Jun 4, 2017 12:59am
  • The International Centre

Date: Saturday 8th April, time to be determined

Location: The International Centre, St Quentin Gate, Telford, Shropshire, TF3 4JH

Register: http://sqlbits.com/information/registration.aspx

Digital Transformation is much more than just sticking a few Virtual Machines in the cloud; it is real, transformative, long-term change that benefits and impacts the whole organisation.
Digital Transformation is a hot topic with CEOs and the C-level suite, renewing their interest in data and what it can do to empower the organisation.
With the right metrics and data visualisation, Power BI can help to bring clarity and predictability to the CEO to make strategic decisions, understand how their customers behave, and measure what really matters to the organization. This session is aimed at helping you to please your CEO with insightful dashboards in Power BI that are relevant to the CxO in your organisation, or your customers’ organisations.
Using data visualisation principles in Power BI, we will demonstrate how you can help the CEO by giving her the metrics she needs to develop a guiding philosophy based on data-driven leadership. Join this session to get practical advice on how you can help drive your organisation’s short and long term future, using data and Power BI.
As an MBA student and external consultant who delivers solutions worldwide, Jen has experience in advising CEO and C-level executives in terms of strategic and technical direction.
Join this session to learn how to speak their language in order to meet their needs, and impress your CEO with proving it, using Power BI.

Data Visualisation Lies and How to Spot them Techorama

  • Mon, May 22, 2017 9:00am – Wed, May 24, 2017 5:00pm
  • Kinepolis

Register: Buy Tickets

 

During the acrimonious US election, both sides used a combination of cherry-picked polls and misleading data visualization to paint different pictures with data. In this session, we will use a range of Microsoft Power BI and SSRS technologies in order to examine how people can mislead with data and how to fix it. We will also look at best practices with data visualisation. We will examine the data with Microsoft SSRS and Power BI so that you can see the differences and similarities in these reporting tools when selecting your own Data Visualisation toolkit. Whether you are a Trump supporter, a Clinton supporter or you don’t really care, join this session to spot data lies better in order to make up your own mind.

Taming the Open Source Beast with Azure for Business Intelligence

  • Saturday, June 17, 2017
  • 9:00am – 5:00pm
  • Trinity College, College Green, Dublin 2

Location: Trinity College, College Green, Dublin 2, Ireland

Register: https://www.sqlsaturday.com/620/registernow.aspx

Today, CIOs and other business decision-makers are increasingly recognizing the value of open source software and Azure cloud computing for the enterprise, as a way of driving down costs whilst delivering enterprise capabilities.

For the Business Intelligence professional, how can you introduce Open Source into the Enterprise in a robust way, whilst also creating an architecture that accommodates cloud, on-premise and hybrid architectures?

We will examine strategies for using open source technologies to improve existing common Business Intelligence issues, using Azure as our backdrop. These include:

– incorporating Apache projects, such as Apache Tika, for your BI solution
– using Redis Cache in Azure as an engine as part of your SSIS toolkit

Join this session to learn more about open source in Azure for Business Intelligence. Open Source does not mean on premise.
Demos will provide practical takeaways in your Business Intelligence Enterprise architecture.

Guess who is appearing in Joseph Sirosh’s PASS Keynote?

This girl! I am super excited and please allow me to have one little SQUUEEEEEEE! before I tell you what’s happening. Now, this is a lifetime achievement for me, and I cannot begin to tell you how absolutely and deeply honoured I am. I am still in shock!

I am working really hard on my demo and….. I am not going to tell you what it is. You’ll have to watch it. Ok, enough about me and all I’ll say is two things: it’s something that’s never been done at PASS Summit before and secondly, watch the keynote because there may be some discussion about….. I can’t tell you what… only that, it’s a must-watch, must-see, must do keynote event.

We are in a new world of Data and Joseph Sirosh and the team are leading the way. Watching the keynote will mean that you get the news as it happens, and it will help you to keep up with the changes. I do have some news about Dr David DeWitt’s Day Two keynote… so keep watching this space. Today I’d like to talk about the Day One keynote with the brilliant Joseph Sirosh, CVP of Microsoft’s Data Group.

Now, if you haven’t seen Joseph Sirosh present before, then you should. I’ve put some of his earlier sessions here and I recommend that you watch them.

Ignite Conference Session

MLDS Atlanta 2016 Keynote

I hear you asking… what am I doing in it? I’m keeping it a surprise! Well, if you read my earlier blog, you’ll know I transitioned from Artificial Intelligence into Business Intelligence and now I do a hybrid of AI and BI. As a Business Intelligence professional, my customers will ask me for advice when they can’t get the data that they want. Over the past few years, the ‘answer’ to their question has gone far, far beyond the usual on-premise SQL Server, Analysis Services, SSRS combo.

We are now in a new world of data. Join in the fun!

Customers sense that there is a new world of data. The ‘answer’ to the question ‘Can you please help me with my data?’ is complex and varied, and it’s very much aimed at cost sensitivities, too. Often, customers struggle with data because they now have a Big Data problem, or a storage problem, or a data visualisation access problem. Azure is very neat because it can cope with all of these issues. Now, my projects are Business Intelligence and Business Analytics projects… but they are also ‘move data to the cloud’ projects in disguise, and that’s in response to the customer need. So if you are a Business Intelligence professional, get enthusiastic about the cloud, because it really empowers you with a new generation of exciting things you can do to please your users and data consumers.

As a BI or an analytics professional, the cloud makes data more interesting and exciting. It means you can have a lot more data, in more shapes and sizes, and access it in different ways. It also means that you can focus on what you are good at, and make your data estate even more interesting by augmenting it with cool features in Azure. For example, you could add in more exciting things, such as the Apache Tika library as a worker role in Azure, to crack through PDFs and do interesting things with the data in there. If you bring it into SSIS, then you can spin it up and tear it down again when you don’t need it.
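As a sketch of what that Tika step might look like: Tika can run as a server exposing a simple REST API, where you PUT a document to the /tika endpoint and the Accept header chooses the output format. The snippet below only builds such a request with Python’s standard library, without sending it; the server URL is a placeholder for wherever your Azure worker role happens to be listening, and the PDF bytes are a stand-in:

```python
import urllib.request

# Hypothetical endpoint: wherever your Tika server (e.g. in an Azure
# worker role) is listening. Tika Server accepts a document via PUT /tika;
# the Accept header selects the output format (text/plain for raw text).
TIKA_URL = "http://localhost:9998/tika"

def build_tika_request(pdf_bytes: bytes) -> urllib.request.Request:
    """Construct (but do not send) a request asking Tika for plain text."""
    return urllib.request.Request(
        TIKA_URL,
        data=pdf_bytes,
        method="PUT",
        headers={
            "Accept": "text/plain",             # ask for plain-text extraction
            "Content-Type": "application/pdf",  # what we are uploading
        },
    )

req = build_tika_request(b"%PDF-1.4 ...")  # stand-in bytes, not a real PDF
print(req.get_method(), req.get_header("Accept"))  # → PUT text/plain
```

Sending the request (with `urllib.request.urlopen(req)`) would return the extracted text, which you could then land somewhere that SSIS can pick it up.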

I’d go as far as to say that, if you are in Business Intelligence at the moment, you will need to learn about cloud sooner or later. Eventually, you’re going to run into Big Data issues. Alternatively, your end consumers are going to want their data on a mobile device, and you will want easy solutions to deliver it to them. Customers are interested in analytics and the new world of data and you will need to hop on the Azure bus to be a part of it.

The truth is: Joseph Sirosh’s keynotes always contain amazing demos. (No pressure, Jen, no pressure….. ) Now, it’s important to note that these demos are not ‘smoke and mirrors’….

The future is here, now. You can have this technology too.

It doesn’t take much to get started, and it’s not too far removed from what you have in your organisation. AzureML and Power BI have literally hundreds of examples. I learned AzureML by looking at the following book by Wee-Hyong Tok and others, so why not download a free book sample?

https://read.amazon.co.uk/kp/card?asin=B00MBL261W&preview=inline&linkCode=kpe&ref_=cm_sw_r_kb_dp_c54ayb2VHWST4

How do you proceed? Well, why not try a little homespun POC with some of your own data to learn about it, and then show your boss? I don’t know about you, but I learn by breaking things, and I break things all the time when I’m learning. You could download some Power BI workbooks, use the sample data and then try to recreate them, for example. Or why not look at the community R Gallery and try to play with the scripts? You broke something? No problem! Just download a fresh copy and try again. You’ll get further next time.

I hope to see you at the PASS keynote! To register, click here: http://www.sqlpass.org/summit/2016/Sessions/Keynotes.aspx 

SQL Server 2016 Business Intelligence and Dataviz Masterclass

Join me in Edinburgh on 10th June for a one day Masterclass in SQL Server 2016 Business Intelligence and Data Visualisation!
You’ll get takeaway notes and experience hands-on labs that focus on:

  • Power BI
  • Excel
  • R
  • AzureML
  • SQL Server Analysis Services (SSAS),
  • SQL Server Reporting Services (SSRS) and Datazen

SQLSat Edinburgh agenda

There will be an emphasis on practical applications that mean you can make a difference in your organization, fast. Throughout the day, we will weave best practice data visualization theory so that, regardless of the technology, you will be able to apply the theory to make your data more meaningful and actionable.
You’ll get takeaway notes and experience hands-on labs to maximize your learning.

See you there!

Jen’s Diary: Why are PASS doing Business Analytics at all?

As always, I don’t speak for PASS. This is a braindump from the heart. I realise that we haven’t communicated about BA as much as some members might like. It’s a hard balance: I don’t want to spam people, and I don’t want to make it too light, either. If you want to sign up for PASS BA news, here’s the link. So I have to apologise here, and hold my hands up for that one. I’ll endeavour to ensure we have a better BA communications plan in place, and I’m meeting the team on Friday to discuss how we can make that happen.

In the meantime, I’d like to blog about BA today. How did we get here, and where are we going? Why are PASS interested in Business Analytics at all? To answer this question, let’s look at the history of Business Intelligence, what Business Analytics means, and how PASS can be part of the story. Let’s start with the history lesson. What are the stages of Business Intelligence?

First generation Business Intelligence – this was the world of corporate Business Intelligence. You’ll know this by the phrase ‘the single source of truth’. This was a very technical discipline, focused on the data warehouse. It was dominated by the Kimball methodology or the Inmon methodology, depending on the business requirement. However, the business got lost in all this somewhere, and they reverted to the default position of using Excel as a tool to work with Excel exports, and subverting the IT departments by storing data in email. Microsoft did – and still do – cater for the first generation of business intelligence. It has diversified into new cloud products, of course, but SQL Server still rocks. You’ll have seen that Gartner identified SQL Server as the number one RDBMS for 2015. Kudos to the team! For an overview, the Computer Weekly article is interesting.

Second generation Business Intelligence – the industry pivoted to bring the Business back into Business Intelligence. You’ll know this by the phrase ‘self-service business intelligence’. Here, the business user was serviced with clean data sources that they could mash and merge together, and they were empowered to connect to these sources. In the Microsoft sphere, this involved a proliferation of tabular models, PowerPivot as well as continued use of analysis services multidimensional models. As before, Excel remained the default position for working with data. PASS Summit 2015 has a lot of content in both of these areas.

So far, so good. PASS serves a community need by offering high quality, community education on all of these technologies. Sorted, right?

Wrong. The world of data keeps moving. Let’s look at the projected growth of Big Data by Forbes.

Well, the world of business intelligence isn’t over yet; we now have business analytics on the horizon, and the world of data is changing fast. We need to keep up! But what do we do with all this data? This is the realm of Business Analytics. Why is it different from BI? The value of business analytics lies in its ability to deliver better outcomes; it’s a different perspective. Note that in our first generation and second generation BI times, technology was at the forefront of the discussion. In business analytics, we talk about organisational change, enabled by technology. In this sphere, we have to quantify and communicate value as the outcome, not the technology as a means to get there. So what comes next?

Third generation of business intelligence – self-service analytics. Data visualisation software has been at the forefront of second generation Business Intelligence, and it has taken priority. Here, the position is that businesses will understand that they need data visualisation technologies as well as analytical tools, in order to use the data for different purposes.

How is Business Analytics an extension of Business Intelligence? Let’s look at some basic business questions, and see how they fall as BI or BA. Images belong to Gartner so all kudos and copyright to the team over there.

What happened?

If the promise of business intelligence is to be believed, then we have our clean data sources, and we can describe the current state of the business. Gartner call this descriptive analytics, and it answers the question: What happened? This level is our bread-and-butter business intelligence, with an emphasis on the time frame until this current point in time.

Why did it happen?

We can also understand, to a degree, why we are where we are. This is called diagnostic analytics, and it can help pinpoint issues in the organisation. Business Intelligence is a great domain for understanding the organisation up until this point in time. However, it’s a rearview impression of the data. What happens next? Now we start to get into the remit of Business Analytics:

What will happen?

Businesses want to know what will happen next. Gartner call this predictive analytics, and it comes into play when we want to look for predictive patterns in the data. Once we understand what will happen next, what is the next question?

How can we make this happen?

This is the power of prescriptive analytics; it tells us what we should do, and it is the holy grail of analytics. It uses business intelligence data in order to understand the right path to take, and it builds on the other types of analytics.

Business Intelligence and Business Analytics are a continuum. Analytics is focused more on a forward motion of the data, and a focus on value. People talk about ROI, TCO, making good business decisions based on strong data. First generation and second generation are not going away. A cursory look around a lot of organisations will tell you that. The Third Generation, however, is where organisations start to struggle a bit. PASS can help folks navigate their way towards this new generation of data in the 21st century.
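To make the continuum concrete, here is a small, self-contained Python sketch (the monthly sales figures are invented purely for illustration): descriptive analytics summarises what happened, while predictive analytics fits a simple least-squares trend line and extrapolates what will happen next:

```python
# Hypothetical monthly sales figures -- invented data for illustration only.
sales = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: what happened?
average = sum(sales) / len(sales)

# Predictive analytics: what will happen?
# Fit a least-squares trend line y = intercept + slope * month,
# then extrapolate one month ahead.
n = len(sales)
months = range(n)
x_mean = sum(months) / n
slope = (sum((x - x_mean) * (y - average) for x, y in zip(months, sales))
         / sum((x - x_mean) ** 2 for x in months))
intercept = average - slope * x_mean
forecast = intercept + slope * n  # prediction for the next month

print(average, forecast)  # → 125.0 160.0
```

Prescriptive analytics would then go one step further, for example recommending an action (increase stock, adjust pricing) based on that forecast.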

How do we measure value? It is not just about storing the data, protecting it, and securing it. These DBA functions are extremely valuable and the business would not function without them – full stop. So how do we take this data and use it as a way of moving the organisation forward? We can work with the existing data to improve it; understand and produce the right measures of return, profiling, or other benefits such as teamwork. Further, analytics is multi-disciplinary. It straddles the organisation, and it has side effects that you can’t see immediately. This is ‘long term vision’, not ‘operational, reactive, here-and-now’. Analytics can effect change within the organisation, as the process of doing analytics itself means that the organisation solves a business problem, which it then seeks to re-apply across different silos within the organisation.

SQL Server, on the other hand, is a technology. It is an on-premise relational database technology, which is aimed at a very specific task. This is a different, technologically based perspective. The perspectives in data are changing, as this Gartner illustration taken from here shows:

Why do we need a separate event? We need to meet different people’s attitudes towards data. DBAs have a great attitude; protect, cherish, secure data. BAs also have a great attitude: use, mix, apply learnings from data. You could see BA as a ‘special interest group’ which offers people a different choice. There may not be enough of this material for them at PASS Summit, so they get their own event. If someone wants to go ahead and have a PASS SQLSaturday event which is ‘special interest’ and focuses solely on, say, performance or disaster recovery, for example, then I don’t personally have a problem with that.  I’d let them rock on with it. It might bring in new members, and it offers a more niche offering to people who may or may not attend PASS because they don’t feel that there’s enough specialised, in depth, hard-core down-to-the-metal disaster recovery material in there for them. Business Analytics is the same, by analogy. Hundreds and hundreds of people attended my 3 hour session on R last year; so there is an interest. I see the BA event as a ‘little sister’ to the PASS ‘big brother’ – related, but not quite the same.

Why Analytics in particular? It’s about PASS growth. To grow can be painful, and you take a risk. However, I want to be sure that PASS is still growing to meet the future needs of the members, as well as attracting new members to the fold. The footfall we see at PASS BA, plus our industry-recognised expert speakers, tells us that we are growing in the right direction. Let’s take a look at our keynote speaker, Jer Thorpe: he has done work with NASA and the MOMA in New York, he was data artist in residence at the New York Times, and he has now set up The Office for Creative Research as well as being an adjunct professor at ITP. Last year, we had Mico Yuk, who is the author of Dataviz for Dummies, as well as heading up her own consultancy team over at BI Brainz. They are industry experts in their own right, and I’m delighted to add them as part of our growing PASS family who love data.

The PASS BA event also addresses the issue of new and emerging data leaders. How do you help drive your organisation towards becoming a data-oriented organisation? This means that you talk a new language: we talk about new criteria for measuring value, working out return on investment, cross-department communication, and the communication of ideas and conclusions to people throughout the organisation, even to C-level executives. PASS BA is also looking at the career trajectories of these people as well as DBA-oriented folks, and PASS BA is out there putting the ‘Professional’ aspect into the event. We have a separate track, Communicate and Lead, which is all about data leadership and professional development. A whole track – the little sister is smartly bringing the Professional back, folks, and it’s part of our hallmark.

PASS is part of this story of data in the 21st Century. The ‘little sister’ still adds value to the bigger PASS membership, and is an area of growth for the family of PASS.

Any questions, I’m at jen.stirrup@sqlpass.org or please do come to the Board Q&A and ask questions there. If you can’t make it, tweet me at jenstirrup and I’ll see if I can catch them during the Q&A.

My handy IoT Toolkit: What businesses forget about IoT

I recently did a brief blog post for Izenda on IoT and business intelligence, and this part of my IoT series expands on some of the themes there.

The Internet of Things is a new phenomenon; that said, a simple search for ‘Internet of Things IoT’ brings back over 60 million search results in Bing. What is the Internet of Things? The Internet of Things Global Standard gives us the following definition: ‘The Internet of Things (IoT) is defined in Recommendation ITU-T Y.2060 (06/2012) as a global infrastructure for the information society, enabling advanced services by interconnecting (physical and virtual) things based on existing and evolving interoperable information and communication technologies.’

Now, this definition is fine, but it focuses on the ‘shiny’ aspect of IoT and, most importantly, it does not mention the data aspect of IoT. It emphasises the connectedness of the various gadgets and their interoperability. I prefer Peter Hinssen’s discussion, where he recommends that we talk about the value of the network of things. The connected devices, on their own, will fulfil their purpose. However, if you want real insights from these sources, then you need to splice the data together with other sources.
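To make that splicing concrete, here is a minimal sketch in Python with pandas. The post names no tooling, so the choice of library and all of the data (device readings, a site lookup) are invented for illustration: readings on their own say little until they are joined to a second, contextual source.

```python
import pandas as pd

# Hypothetical raw device readings: on their own, they say very little.
readings = pd.DataFrame({
    "device_id": ["d1", "d2", "d1"],
    "temp_c": [21.5, 19.0, 22.1],
})

# A second source supplies the context: which site owns each device.
sites = pd.DataFrame({
    "device_id": ["d1", "d2"],
    "site": ["Leeds", "Glasgow"],
})

# Splicing the two sources together is what turns readings into insight.
# A left join keeps every reading, even if a device has no known site.
enriched = readings.merge(sites, on="device_id", how="left")
print(enriched)
```

The point is not the join itself, but that the value lives in the combined table, not in either source alone.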

The thing is, the Internet of Things is really the Internet of You.

We are now heading towards the Zettabyte generation thanks to the Millennial generation. For example, the World Data Bank projects that 50% of people will have a smartphone by 2016, and 80% by 2020. We sent 7.5 trillion SMS messages in 2014. In fact, one app, WhatsApp, carried 7.2 trillion messages. And that’s just data from one app. In 2000, Kodak processed 80 billion photos from camera film. In 2014, 800 billion photos from smartphones were shared on social networks. And that’s just the photos that were shared.

We are the Internet of Things.

From the business perspective, how do you make use of that IoT data? The consumerization of IT means that business users are often asked to manage and cleanse data, regardless of its size and nature. Research suggests that data is growing at a rate of 40% each year into the next decade, driven by increased online activity and usage of smart devices (ABI, 2013; The New York Times, 2014). The consumerization of data means that business users should be able to access and analyze the data comfortably. When we introduce data that comes under the umbrella of the Internet of Things, business users will need to be able to access IoT data from devices as well as other data sources, to get insights from the data.

How can we harness the IoT phenomenon to understand and serve our customers better?

The addition of data from a variety of sources, including data from devices, means that IoT has a very wide scope and opportunity. IoT can focus on the devices themselves, or the network infrastructure connecting devices, or the analytics derived from the data which comes from the network and the devices. In order to get true insights, the IoT data would be deployed in tandem with other relevant data so that the business user obtains the context. The IoT would also introduce real time data, which would be mixed with historical data.

Customer expectations are rising, and customer-focused businesses will need to put analytics at the heart of their customer services. For example, customers do not distinguish between out-of-date data and inaccurate data; for them, they are the same thing. The customer landscape is changing, and it includes the ‘millennials’, who expect technology to offer an unfailing, personal experience whilst being easy to use. This expectation extends to data, and customers expect organizations to have their data correct, timely and personal.

For organizations who put customers front-and-center of their roadmap, management should encourage self-reliance in business users by ensuring that they have the right tools to provide customer-centered service. Unfortunately, business users can suffer from a split between business intelligence reporting and the operational systems, as a result of decoupled processes and technology at the point at which they are trying to gain insights. Often, users have to move from one reporting technology to another operational system, and then back again, in order to get the information that they need. This issue is disruptive to the workflow, and it is an obstacle to insights. In terms of IoT data, business users may have to go and get data from yet another system, and that can be even more confusing.

What does IoT mean for BI? Business Intelligence has matured from its earlier centralized, technology-led emphasis to a more decentralized, business-focused perspective which democratizes the data for business users. With the advent of IoT technologies, the issues of collecting, refining and understanding the data are exacerbated by the introduction of a variety of structured and unstructured data sources. In the meantime, there is increased interest among businesses in finding insights in raw data. However, this remains a challenge with the introduction of IoT data from devices, which creates a mélange of data that can be difficult for business users to assimilate and understand. Companies risk obtaining lower ROI on IoT projects by focusing only on the data, rather than the insights.

How did the industry get to this point, with disjointed technology and processes, and disconnected users? How can we move forward from here, to including IoT data whilst resolving the issues of previous business intelligence iterations? To understand this unquenchable thirst for data by business users and what it means for the future, let’s start by taking stock of the history of Business Intelligence. What are users’ expectations about data and technology in general? Until recently, these expectations have been largely shaped by the technology. Let’s start with the history lesson. What are the historical stages of Business Intelligence?

The First Generation of Business Intelligence – change in the truth

First generation Business Intelligence is the world of corporate Business Intelligence, embodied by the phrase ‘the single source of truth’. This was a very technical discipline, which focused on the extract-transform-load processing of data into a data warehouse, and focused less on business user intervention. The net result was that the business seemed to be removed from Business Intelligence. In response, the users pushed for decentralization of the data, so that they could drive their own decision making using the data flexibly, and then confirm it independently in the context in which they were operating. In terms of technology, business users reverted to the default position of working in Excel with data exports, and subverted the IT department by storing data in email.

The Second Generation of Business Intelligence – change in the users

Second Generation Business Intelligence saw change wrought by the business users, who demanded clean data sources on a strong technical apparatus that they could mash and merge together, and they were empowered to connect to these sources. In this stage, the industry pivoted to bring the Business back into Business Intelligence, and it is typified by the phrase ‘self-service business intelligence’. The intended end result is that the business has structured data sources that the users understand, and the technical teams have a robust technical architecture in place. As before, Excel remained the default position for working with data, but the business users had more power to mash data together. Self-service business intelligence was not faultless, however. Business users were still dissatisfied with the highly-centralized IT approach, as they still relied on other people to give them the data that they needed. This reliance introduced a delay, which increased the ‘time to answer’ metric whilst failing to recognize that it also feeds into the ‘time to question’ metric. It did not recognize that analytics is a journey, and that users expect to ‘surf’ through data in the same way that they ‘surf’ through the Internet.

What problems does IoT introduce for businesses, and how can we resolve them?

Given that there are inefficiencies in the process of business intelligence in organizations at the moment, how is this exacerbated by the introduction of data from devices, otherwise known as the Internet of Things? IoT data introduces new issues for business users for a number of reasons. IoT devices will transmit large amounts of data at a velocity which cannot be simply reconciled with other time-based data. The velocity of the data will add in an additional complexity as business users need to understand ‘what happened when’, and how that marries with existing data sources which may even be legacy in nature. Business users will need that work to be presented to them simply. Further, IoT devices will transmit data in different formats, and this will need to be reconciled so that the actual meaningful data is married to the existing, familiar data. If the business users are moving around disparate technology in order to match the data together, then the disconnected technology presents an obstacle to understanding the data, and thereby obtaining the promised holy grail of insights.
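The ‘what happened when’ problem above can be sketched as a time-based join. This is an illustrative example in Python with pandas; the choice of tool and the sample data are my assumptions, not anything from a real IoT architecture. Each fast-moving sensor reading is matched to the most recent record from a slower, legacy-style source.

```python
import pandas as pd

# High-velocity sensor readings with hypothetical timestamps.
sensor = pd.DataFrame({
    "ts": pd.to_datetime([
        "2016-06-01 10:00:01",
        "2016-06-01 10:00:07",
        "2016-06-01 10:00:13",
    ]),
    "vibration": [0.2, 0.9, 0.3],
})

# Slower-moving operational records, e.g. from a legacy system.
shifts = pd.DataFrame({
    "ts": pd.to_datetime(["2016-06-01 10:00:00", "2016-06-01 10:00:10"]),
    "shift": ["A", "B"],
})

# merge_asof matches each reading to the latest record at or before
# its timestamp, answering 'what happened when'. Both frames must be
# sorted by the key column.
aligned = pd.merge_asof(sensor, shifts, on="ts")
print(aligned)
```

This is exactly the kind of reconciliation work that should be done once, centrally, and presented to business users simply, rather than left for them to stitch together by hand.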

IoT means that we can obtain a granular level of customer data which provides unique insights into customer behavior. However, it does not immediately follow that the data will bring insights on its own; interpretation and analysis will be vital in order to obtain them. Businesses can treat IoT as equivalent to data from devices, and it is easy to be distracted by shiny gadgetry. The ‘shiny’ approach to IoT can mean that business users are ignored in the journey, thereby shearing their insights from the solution as a whole.

Helping Business Users along the IoT Journey

As internal and external expectations on data increase, the pressure on business users will increase accordingly. Business users will need help to travel along the user journey, and to adapt to changes in the data landscape that include IoT data. One solution is to add a new application to help business users work with the IoT data. However, adding a new application will exacerbate the existing issues that business users experience. This might be an easy option for IT, but it adds new complexity for the business user. The introduction of IoT data does not necessitate the introduction of new technology to analyze the data.

IoT data resides mainly in the cloud, which means that organizations’ infrastructure is changing rapidly. The data will need to be reconciled and understood, regardless of where it resides. Organizations can end up with a hybrid architecture of cloud and on-premise solutions, and in the midst of these complex and fast-moving architectures, business users are easily forgotten. Business users will need a seamless environment for reconciling cloud and on-premise systems to enable them to produce the anticipated analysis and results. Otherwise, they will find it difficult to navigate the terrain between cloud and on-premise data, which will aggravate existing issues in putting together existing data sources.

Business users have a need for data to carry out a fundamental analytical activity: comparison. How does this data compare to that data? How did last year compare to this year? How did that customer compare with this customer? Answering these simple questions may mean that traditional analytical tools may not be able to cope with the new types of data that are generated by IoT technologies, because the data will be disconnected in terms of technology and process. Excel is excellent for rectangular data sources, but it is not designed for data sources where the data travels at such velocity, and in non-rectangular shapes. So, what’s next?
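The comparison questions above reduce, for rectangular data, to a pivot plus a difference. The sketch below (again in Python with pandas, with invented figures) shows the ‘how did last year compare to this year?’ case that spreadsheets handle well, and which becomes hard once the data stops being rectangular.

```python
import pandas as pd

# Invented yearly revenue per customer.
sales = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015],
    "customer": ["Acme", "Beta", "Acme", "Beta"],
    "revenue": [100, 250, 130, 240],
})

# Pivot years into columns, then compare: one row per customer.
by_year = sales.pivot(index="customer", columns="year", values="revenue")
by_year["change"] = by_year[2015] - by_year[2014]
print(by_year)
```

With high-velocity, non-rectangular IoT data, the hard part is not this final comparison but getting the data into a comparable shape in the first place.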

The Third Generation of Business Intelligence – change in the data

The Third Generation of Business Intelligence is where users work in the era of real change in data, and this change is wrought by the data itself. The data has changed fundamentally; it now travels faster, has more shapes, and is bigger in size than ever before. Users are not finding it easy to compare data simply by farming data into Excel; they need to be empowered to tackle seemingly simple business questions, like comparison, in a way that fits with a fluid way of working, whilst being empowered with data from the IoT sphere as well as more traditional sources. By tapping into new volumes, and particularly new varieties, of data, organizations can ask questions about their customers and their business in a way that they have never been able to do previously. Further, when we add IoT into the mix, there is a promise of insights from customers and their environments, which can be incredibly valuable to companies. It is not all one way, however: in this era of tech-savvy consumers, customer relationships require planning, nurturing, and constant attention.

There should be an acceptance that business users will want access to IoT sources in the same way as any other source, but these can be exasperating and non-intuitive. Simplicity is vital in the race for the business analyst and user, and their goal is to reduce time and effort in getting the insights that they want, whilst increasing efficiency and value.

So, what gets forgotten in the IoT torrent of attention, and what can we do about it?

Simply put, the business users get lost. They already get lost frequently in BI projects, and this will only get worse in IoT projects. The people who mash data together, clean it, make decisions on it, and put it next to other data sources in order to make sense of it are the ones who should be using the data.

Given all of these issues, how do we bring the users back into an IoT architecture? I was faced with this issue recently, when designing an IoT architecture which had a focus on machine learning. IoT work involved a great deal of complexity, which is neatly hidden behind the buzzwords.

The changes in data now mean that there is a clear extension of where the industry has come from, and where it is headed. So what comes next? The third generation of business intelligence: ready to go analytics using data regardless of its shape and size.

Organisations will need to focus on the third generation of Business Intelligence if they are to succeed in giving users the access to data that they need. Users will want to try and analyse the data themselves. Fast questions need fast answers, and businesses need to move from their initial business question through to the resulting insight quickly and accurately, in a way that they are comfortable with. They also need results at the velocity of the business: answers when they need them. Remembering the users is a deceptively simple requirement that presents a number of challenges.

The dislocation between IT and the business is at its apex when we look at their opposing approaches to data. IT is still seen as a bottleneck rather than an enabler. Business users perceive IT departments as a lag in a process that needs to get from question to insight quickly, and they will settle for ‘good enough’ data rather than ‘right’ data in order to get there. The way forward is to make the business users’ activities simpler whilst providing a solution in which the IT department is closely involved and which it finds easier to support, so that both parties feel that they own the solution.

The solution should put the focus back on the business users: not on the technology, but on the humans who actually deliver service, create insights, and ultimately add business value. To do this, they need to be able to search for meaning in the data, via aggregating, broadcasting and consuming information, in order to add the value that is expected of them.

To summarise, these issues were at the forefront of my mind when I was architecting an IoT solution recently. In my next post, I will explain my technical choices, based on these considerations. In my survey, it was clear that IoT needs to be taken a stage further so that it is usable, actionable and sensible; not just data about sensors, but data that is relevant and provides insights.

If you want to talk more about the IoT issues here, or you’re interested in having me come along and speak at your event or workplace, please email me at Jen.stirrup@datarelish.com