Learning pathway for SQL Server 2016 and R Part 2: Divided by a Common Language


Britain has “really everything in common with America nowadays, except, of course, language,” said Oscar Wilde in The Canterville Ghost (1887), whilst George Bernard Shaw is quoted as saying that “The United States and Great Britain are two countries separated by a common language.”

There are similarities and differences between SQL and R which can be confusing. However, I think it is illuminating to understand these similarities and differences, since they tell you something about each language. I got this idea from one of the attendees at PASS Summit 2015, and my kudos and thanks go to her. I’m sorry I didn’t get her name, but if you see this you will know who you are, so please feel free to leave a comment so that I can give you a proper shout out.

If you are looking for an intro to R from the Excel perspective, see this brilliant blog here. Here’s a list to get us started. If you can think of any more, please give me a shout and I will update it. It’s just an overview, intended to help the novice get started on a path of self-guided research into both of these fascinating topics.

R vs the SQL / BI background
R: A Factor has special properties: it can represent a categorical variable, which is used in linear regression, ANOVA and so on. It can also be used for grouping.
SQL/BI: A Dimension is a way of describing categorical variables. We see this in the Microsoft Business Intelligence stack.
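A minimal sketch of a factor doing both jobs, categorising and grouping (the region names and values are my own illustration):

region <- factor(c("North", "South", "North", "East"))   # a categorical variable
levels(region)                           # the distinct categories
tapply(c(10, 20, 30, 40), region, sum)   # group sums by the factor, rather like GROUP BY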
R: dim means that we can give a chunk of data dimensions or, in other words, give it a size. You could use dim to turn a list into a matrix, for example (see the sketch below).
SQL/BI: Following Kimball methodology, we tend to prefix tables with dim if they are dimension tables. Here, we mean ‘dimensions’ in the Kimball sense, where a ‘dimension’ is a way of describing data. If you take a report title, such as Sales by Geography, then ‘geography’ would be your dimension.
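A minimal sketch of the R side, using nothing beyond base R:

m <- 1:12           # a plain sequence of twelve numbers
dim(m) <- c(3, 4)   # give it dimensions: m is now a 3 x 4 matrix
m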
R: R memory management can be confusing. Read Matthew Keller’s excellent post here. If you use R to look at large data sets, you’ll need to know the following (there is a short sketch after this list):
– how much memory an object is taking;
– 32-bit R vs 64-bit R;
– packages designed to store objects on disk, not RAM;
– gc() for memory garbage collection;
– how to reduce memory fragmentation.
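A minimal sketch of the first and fourth points, using nothing beyond base R:

x <- rnorm(1e6)                       # a vector of a million random numbers
print(object.size(x), units = "Mb")   # how much memory the object is taking
gc()                                  # run garbage collection and report memory usage
rm(x); gc()                           # remove large objects, then reclaim the memory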
SQL/BI: SQL Server 2016 CTP3 brings native in-database support for the open source R language. You can call R and RevoScaleR functions and scripts directly from within a SQL query. This circumvents the R memory issue, because SQL Server brings the user multi-threaded, multi-core in-database computations.
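On the R side, here is a minimal sketch of pushing computation into SQL Server with RevoScaleR; the connection string and the TestDB database name are hypothetical, so adjust them for your own instance:

library(RevoScaleR)   # ships with SQL Server 2016 R Services
connStr <- "Driver=SQL Server;Server=localhost;Database=TestDB;Trusted_Connection=Yes"
rxSetComputeContext(RxInSqlServer(connectionString = connStr))
# rx* computations now run inside SQL Server, not in local R memory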
R: A data frame is a way of storing data in tables: a tightly coupled collection of variables arranged in rows and columns, and a fundamental data structure in R.
SQL/BI: In SSRS, we would call this a data set. In T-SQL, it’s just a table. The data is formatted into rows and columns, with mixed data types.
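A minimal sketch of a data frame; the column names and values are my own illustration:

df <- data.frame(name = c("Anna", "Ben"),   # character column
                 age  = c(34L, 28L))        # integer column
str(df)   # rows and columns, with a different type per column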
R: All columns in a matrix must have the same mode (numeric, character, and so on) and the same length.
SQL/BI: A matrix in SSRS is a way of displaying, grouping and summarising data. It acts like a pivot table in Excel.
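A minimal sketch of the single-mode rule on the R side:

m <- matrix(1:6, nrow = 2, ncol = 3)   # every element is numeric
matrix(c(1, "a"), nrow = 1)            # mixing types silently coerces everything to character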
R: <dataframe>$<columnname> is one way you can reference a data frame with specific reference to a column name.
SQL/BI: <tablename>.<columnname> is how we do it in SQL, or you could just call the column name on its own.
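Reusing the df sketch from above, with the T-SQL equivalent as a comment:

df$age   # R: <dataframe>$<columnname>
# T-SQL equivalent: SELECT age FROM df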
R: To print something, type the variable name at the command prompt. Note that you can only print items one at a time, so use cat to combine multiple items into one print-out. Alternatively, use the print function. One magic feature of R is that it knows how to format any R value for printing, e.g.:
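A minimal sketch of the three printing styles:

x <- pi
x                               # typing the name auto-prints the value
print(x)                        # explicit print, one item at a time
cat("pi is roughly", x, "\n")   # cat combines multiple items into one line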


SQL/BI: PRINT returns a user-defined message to the client. See the BOL entry here: https://msdn.microsoft.com/en-us/library/ms176047.aspx

CONCAT returns a string that is the result of concatenating two or more string values: https://msdn.microsoft.com/en-GB/library/hh231515.aspx

R: Variables allow you to store data temporarily during the execution of code. If you define a variable at the command prompt, it is contained in your workspace; it is held in memory, but it can be saved to disk. R variables are dynamically typed, so you can chop and change the type as you see fit.
SQL/BI: Variables are declared in the body of a batch or procedure with the DECLARE statement, and are assigned values using either a SET or SELECT statement. T-SQL variables are not dynamically typed, unlike R’s. For an in-depth look at variables, see Itzik Ben-Gan’s article here.
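A minimal sketch of the difference, with the T-SQL shown as a comment:

v <- 42           # v is numeric
v <- "now text"   # the same name can hold a different type: dynamic typing
# T-SQL, by contrast: DECLARE @v INT; SET @v = 42;  -- the type is fixed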
R: ls allows you to list the variables and functions in your workspace, and ls.str lists some additional information about each variable.
SQL/BI: SQL Server has tables, not arrays; it works differently, and you can find a great explanation over at Erland Sommarskog’s blog. For SQL Server 2016 specific information, please visit the Microsoft site.
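A minimal sketch on the R side:

a <- 1; b <- "text"
ls()       # names of the variables and functions in the workspace
ls.str()   # the same list, with extra detail about each object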
R: A Vector is a key data structure in R, with tons of flexibility and extras. Vectors can’t have a mix of data types, and they are created using the c(…) operator. If you pass it a vector of vectors, R flattens them into a single vector.
SQL/BI: Batch-mode execution is sometimes known as vector-based, or vectorised, execution: a query processing method in which queries process multiple rows together. A popular feature of SQL Server 2016 is columnstore indexes, which use batch-mode execution. To dig into more detail, I’d recommend Niko Neugebauer’s excellent blog series here, or the Microsoft summary.
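A minimal sketch of vector creation and flattening:

v <- c(1, 2, 3)
c(v, c(4, 5))   # nested vectors flatten to a single vector: 1 2 3 4 5
c(1, "a")       # no mixed types: everything is coerced, here to character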

There will be plenty of other examples, but I hope that helps for now.

Learning pathway for SQL Server 2016 and R Part 1: Installation and configuration

Jargogled is an archaic word for getting confused or mixed up. I’m aware that there are lots of SQL Server folks out there who are desperate to learn R, but might be jaRgogled by it. Now that R is in SQL Server, it seems like the perfect opportunity to start a new blog series to help people splice the two great technologies together. So here you are!

First up, what do you need to know about SQL Server installation with R? The installation sequence is well documented here. However, if you want to make sure that the R piece is installed, then you will need to make sure that you do one thing: tick the Advanced Analytics Extension box.

SQL Server 2016 R Feature Selection

You need to select ‘Advanced Analytics Extensions’, which you will find under ‘Instance Features’. Once you’ve done that, you are good to proceed with the rest of your installation.

Once SQL Server is installed, let’s get some data into a SQL Server database. Firstly, you’ll need to create a test database, if you don’t have one already. You can find some information on database creation in SQL Server 2016 over at this Microsoft blog. You can import some data very quickly and there are different ways of importing data. If you need more information on this, please read this Microsoft blog.

If you fancy taking some sample data, try out the UCI Machine Learning data repository. You can download some data from there, following the instructions on that site, and then pop it into SQL Server.
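If you would rather script the import from R itself, here is a minimal sketch using the RODBC package; the connection string, the TestDB database, the Adult table name, and the local adult.data file are all my own hypothetical choices, so adjust them for your setup:

library(RODBC)
ch <- odbcDriverConnect("Driver=SQL Server;Server=localhost;Database=TestDB;Trusted_Connection=Yes")
adult <- read.csv("adult.data", header = FALSE)             # a file downloaded from the UCI repository
sqlSave(ch, adult, tablename = "Adult", rownames = FALSE)   # pop it into a SQL Server table
odbcClose(ch)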

If you have Office x64 installed on your machine, you might run into an issue:

‘Microsoft.ACE.OLEDB.15.0’ provider is not registered on the local machine

I ran into this issue when I tried to import some data into SQL Server using the quick and dirty ‘import data’ menu item in SSMS. After some fishing around, I got rid of it by doing the following:

There are other ways of importing data, of course, but I wanted to play with R and SQL Server, and not spend a whole chunk of time importing data.

In our next tutorial, we will look at some of the vocabulary for R and SQL Server which can look confusing for people from both disciplines. Once you learn the terminology, then you’ll see that you already know a lot of the concepts in R from your SQL Server and Business Intelligence expertise. That expertise will help you to springboard to R expertise, which is great for your career.

PASS Summit Notes for my AzureML, R and Power BI Presentation

I’m going to have fun with my AzureML session today at PASS Summit! More will follow on this post later; I am racing off to the keynote so I don’t have long :)

I heard some folks weren’t sure whether to attend my session or Chris Webb’s session. I’m honestly flattered but I’m not in the same league as Chris! I’ve posted my notes here so that folks can go off and attend Chris’ session, if they are stuck between the two.

Here is the order of things:

  • Slide Deck
  • How do you choose a machine learning algorithm?
  • How do you carry out an AzureML project?
  • AzureML Experiment 
  • R Code

So, the slide deck is here:

  • AzureML Experiment 

You can see this experiment in the AzureML Gallery. You may have to sign up for a Windows Live account to get a free AzureML studio account, and I recommend that you do.

  • How do you choose a machine learning algorithm?

Kudos to Microsoft – this is their cheatsheet and I recommend that you look at the original page.

Here is some more information on the topic from Microsoft, and I recommend that you follow it.

How do you carry out an AzureML project?

Try the CRISP-DM framework for a start.

See the Modeling Agency for the original source: https://the-modeling-agency.com/crisp-dm.pdf

“CRISP-DM Process Diagram” by Kenneth Jensen (own work), licensed under CC BY-SA 3.0 via Commons.

R Code

Here’s some sample R code. I know it is simple, and there are better ways of doing this. However, remember that this is for instructional purposes in front of roughly 500 people, so I want to be sure everyone has a grounding before we talk about more complicated things.

You may have to install the libraries first, if you haven’t done so.
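The data-loading step below is a minimal sketch of my own: it assumes the UCI ‘Adult’ data set read straight from the repository URL, with ggplot2 loaded for the charts further down.

library(ggplot2)
adult.data <- read.csv(
  "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
  header = FALSE)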



# Let's rename the columns
# (the names follow the UCI data description, renamed as they are used below)
colnames(adult.data) <- c("age", "workclass", "fnlwgt", "education", "education.num",
  "marital.status", "occupation", "relationship", "race", "sex", "capital.gain",
  "capital.loss", "hours.per.week", "country", "earning_level")

# Let's see if the columns renamed well
str(adult.data)

# What is the maximum age of the adult?
max(adult.data$age)

# How much data is missing?
sum(adult.data == " ?")   # the UCI file marks missing values with ' ?'

# How many rows do we have?
dim(adult.data)
# 32561 rows, 15 columns

# There are lots of different ways to deal with missing data
# That would be a session in itself!
# For demo purposes, we are simply going to replace question marks, and remove rows which have anything missing.

adult.data$workclass <- as.factor(gsub("[?]", NA, adult.data$workclass))
adult.data$education <- as.factor(gsub("[?]", NA, adult.data$education))
adult.data$marital.status <- as.factor(gsub("[?]", NA, adult.data$marital.status))
adult.data$occupation <- as.factor(gsub("[?]", NA, adult.data$occupation))
adult.data$relationship <- as.factor(gsub("[?]", NA, adult.data$relationship))
adult.data$race <- as.factor(gsub("[?]", NA, adult.data$race))
adult.data$sex <- as.factor(gsub("[?]", NA, adult.data$sex))
adult.data$country <- as.factor(gsub("[?]", NA, adult.data$country))

is.na(adult.data) = adult.data == '?'
is.na(adult.data) = adult.data == ' ?'
adult.tidydata = na.omit(adult.data)

# Let's check out our new data set, called adult.tidydata
str(adult.tidydata)

# How many rows do we have?
dim(adult.tidydata)
# fewer than the original 32561 rows, because rows with missing values have been dropped

# Let's visualise the data
boxplot(adult.tidydata$education.num ~ adult.tidydata$earning_level, outline = FALSE,
  xlab = "Income Level", ylab = "Education Level", main = "Income Vs Education")

for (i in 1:(ncol(adult.tidydata) - 2)) {   # brackets needed: ":" binds more tightly than "-"
  if (is.factor(adult.tidydata[, i])) {
    pl <- ggplot(adult.tidydata, aes_string(colnames(adult.tidydata)[i], fill = "earning_level")) +
      geom_bar(position = "dodge") + theme(axis.text.x = element_text(angle = 75))
    print(pl)   # inside a loop, ggplot objects must be printed explicitly
  }
}

plot <- ggplot(data = adult.tidydata, aes(x = hours.per.week, y = education.num,
  colour = hours.per.week))
plot <- plot + geom_point(alpha = 1/10)
plot <- plot + ggtitle("Hours per Week vs Level of Education")
plot <- plot + stat_smooth(method = "lm", se = FALSE, colour = "red", size = 1)
plot <- plot + xlab("Hours per Week worked") + ylab("Education Level")   # labels matched to the axes
plot <- plot + theme(legend.position = "none")
print(plot)

That’s all for now! More later.

Jen xx

Jen’s Diary: Why are PASS doing Business Analytics at all?

As always, I don’t speak for PASS. This is a braindump from the heart. I realise that we haven’t communicated about BA as much as some members might like. It’s a hard balance: I don’t want to spam people, but I don’t want to communicate too little, either. If you want to sign up for PASS BA news, here’s the link. So I have to apologise here, and hold my hands up for that one. I’ll endeavour to ensure we have a better BA communications plan in place, and I’m meeting the team on Friday to discuss how we can make that happen.

In the meantime, I’d like to blog about BA today. How did we get here, and where are we going? Why are PASS interested in Business Analytics at all? To answer this question, let’s look at the history of Business Intelligence, what Business Analytics means, and how PASS can be part of the story. Let’s start with the history lesson. What are the stages of Business Intelligence?

First generation Business Intelligence – this was the world of corporate Business Intelligence. You’ll know this by the phrase ‘the single source of truth’. This was a very technical discipline, focused on the data warehouse. It was dominated by Kimball methodology or Inmon methodology, dependent on the business requirement. However, the business got lost in all this somewhere, and reverted to the default position of using Excel as a tool to work with Excel exports, subverting the IT departments by storing data in email. Microsoft did – and still do – cater for the first generation of business intelligence. It has diversified into new cloud products, of course, but SQL Server still rocks. You’ll have seen that Gartner identified SQL Server as the number one RDBMS for 2015. Kudos to the team! For an overview, the Computer Weekly article is interesting.

Second generation Business Intelligence – the industry pivoted to bring the Business back into Business Intelligence. You’ll know this by the phrase ‘self-service business intelligence’. Here, the business user was serviced with clean data sources that they could mash and merge together, and they were empowered to connect to these sources. In the Microsoft sphere, this involved a proliferation of tabular models, PowerPivot as well as continued use of analysis services multidimensional models. As before, Excel remained the default position for working with data. PASS Summit 2015 has a lot of content in both of these areas.

So far, so good. PASS serves a community need by offering high quality, community education on all of these technologies. Sorted, right?

Wrong. The world of data keeps moving. Let’s look at the projected growth of Big Data by Forbes.

Well, the world of business intelligence isn’t over yet; we now have business analytics on the horizon, and the world of data is changing fast. We need to keep up! But what do we do with all this data? This is the realm of Business Analytics. Why is it different from BI? The value of business analytics lies in its ability to deliver better outcomes; it’s a different perspective. Note that in our first generation and second generation BI eras, technology was at the forefront of the discussion. In business analytics, we talk about organisational change, enabled by technology. In this sphere, we have to quantify and communicate value as the outcome, not the technology as a means to get there. So what comes next?

Third generation Business Intelligence – self-service analytics. Data visualisation software was at the forefront of second generation Business Intelligence, and it took priority. Here, the position is that businesses will understand that they need data visualisation technologies as well as analytical tools, in order to use the data for different purposes.

How is Business Analytics an extension of Business Intelligence? Let’s look at some basic business questions, and see how they fall as BI or BA. Images belong to Gartner so all kudos and copyright to the team over there.

What happened?

If the promise of business intelligence is to be believed, then we have our clean data sources, and we can describe the current state of the business. Gartner call this descriptive analytics, and it answers the question: what happened? This level is our bread-and-butter business intelligence, with an emphasis on the time frame up to the current point in time.

Why did it happen?

We can also understand, to a degree, why we are where we are. This is called diagnostic analytics, and it can help pinpoint issues in the organisation. Business Intelligence is a great domain for understanding the organisation up to this point in time. However, it’s a rearview impression of the data. What happens next? Now we start to get into the remit of Business Analytics:

What will happen?

Businesses want to know what will happen next. Gartner call this predictive analytics, and it comes into play when we look for predictive patterns in the data. Once we understand what will happen next, what is the next question?

How can we make this happen?

This is the power of prescriptive analytics; it tells us what we should do, and it is the holy grail of analytics. It uses business intelligence data in order to understand the right path to take, and it builds on the other types of analytics.

Business Intelligence and Business Analytics are a continuum. Analytics is focused more on a forward motion of the data, and a focus on value. People talk about ROI, TCO, making good business decisions based on strong data. First generation and second generation are not going away. A cursory look around a lot of organisations will tell you that. The Third Generation, however, is where organisations start to struggle a bit. PASS can help folks navigate their way towards this new generation of data in the 21st century.

How do we measure value? It is not just about storing the data, protecting it and securing it. These DBA functions are extremely valuable and the business would not function without them – full stop.  So how do we take this data and use it as a way of moving the organisation? We can work with the existing data to improve it; understand and produce the right measures of return, profiling, or other benefits such as team work. Further, analytics is multi-disciplinary. It straddles the organisation, and it has side effects that you can’t see, immediately. This is ‘long term vision’ not ‘operational, reactive, here-and-now’. Analytics can effect change within the organisation, as the process of doing analytics itself means that the organization solves a business problem, which it then seeks to re-apply across different silos within the organization.

SQL Server, on the other hand, is a technology. It is an on-premise relational database technology, which is aimed at a very specific task. This is a different, technologically based perspective. The perspectives in data are changing, as this Gartner illustration taken from here shows:

Why do we need a separate event? We need to meet different people’s attitudes towards data. DBAs have a great attitude; protect, cherish, secure data. BAs also have a great attitude: use, mix, apply learnings from data. You could see BA as a ‘special interest group’ which offers people a different choice. There may not be enough of this material for them at PASS Summit, so they get their own event. If someone wants to go ahead and have a PASS SQLSaturday event which is ‘special interest’ and focuses solely on, say, performance or disaster recovery, for example, then I don’t personally have a problem with that.  I’d let them rock on with it. It might bring in new members, and it offers a more niche offering to people who may or may not attend PASS because they don’t feel that there’s enough specialised, in depth, hard-core down-to-the-metal disaster recovery material in there for them. Business Analytics is the same, by analogy. Hundreds and hundreds of people attended my 3 hour session on R last year; so there is an interest. I see the BA event as a ‘little sister’ to the PASS ‘big brother’ – related, but not quite the same.

Why Analytics in particular? It’s about PASS growth. Growth can be painful, and you take a risk. However, I want to be sure that PASS is still growing to meet the future needs of its members, as well as attracting new members to the fold. The footfall we see at PASS BA, plus our industry-recognised expert speakers, tells us that we are growing in the right direction. Take our keynote speaker, Jer Thorpe: he has done work with NASA and the MoMA in New York, he was data artist in residence at the New York Times, and he has now set up The Office for Creative Research and is an adjunct professor at ITP. Last year, we had Mico Yuk, who is the author of Data Visualization for Dummies, as well as heading up her own consultancy team over at BI Brainz. They are industry experts in their own right, and I’m delighted to add them as part of our growing PASS family who love data.

The PASS BA event also addresses the issue of new and emerging data leaders. How do you help drive your organisation towards becoming a data-oriented organisation? This means talking a new language: new criteria for measuring value, working out return on investment, cross-department communication, and communicating ideas and conclusions to people throughout the organisation, right up to the C-level executives. PASS BA is also looking at the career trajectories of these people, as well as those of DBA-oriented folks, and PASS BA is out there putting the ‘Professional’ aspect into the event. We have a separate track, Communicate and Lead, which is all about data leadership and professional development. A whole track – the little sister is smartly bringing the Professional back, folks, and it’s part of our hallmark.

PASS is part of this story of data in the 21st Century. The ‘little sister’ still adds value to the bigger PASS membership, and is an area of growth for the family of PASS.

Any questions? I’m at jen.stirrup@sqlpass.org, or please do come to the Board Q&A and ask them there. If you can’t make it, tweet me at jenstirrup and I’ll see if I can catch them during the Q&A.

Jen’s Diary: Top 5 free online tools for organising yourself at PASS Summit, Live! 360 or any other conference!

As always, I don’t speak officially for PASS. This is a personal braindump.

I’m presenting at PASS Summit and Live! 360 in Orlando, so self-organising is a hot topic for me!

For diary purposes, I have been doing lots of work on PASS BAC with the team, as you can imagine. I’d like to thank the PASS team here: Vicki, Angie, Anika, Teresa, Georgia and Judy to name a few. If you want ‘behind the scenes’ tweets, please follow Anika on Twitter to know more about the running of PASS. It’s no mean feat – 6 thousand or so SQL Server fans in one place! – and the team keep everyone happy.

As you know, the PASS BAC first wave of speakers has gone out. If you have any questions, please fire them at me: jen.stirrup@sqlpass.org. More on this later! I’m humbled by the amount of industry expertise that we have in our community, and everyone who submitted is simply amazing. Thank you to everyone for their faith and belief in what we are trying to achieve, and thank you to the people who have had faith in us and have bought their tickets so far without even seeing the full agenda! PASS BAC is going to be a blast again, and I hope you’ll join us. You can register here.

Ok, now onto the clickbait that you really wanted to see. Here are some handy tools which I’m using to organize my time at PASS Summit.

Evernote – I prefer it to OneNote because, when I search my browser for items, it also brings up my Evernote notes about the same topic. Neat, huh? I use it for taking notes and I ‘snip’ everything. I use the local version which syncs to my online version, and I can read the offline local version whilst I’m on the plane. How good is that? Think of it as an offline Google or Bing repository to help you to actually read the things you marked as ‘to read later’.

Sunrise Calendar – I have tried using online tools for YEARS, but this is the only one that works for me. It takes my nine calendars (yes, I am that busy!) and synchronises them in one place. I can see things in different timezones (I work BST/GMT, and then on to PST). I cannot do without this calendar now. Go and take a look, and you’ll find yourself organised in no time. You’ll need this to prevent yourself from becoming quadruple booked, as I do. These guys deserve a freakin’ award. Seriously. It is owned by Microsoft, but why the hell they don’t advertise this, I have no idea. It’s so simple and it does what I need it to do. Guys – deep thanks from me.

Watch this video on Productivity and take the bits from it that are useful for you. This 20 minute video has helped me so much, and it’s left its mark on me. I hope it will help you to see how you can be more productive. It has helped me, and in some ways helped to heal me of things that were hard to let go. It’s been a hard lesson and I am humble enough to admit I’m still learning it. Good luck with it. All I do is carry around a little SQLBits notebook that I’ve had, and I brain dump into it by writing everything down. It works. Trust me. How is this free? Well, make sure you visit the sponsor gallery at PASS Summit, and see if you can score a little notebook from one of the sponsors. It may help you more than you think!

TripIt – I am travelling across timezones and I need everything in one place. I also want to see when my friends come in. TripIt is soooo good at telling me about delays before the airlines do that I pay for TripIt Pro. You can find TripIt here.

PackPoint – saves me from forgetting things. This is for Android, but I’m sure you can find something similar for your favourite mobile OS. Basically, you tell PackPoint where you are going, and for how long. If you are a TripIt customer (see above), it syncs everything for you, so you can pick your trip. If you are one of these people who forgets business cards and little things like that, give it a try. You can find your packing list online as well as on your phone, and tick things off as you go.

I hope to see you there!


Jen x

Jen’s PASS Diary: I need input from you!

As always, I don’t speak officially for PASS. Here are a few notes about what I’ve been doing. However, this is an effort to be more transparent, and open about what I am actually doing. I’d also like to re-iterate how people can get in touch, and I recommend that you do get in touch!

I’m getting things ready for the PASS Business Analytics Conference. I realise it must feel like a long time since we released any news about that! We are hoping to get speaker confirmations, and the regrettable rejections, out shortly. I know it sounds like a cliché, but we had a huge number of submissions for the event from all over the world. I have loved watching the event take on a life of its own, as a little sister under the wing of its much bigger brother, PASS Summit. We have taken volunteers from the community to help us choose sessions, and we released guidelines before the speaker call in order to help people target their abstracts better. More on this later.

So, I have a PASS Board meeting just before PASS Summit, and I need your input. Let me know your feedback, questions, and comments. I hold the EMEA seat but I am willing to take feedback about anything PASS related. To do this, please email me at Jen.Stirrup@sqlpass.org and I will take your feedback along with me. Also, if you want me to keep the commentary confidential, I am happy to respect your wish.

My handy IoT Toolkit: What businesses forget about IoT

I recently did a brief blog post for Izenda on IoT and business intelligence, and this part of my IoT series expands on some of the themes there.

The Internet of Things is a new phenomenon; that said, a simple search for ‘Internet of Things IoT’ brings back over 60 million search results in Bing. What is the Internet of Things? The Internet of Things Global Standards Initiative gives us the following definition: ‘The Internet of Things (IoT) is defined in Recommendation ITU-T Y.2060 (06/2012) as a global infrastructure for the information society, enabling advanced services by interconnecting (physical and virtual) things based on existing and evolving interoperable information and communication technologies.’

Now, this definition is fine, but it focuses on the ‘shiny’ aspect of IoT and, most importantly, it does not mention the data aspect of IoT. It emphasises the connectedness of the various gadgets and their interoperability. I prefer Peter Hinssen’s discussion, where he recommends that we talk about the value of the network of things. The connected devices, on their own, will fulfil their purpose. However, if you want real insights from these sources, then you need to splice the data together with other sources in order to get insights from it.

The thing is, the Internet of Things is really the Internet of You.

We are now heading towards the Zettabyte generation, thanks to the Millennial generation. For example, the World Data Bank projects that 50% of people will have smartphones by 2016, and 80% by 2020. We sent 7.5 trillion SMS messages in 2014. In fact, one app, WhatsApp, sent 7.2 trillion messages. And that’s just data from one app. In 2000, Kodak processed 80 billion photos from camera film. In 2014, 800 billion photos from smartphones were shared on social networks. And that’s just the photos that were shared.

We are the Internet of Things.

From the business perspective, how do you make use of that IoT data? The consumerization of IT means that business users are often asked to manage and cleanse data, regardless of its size and nature. Research suggests that data is growing at a rate of 40% each year into the next decade, driven by increased online activity and usage of smart devices (ABI, 2013; The New York Times, 2014). The consumerization of data means that business users should be able to access and analyze the data comfortably. When we introduce data that comes under the umbrella of the Internet of Things, business users will need to be able to access IoT data from devices as well as other data sources, to get insights from the data.

How can we harness the IoT phenomenon to understand and serve our customers better?

The addition of data from a variety of sources, including data from devices, means that IoT has a very wide scope and opportunity. IoT can focus on the devices themselves, or the network infrastructure connecting devices, or the analytics derived from the data which comes from the network and the devices. In order to get true insights, the IoT data would be deployed in tandem with other relevant data so that the business user obtains the context. The IoT would also introduce real time data, which would be mixed with historical data.

Customer expectations are rising; and customer-focused businesses will need to put analytics at the heart of their customer services. For example, customers do not distinguish between out of date data, and inaccurate data; for them, they are the same thing. The customer landscape is changing, and it includes the ‘millennials’ who expect technology to offer an unfailing, personal experience whilst being easy to use. This expectation extrapolates to data, and customers expect organizations to have their data correct, timely and personal.

For organizations who put customers front-and-center of their roadmap, management should encourage self-reliance in business users by ensuring that they have the right tools to provide customer-centered service.  Unfortunately, business users can suffer from a split between business intelligence reporting, and the operation systems, as a result of decoupled processes and technology at the point at which they are trying to gain insights. Often, users have to move from one reporting technology to another operational system, and then back again, in order to get the information that they need. This issue can be disruptive in terms of the workflow, and it is an obstacle to insights. In terms of IoT data, business users may have to go and get data from yet another system, and that can be even more confusing.

What does IoT mean for BI? Business Intelligence has matured from the earlier centralized technology emphasis, to a more decentralized business-focused perspective which democratizes the data for the business users. With the advent of IoT technologies, issues on collecting, refining and understanding the data are exacerbated due to the introduction of a variety of structured and unstructured data sources. In the meantime, there is an increased interest in businesses to find insights in raw data. However, this remains a challenge with the introduction of IoT data from devices, creating a mélange of data that can be difficult for business users to assimilate and understand. Companies risk obtaining lower ROI in IoT projects, by focusing only on the data, rather than the insights.

How did the industry get to this point, with disjointed technology and processes, and disconnected users? How can we move forward from here, to including IoT data whilst resolving the issues of previous business intelligence iterations? To understand this unquenchable thirst for data by business users and what it means for the future, let’s start by taking stock of the history of Business Intelligence. What are users’ expectations about data and technology in general? Until recently, these expectations have been largely shaped by the technology. Let’s start with the history lesson. What are the historical stages of Business Intelligence?

The First Generation of Business Intelligence – change in the truth

First generation Business Intelligence is the world of corporate Business Intelligence, embodied by the phrase ‘the single source of truth’. This is a very technical discipline, which focused on the extract-transform-load processing of data into a data warehouse, and focused less on business user intervention. The net result is that the business seemed to be removed from Business Intelligence. In response, the users pushed for decentralization of the data, so that they could drive their own decision making using the data flexibly, and then confirm it independently in the context in which they are operating. In terms of technology, business users reverted to the default position of using Excel as a tool to work with Excel exports, and subverting the IT departments by storing data in email.

The Second Generation of Business Intelligence – change in the users

Second Generation Business Intelligence was the change wrought by the business users, who demanded clean data sources on a strong technical apparatus that they could mash and merge together, and who were empowered to connect to these sources. In this stage, the industry pivoted to bring the Business back into Business Intelligence, and it is typified by the phrase ‘self-service business intelligence’. The intended end result is that the business has structured data sources that the users understand, and the technical teams have a robust technical structure for the architecture in place. As before, Excel remained the default position for working with data, but the business users had more power to mash data together. Self-service business intelligence was not faultless, however. Business users were still dissatisfied with the highly-centralized IT approach, as they still relied on other people to give them the data that they need. This introduced a delay, which increased the ‘time to answer’ metric whilst failing to recognize that this feeds into the ‘time to question’ metric. It does not recognize that analytics is a journey, and users expect to ‘surf’ through data in the same way that they ‘surf’ through the Internet.

What problems does IoT introduce for businesses, and how can we resolve them?

Given that there are inefficiencies in the process of business intelligence in organizations at the moment, how is this exacerbated by the introduction of data from devices, otherwise known as the Internet of Things? IoT data introduces new issues for business users for a number of reasons. IoT devices will transmit large amounts of data at a velocity which cannot be simply reconciled with other time-based data. The velocity of the data will add in an additional complexity as business users need to understand ‘what happened when’, and how that marries with existing data sources which may even be legacy in nature. Business users will need that work to be presented to them simply. Further, IoT devices will transmit data in different formats, and this will need to be reconciled so that the actual meaningful data is married to the existing, familiar data. If the business users are moving around disparate technology in order to match the data together, then the disconnected technology presents an obstacle to understanding the data, and thereby obtaining the promised holy grail of insights.

IoT means that we can obtain a granular level of customer data which provides unique insights into customer behavior. However, it does not immediately follow that the data will bring insights on its own; interpretation and analysis will be vital in order to obtain insights. Businesses can interpret IoT as equivalent to data from devices, and it is easy to be distracted by shiny gadgetry. The ‘shiny’ approach to IoT can mean that business users are ignored in the journey, thereby shearing their insights from the solution as a whole.

Helping Business Users along the IoT Journey

As internal and external expectations on data increase, the pressure on business users will increase accordingly. Business users will need help to travel along the user journey, to adapt these changes in the data landscape that include IoT data. One solution is to add a new application that will help the business users to work with the IoT data. However, adding a new application will exacerbate the existing issues that business users experience. This might be an easy option for IT, but it will add in a new complexity for the business user. The introduction of IoT data does not necessitate the introduction of new technology to analyze the data.

IoT data resides mainly in the cloud, which means that organizations’ infrastructure is changing rapidly. The data will need to be reconciled and understood, regardless of where it resides. Organizations can end up with a hybrid architecture of cloud and on-premise solutions, and in the midst of these complex and fast-moving architectures, business users are easily forgotten. Business users will need a seamless environment for reconciling cloud and on-premise systems, to enable them to produce the anticipated analysis and results. They will find it difficult to navigate the terrain between cloud and on-premise data, which will aggravate existing issues in putting together existing data sources.

Business users have a need for data to carry out a fundamental analytical activity: comparison. How does this data compare to that data? How did last year compare to this year? How did that customer compare with this customer? Answering these simple questions may mean that traditional analytical tools may not be able to cope with the new types of data that are generated by IoT technologies, because the data will be disconnected in terms of technology and process. Excel is excellent for rectangular data sources, but it is not designed for data sources where the data travels at such velocity, and in non-rectangular shapes. So, what’s next?

The Third Generation of Business Intelligence – change in the data

The Third Generation of Business Intelligence is where users work in the era of real change in data, and this change is wrought by changes in the data itself. The data has changed fundamentally: it now travels faster, has more shapes, and is bigger in size than ever before. Users are not finding it easy to compare data simply by farming data into Excel; they need to be empowered to tackle seemingly simple business questions, like comparison, in a way that fits with a fluid way of working, whilst being empowered with data from the IoT sphere as well as more traditional sources. By tapping into new volumes, and particularly new varieties, of data, organizations can ask questions about their customers and their business in a way that they have never been able to do previously. Further, when we add IoT into the mix, there is a promise of insights from customers and their environments, which can be incredibly valuable to companies. It is not all one way, however: in this era of tech-savvy consumers, customer relationships require planning, nurturing, and constant attention.

There should be an acceptance that business users will want access to IoT sources in the same way as any other source, but these can be exasperating and non-intuitive. Simplicity is vital in the race for the business analyst and user, and their goal is to reduce time and effort in getting the insights that they want, whilst increasing efficiency and value.

So, what gets forgotten in the IoT torrent of attention, and what can we do about it?

Simply put, the business users get lost. They already get lost frequently in BI projects, and this will only get worse in IoT projects. The people who mash data together, clean it, make decisions on it, and put it next to other data sources in order to make sense of the data – these are the ones who should be using the data.

Given all of these issues, how do we bring the users back into an IoT architecture? I was faced with this issue recently, when designing an IoT architecture with a focus on machine learning. The IoT work involved a great deal of complexity, which is neatly hidden behind the buzzwords.

The changes in data now mean that there is a clear extension of where the industry has come from, and where it is headed. So what comes next? The third generation of business intelligence: ready to go analytics using data regardless of its shape and size.

Organisations will need to focus on the third generation of Business Intelligence if they are to be successful in giving users the access to data that they need. Users will want to try and analyse the data themselves. Fast questions need fast answers, and businesses need to move from their initial business question through to the resulting insight quickly and accurately, in a way that they are comfortable with. They also need results at the velocity of the business: answers when they need them. Remembering the users is a deceptively simple requirement that presents a number of challenges.

The dislocation between IT and the business is at its apex when we look at the opposing approaches to data. IT is still seen as a bottleneck rather than an enabler. Business users perceive IT departments as a lag in a process that needs to get from question to insight quickly, and they will look for ‘good enough’ data rather than ‘right data’ in order to get it. The way forward is to make the business users’ activities simpler whilst providing a solution that the IT department are closely involved and find the solution easier to support, so that both parties feel that they own the solution.

The solution should put the focus back on the business users: the humans who actually deliver service, create insights, and ultimately add business value. To do this, they need to be able to search for meaning in the data, via aggregation, broadcasting and consuming information, in order to add the value that is expected of them.

To summarise, these issues were at the forefront of my mind when I was architecting an IoT solution recently. In my next post, I will explain my technical choices, based on these considerations. From my survey, it was clear that IoT needs to be taken to a further stage so that it is usable, actionable and sensible; not just data about sensors, but data that is relevant and provides insights.

If you want to talk more about the IoT issues here, or you’re interested in having me come along and speak at your event or workplace, please email me at Jen.stirrup@datarelish.com