Azure Cosmos DB for the rest of us: 5 part blog series

As Business Intelligence and Data Science professionals, we like nothing better than the excitement of new ways to store data. So there was a lot of excitement over Azure Cosmos DB when it was announced at Build 2017.

Azure Cosmos DB can be described as the ‘everything everywhere’ database. Multi-model, all kinds of consistency, and so on. And that’s what many organisations want… something that’s close to one source of the truth, or at least one source for the data. But does that mean it’s the right source? How can the BI or Data Science consumer understand it? They are the ones who can be closer to the sign-off authority and they can help articulate the need for it.

I was interviewed recently for TechTarget and it became clear that the language and terminology can make Azure Cosmos DB’s utility harder to understand if you are new to it. I read the announcements and I thought… what does it actually mean, in the real world, for Business Intelligence, analytics and Data Science professionals? Hence this digestible blog series, aimed at explaining it in plain English for the people who will be the ones to consider using it. When you read the material, it pretty much says that Azure Cosmos DB does everything. However, it won’t do anything if it isn’t understood or made relevant.

Over the five days, I’ll pick out some of the underlying technology and why it’s useful in its different guises. In today’s post, I’ll pick out some of the terminology and explain what it actually means. Over the next four days, I’ll talk about the different flavours of database that are contained in Azure Cosmos DB, aimed at BI, Analytics and Data Science professionals. I’ll talk about some of the pieces that you can make use of in Azure Cosmos DB, such as:

  • Key-Value
  • Document Databases
  • Graph
  • Columnar / Column-Oriented Databases

Hopefully, by the end of the series, you’ll be as excited by the opportunities of Azure Cosmos DB as I am. If not, that’s ok – it’s possible that the technology isn’t for you, and inaction is an action in itself.

So let’s get started. Let’s look at the Azure Cosmos DB definition, taken from Microsoft’s site:

Azure Cosmos DB is the first globally-distributed data service that lets you to elastically scale throughput and storage across any number of geographical regions while guaranteeing low latency, high availability and consistency – backed by the most comprehensive SLAs in the industry. Azure Cosmos DB is built to power today’s IoT and mobile apps, and tomorrow’s AI-hungry future.



Ok. Let’s go through that again, at normal person pace.




globally-distributed – distribution of computation close to the geographic location of the data and the users. It goes beyond interconnection of servers as in the ‘olden days’ of legacy architectures. In this definition, the distribution of workloads within the architectures must be visible, adjustable, and automated.

What does this mean for you?
It means you have the capacity to use the cloud facility closest to you. This is important for legal and practical reasons, such as data privacy laws in your region, for example.

elastically scale throughput – this means that computing resources can be scaled up and down easily by Azure. Azure will adapt to workload changes by provisioning and de-provisioning resources as required, so that available resources match the current demand as closely as possible. If your requirement spikes for some reason, then capacity will rise to meet demand. Elastically scaling throughput refers to the capacity for processing units of information, and this capacity does not need to be static.

What does this mean for you?
Think of your monthly reporting. Many organisations will run financial reports for the month end. This is a ‘spike’ in requirement, which you only need 12 times a year. You don’t necessarily want, or need, to buy servers and network resource specifically for this purpose; in fact, it may be overloading your existing resources. This is where Azure steps in. You could, for example, have VMs that wake up once a month, run your reports, and then go to sleep again.
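To put some numbers on the month-end example, here is a minimal sketch in Python. The demand figures and the idea of a generic ‘capacity unit’ are invented purely for illustration; they are not Azure pricing or real Azure Cosmos DB throughput values. The sketch compares provisioning for the peak all month against capacity that follows demand day by day.

```python
# Hypothetical daily demand for a 30-day month, in arbitrary capacity units.
# Days 1-29 are quiet; day 30 is the month-end reporting spike.
daily_demand = [10] * 29 + [100]


def fixed_capacity_units(demand):
    """Capacity-days consumed when you provision for the peak every day."""
    return max(demand) * len(demand)


def elastic_capacity_units(demand):
    """Capacity-days consumed when capacity follows demand each day."""
    return sum(demand)


fixed = fixed_capacity_units(daily_demand)
elastic = elastic_capacity_units(daily_demand)

print(f"Provisioned for the peak all month: {fixed} capacity-days")
print(f"Scaled elastically with demand:     {elastic} capacity-days")
```

With these made-up figures, most of the fixed capacity sits idle for 29 days out of 30, which is the gap that elastic scaling is meant to close.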

elastically scale storage – your application can size throughput and storage on demand, worldwide. Azure Cosmos DB is fine-grained enough that you could even scale at second and minute granularities. You can accommodate unexpected spikes in your workloads, or size downwards as required. This is a change from previous architectures, where the database has often been the least scalable component. Often, the phrase “scaling the database” means a project in itself.

What does this mean for you?
A data storage tier of an elastic application might add and remove data storage due to cost and performance requirements. For example, it could vary the number of Virtual Machines in use – virtual machines ‘on tap’, if you will! Azure can monitor your elastic applications for you.

low latency – latency is the delay between a client request, probably a request made by you at your computer, and a cloud service provider’s response to that request.

What does this mean for you?
It means a faster response when you ask for your data. Because Azure Cosmos DB is globally distributed, requests can be served from a region close to you, so your queries and reports spend less time waiting on the network.

high availability – this sounds depressing but it’s very necessary. It assumes that there are points of failure at every component of a system, and that these points of failure will fail at some point. High availability is preparing for that eventuality, by building in strategies for coping with failure, using automated processes to recover from it. Fault-tolerant systems designed for high availability are achievable in the cloud.

What does this mean for you?
It means keeping the lights on, and your business running.

consistency – different entities (nodes) have their own copy of some data object, and they may not always be the same. This is a big topic and you can research further for yourself; this is tip of the iceberg – or speck of dust in the Cosmos? There are different types of consistency.

Eventual Consistency – this is the situation where conflicts can arise. However, nodes communicate their changes to each other to resolve those conflicts. In time, each node will agree upon the final value.

Strong Consistency – all nodes agree on the new or updated value. Here, all updates are visible to all clients simultaneously, which introduces a requirement for blocking in update operations.

What does this mean for you?
Let’s take the case of an online shopping basket. Your purchases may be up to date on some nodes… but not all of them. The others need to catch up in order to resolve the conflict. This may not be noticeable by you or the purchaser. This would be eventual consistency. In strong consistency, you want the data to ‘agree’ – for example, your monthly reporting. Your consistency level depends on your requirement.
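The shopping basket example can be sketched as a toy simulation in Python. This is only an illustration of the idea, not how Azure Cosmos DB implements replication; the `Replica` class and the two write functions are hypothetical names made up for this sketch.

```python
class Replica:
    """One node holding its own copy of the shopping basket."""

    def __init__(self):
        self.basket = []
        self.pending = []  # updates received but not yet applied

    def read(self):
        return list(self.basket)

    def apply_pending(self):
        """Catch up by applying any updates that are still waiting."""
        self.basket.extend(self.pending)
        self.pending.clear()


def write_eventual(replicas, item):
    """Apply to the first replica now; the others catch up later."""
    replicas[0].basket.append(item)
    for replica in replicas[1:]:
        replica.pending.append(item)


def write_strong(replicas, item):
    """Apply to every replica before the write counts as done."""
    for replica in replicas:
        replica.basket.append(item)


# Eventual consistency: the second node is briefly out of date.
nodes = [Replica(), Replica()]
write_eventual(nodes, "book")
stale_read = nodes[1].read()       # the second node has not caught up yet
for node in nodes:
    node.apply_pending()           # replication catches up
converged_read = nodes[1].read()   # now both nodes agree

# Strong consistency: every node agrees as soon as the write completes.
strong_nodes = [Replica(), Replica()]
write_strong(strong_nodes, "book")
strong_read = strong_nodes[1].read()

print(stale_read, converged_read, strong_read)
```

The trade-off shows up in `write_strong`: it has to touch every replica before it finishes, which is the ‘blocking’ cost mentioned above, whereas `write_eventual` returns quickly and lets the other nodes catch up.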


How does this relate to Azure Cosmos DB?

Business value will be created in the applications and reorganizations enabled by Azure. You don’t have to worry so much about the Cloud infrastructure itself, for example, when considering tuning for throughput – Azure Cosmos DB allows you to easily increase or decrease the amount of reserved throughput available to your application. Also, since it is globally distributed, Azure Cosmos DB will replicate your data wherever your users are. For Business Intelligence and Data Science consumers, that’s incredibly useful for your users.

You can think more about your applications and workloads. Often, developers don’t want to think about database structures and they can rely on ORM tools to write SQL for them. This is really giving developers something that they do anyway; have a very forgiving place to store data.

You can choose what consistency you require. With Azure Cosmos DB, developers do not have to settle for the extreme consistency choices that I described earlier – strong vs. eventual consistency. Instead, Azure Cosmos DB offers some ‘grey’ in there by offering 5 well-defined consistency choices:


Credit: Microsoft


Consistency Levels and guarantees

  • Strong – Linearizability
  • Bounded Staleness – Consistent Prefix. Reads lag behind writes by k prefixes or t interval
  • Session – Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads
  • Consistent Prefix – Updates returned are some prefix of all the updates, with no gaps
  • Eventual – Out of order reads


As we progress through this series, we will add more to this question. But for now, over to you!

Your homework!

Here are some videos on Azure Cosmos DB for you to view. You can learn more about the research we implemented in Azure Cosmos DB by watching this video from Turing Award-winning, Microsoft Researcher, distributed systems giant and an inspiration, Dr. Leslie Lamport.

Next steps!

Tomorrow, we will talk more about key-value databases and how they are manifested in Azure Cosmos DB. Stand by for more Azure Cosmos DB goodness!

A spoonful at a time: Dealing with the Imposter Syndrome with Spoon Theory

My friend Julie Holmes posted about a great article on the Imposter Syndrome recently. I’ve decided to share one aspect of how I am working to manage and improve myself professionally. I’m a work in progress, for sure, with a few sticking plasters in place.

What is Imposter Syndrome? Most people describe it as a feeling of being fraudulent; you’ll never be as good as other people think that you are. Mine is slightly different. It has occurred to me that I’m muddling along, trying new things before other people, and that’s why I run into issues.

“Don’t be the first to do something. Be second.” – David Bowie

For my version of Imposter Syndrome, I experience this as having two streams of thought. One thought talks about the things I need to do; my MIT (Most Important Tasks) list (not a ToDo list, which is like a wishlist!) – thank you Gordon Tredgold for that inspiration. The other stream is almost like a film script, and the working title is: you’ll never be good enough. Fear is writing that script, and it means that your decisions are inspired by fear rather than data or evidence. Fear is limiting, so how do you find some sort of even keel?

I am learning to tame my rampant inner imposter syndrome with a variation of spoon theory. If you haven’t read about spoon theory, I suggest you do that first. It comes from Christine Miserandino’s 2003 essay “The Spoon Theory”, on her blog But You Don’t Look Sick.
I’ve decided that I have only so many ‘spoons’ (or insert other nouns!) to give out during the day, on things that I care about. If it is worth a spoon, then I can write it down on my mind sweeper journal as part of my Bullet Journal efforts at productivity. For more ideas on that, head over to the wonderful Boho Berry‘s site.
So when I hear something imposterish – usually within about five seconds of waking up, when I start to think of all the things I need to do – I have started to ask myself if that thought is really worth a spoon or not. It’s a conscious effort to ‘undo’ the train of thought, but I’ve found that this trick is helping me to re-evaluate my train of thought before it goes charging through the day.
I’m brought down to earth enough, as it is.

Recently I got a LinkedIn invitation from someone who has never spoken to me in person. However, what they probably don’t even remember is that they wrote a personal series of criticisms about me, in one of my presentations during my early career. Honestly, it nearly finished my speaking career because I didn’t think I could ever get back up on the stage again. I was flattened.

Speakers take feedback forms enormously seriously. I received the feedback, and contacted the event organisers to apologise profusely that they had received all of this commentary. The writer had put a lot of effort into it, and it was about a page.

The event organisers pointed out to me that the rest of my comments had been excellent, I was well above average and they would be pleased to receive submissions in the future. I was hugely relieved. There were two takeaway points here. Firstly, I hadn’t noted the good feedback I had received, just the scorching paragraphs which were not constructive. Secondly, it showed me that, no matter how much you try, you can’t please everyone. There was nothing in the scorching feedback that I could take away and I could not do anything constructive with it at all. So, I started to grow a thicker skin, and I got back up on the stage. Initially, I had kept the feedback in a Word document on my desktop for six months – just to remind me how close I came to giving everything up. I didn’t accept the LinkedIn invite, because I don’t know them, and I didn’t want to open a door of communication to invite further comments – I’d seen enough already.

I did read the feedback again and I realised that I give myself enough criticism, without absorbing other people’s as well. It was time to grow more skin, and then I grasped onto the spoon idea.
If you think I’m crazy, that’s ok 🙂 I’ve been greatly inspired by Jim Carrey, recently, of all people – his thoughts on motivation, and his life experiences are shared with such humility, honesty and love for the human race that I’ve seen behind the persona. Jim Carrey talks about finding peace and letting go of our inner critic; and if you can watch his insightful YouTube videos, you’ll take a lot from them, I’m sure. I strongly suspect that he will never see this, but if he does, thank you, dear man, from this playful heart.
But it is helping me to be more confident and spend my ‘mental space’ more productively on the things that deserve it.

I hope that helps someone – if you have Imposter Syndrome, grab yourself some mental spoons and spoon your way through it. It won’t go away, but it will help you to be easier on yourself.


Discussing Why not How – Digital Transformation for the CEO at MS Data Insights Summit

I’m delighted that I have been selected to speak at Microsoft Data Insights Summit in Seattle. The topic is Digital Transformation with Power BI for the CEO.

In my session, we will look at essential metrics to measure the business health, and the key metrics that C-level executives find crucial to understand the business – and to look at the Why.

The phrase ‘what gets measured, gets managed‘ is attributed to Peter Drucker. However, it has been used elsewhere e.g. Gordon Bethune, the former CEO of Continental Airlines used it in his 1998 book. It is sometimes stated as if you can’t measure it, you can’t improve it, sometimes attributed to Peter Drucker or Lord Kelvin. (sidenote: it is a contraction of Lord Kelvin’s article, and you can find the whole piece here. As a graduate of the University of Glasgow, I’m honoured to mention him here.)


It is usually taken to mean that you need to measure things to improve them. I think it is more subtle than that; I think that it means ‘you measure things that you care about‘.

How does the business know what they care about? It’s all about the Why.

If you were to create a NASA Voyager Golden Record to indicate what numbers describe your business, what would they be?

[Image: NASA/JPL Voyager ‘Golden Record’. The Sounds of Earth Record Cover, by NASA/JPL, public domain.]

What’s different about this session? There are plenty of ‘how to’ guides on Power BI on the internet – in fact, the community has produced a ton of insightful and interesting ways to use Power BI. There is a lot of literature on beginner guides, custom visuals, reporting and so on.

What’s different about the Digital Transformation with Power BI for the CEO session is that we will discuss the Why as well as the How. I’m going to help you to create your Golden Record to describe your business, using Power BI.

Anyone can talk about How – how to write a report, how to create custom visuals and so on.

Executives don’t listen to the How. They want to know Why. They hire people to know how. They aren’t listening to the ‘how’ because it probably isn’t the best use of their time. They hire people to know how to do ‘how’ and ‘what’.

Good businesses know what they want to do and how they do it.

Great businesses move past that. They ask the more important question – WHY?

Why is it useful to them?

Why will it change their business?

Why will Power BI help to make their businesses more inventive, pioneering and successful than others?

Why are they able to repeat their success again and again? What metrics really matter in a business?

Inspirational leaders such as Steve Jobs and Martin Luther King started with Why. Listening is a crucial skill. I have started to ask customers what has drawn them to me – why did I win the business? I need constructive feedback so I can continue to win in the future, and learn from my blind spots where I don’t do so well. If you are a consultant, you should try it. It’s humbling, of course, but it means I learn from their perspectives. Since I work for myself, I don’t get peer reviews.

When I asked one customer, she thought very carefully and said that they had had a consultant turn up for the day, and he was pretty arrogant, and he leaned heavily on his technical prowess. As she watched him walk out of the door, he had a spring in his step, which told her that he believed that he had got the work. That in itself, she said, told her that he had completely misread them all. He had been so fixated on bludgeoning them with his technical prowess that he wasn’t really listening. They didn’t get back to him – which is a pity, because he would not have learned anything, really.

On the other hand, she said, I had listened and understood, and had ‘left the ego at the front door’. She felt that I would embed well with their teams and that I was unlikely to inflame existing politics within different teams, whereas the first consultant would simply exacerbate existing divisions. The ‘slogan’ she gave to me (at my request) was ‘making an impact’ – she felt I would make an impact – and I’ve adopted that as my mental mantra ever since. You know who you are – and thank you for all your faith in me!

Back to the story. Was he more technical than me? Depends how you measure it; I don’t know. However, you can be the most technical person in the world but if people are walking away from you, then you won’t deliver anything. It’s more than just How.

There will be plenty of sessions that tell you ‘how’ and that’s fine – they aren’t necessarily speaking to the C-level suite audience. Jen’s session will help to give you that language, the language of Why, so that you can communicate past the How and onto the Why.

You might give the Executive a brilliant expose on the technical innards of Power BI – but that doesn’t really give them what they need. It doesn’t show that you really understand their business and what they are trying to achieve; fundamentally that means you can’t be confident that you are going to help them to get there.

A C-suite person will know how best to drive their business and attach a dollar value to their time – and that probably isn’t spending time on writing reports using any technology. That’s where the BI professional comes in – to free the C-level suite from writing their own reports.

Focusing on the Why is what will drive the business forward. Then, it will drive focus, accountability, simplicity and transparency – all essential ingredients for driving the organisation to success.

See you there! Let’s help you to help your organisation to create your organisation’s Golden Record to describe your business, using Power BI.

Doing the Do: the best resource for learning new Business Intelligence and Data Science technology

As a consultant, I get parachuted into difficult problems every day. Often, I figure it out because I have to, and I want to. Usually, nobody else can do it other than me – they are all keeping the fires lit. I get to do the thorny problems that get left burning quietly. I love the challenge of these successes!

How do you get started? The online and offline courses, books, MOOCs, papers, blogs and the forums help, of course. I regularly use several resources for learning but my number one source of learning is:

Doing the ‘do’ – working on practical projects, professional or private

Nothing beats hands-on experience. 

How do you get on the project ladder? Without experience, you can’t get started. And without getting started, you can’t get experience. So you end up in a difficult, circular situation.

Volunteer your time in the workplace – or out of it. It could be a professional project or your ‘data science citizen’ project that you care about. Your boss wants her data? Define the business need, and identify what she actually wants. If it helps, prototype to elicit the real need. Volunteer to try and get the data for her. Take a sample and just get started with descriptive statistics. Look at the simple things first.
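As a minimal sketch of ‘take a sample and just get started with descriptive statistics’, here is a short Python example using only the standard library. The sales figures are hypothetical and stand in for whatever data your boss actually cares about; the sample size and seed are arbitrary choices for illustration.

```python
import random
import statistics

# Hypothetical monthly sales figures standing in for the real data.
sales = [120, 135, 128, 300, 142, 138, 131, 127, 140, 133, 129, 136]

random.seed(42)                     # fixed seed so the sample is repeatable
sample = random.sample(sales, k=6)  # take a sample to get started

# Look at the simple things first.
summary = {
    "count": len(sample),
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "stdev": statistics.stdev(sample),
    "min": min(sample),
    "max": max(sample),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Even a summary this simple can prompt useful questions, such as why one month (the 300 in the full list) looks so different from the rest.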

Not sure of the business question? Try the AzureML Cheat Sheet for help.


Working with data means that you will be challenged with real situations and you will read and learn more, because you have to do it in order to deliver.

In my latest AzureML course with Opsgility, I take this practical, business-centred approach for AzureML. I show you how to take data, difficult business questions and practical problems, and I show you how to create a successful outcome; even if that outcome is a failed model, it still makes you revise the fundamental business question. It’s a safe environment to get experience.

So, if this is you – what’s the sequence? There are a few sequences or frameworks to try:

  • TDSP (Microsoft)
  • KDD

The ‘headline’ of each framework is given below, as a reference point, so you can see for yourself that they are very different. The main thing is to simply get started.

Team Data Science Process (Microsoft)











It’s important not to get too wrapped up on comparing models; this could be analysis paralysis, and that’s not going to help.

I’d suggest you start with the TDSP because of the fine resources, and take it from there.

I’d be interested in your approaches, so please do leave comments below.

Good luck!

See you at Techorama?


Why should you go to Techorama?

Techorama is a yearly international technology conference which takes place at Metropolis Antwerp. We welcome about 1500 attendees, a healthy mix between developers, IT Professionals, Data Professionals and SharePoint professionals.

I’m delighted to announce I’m speaking, and I’d like to take this opportunity to thank the Techorama team for all of their hard work and effort in putting on a great show.


First off, there will be a keynote by Scott Guthrie, EVP of Cloud + Enterprise, Microsoft Corporation – now this is BIG NEWS.

Scott Guthrie, EVP of Cloud + Enterprise, Microsoft Corporation will be keynoting at Techorama 2017 (May 23). In his keynote, “Azure, The Intelligent Cloud”, Scott will open the event with a strategic vision on the Microsoft cloud.

Scott Guthrie will also give another breakout session on May 23 which will be a Q & A session. Come with your questions!




The event itself will be top-notch content for Developers, IT Professionals, Data Professionals and SharePoint Professionals: 11 parallel breakout sessions with top speakers from all over the world, experts in their field, plus meaningful networking opportunities with partners and like-minded people.

There will also be a unique conference experience in a movie theatre with lots of surprises!

What will I be talking about? You can find out more here at my dedicated Techorama page.

Data Visualisation Lies and How to Spot them

During the acrimonious US election, both sides used a combination of cherry-picked polls and misleading data visualization to paint different pictures with data. In this session, we will use a range of Microsoft Power BI and SSRS technologies in order to examine how people can mislead with data and how to fix it. We will also look at best practices with data visualisation. We will examine the data with Microsoft SSRS and Power BI so that you can see the differences and similarities in these reporting tools when selecting your own Data Visualisation toolkit.

Whether you are a Trump supporter, a Clinton supporter or you don’t really care, join this session to spot data lies better in order to make up your own mind.

We hope to welcome you at Techorama 2017!


Five reasons to be excited about Microsoft Data Insights Summit!


I’m delighted to be speaking at Microsoft Data Insights Summit! I’m pumped about my session, which focuses on Power BI for the CEO. I’m also super happy to be attending the Microsoft Data Insights Summit for five top reasons (and others, but five is a nice number!). I’m excited about all of the Excel, Power BI, DAX and Data Science goodies. Here are some sample session titles:

Live Data Streaming in Power BI

Data Science for Analysts

What’s new in Excel

Embed R in Power BI

Spreadsheet Management and Compliance (It is a topic that keeps me up at night!)

Book an in-person appointment with a Microsoft expert with the online Schedule Builder. Bring your hard – or easy – questions! In itself, this is a real chance to speak to Microsoft directly and get expert, in-depth help from the team who make the software that you love.

Steven Levitt of Freakonomics is speaking and I’m delighted to hear him again. I’ve heard him present recently and he was very funny whilst also being insightful. I think you’ll enjoy his session.


I’m excited that James Phillips is delivering a keynote! I have had the pleasure of meeting him a few times and I am really excited about where James and the Power BI team have taken Power BI. I’m sure that there will be good things as they steam ahead, so James’ keynote is unmissable!

Alberto Cairo is presenting a keynote! Someone who always makes me sit up a bit straighter when they tweet is Alberto Cairo, and I’m delighted he’s attending. I hope I can get to meet him in person. Whether Alberto is tweeting about data visualisation, design or the world in general, it’s always insightful. I have his latest book and I hope I can ask him to sign it.


Tons of other great speakers! Now someone I haven’t seen for ages – too long in fact – is Rob Collie. Rob is President of PowerPivotPro and you simply have to hear him speak on the topic. He’s direct in explaining how things work, and you will learn from him. I’m glad to see Marco Russo is speaking and I love his sessions. In fact, at TechEd North America, I only got to see one session because I was so busy with presenting, booth duty etc… but I managed to get to see a session and I made sure it was Marco Russo and Alberto Ferrari’s session. Chris Webb is also presenting and his sessions are always amazing. I have to credit Chris in part for where I am today, because his blog kept me sane and his generosity during sessions meant that I never felt stupid asking him questions. I’m learning too – always.

Ok, that’s five things but there are plenty more. Why not see for yourself?

Join me at the conference, June 12–13, 2017 in Seattle, WA — and be sure to sign up for your 1:1 session with a Microsoft expert.

Data Preparation in AzureML – Where and how?

One question that keeps popping up in my customer AzureML projects is ‘How do I conduct data preparation on my data?’ For example, how can we join the data, clean it, and shape it so that it is ready for analytics? Messy data is a problem for every organisation. If you don’t think it is an issue for your organisation, perhaps you haven’t looked hard enough.

To answer the question properly, we need to stand back a little, and see the problem as a part of a larger technology canvas. From the enterprise architecture perspective, it is best to do data preparation as close to the source as possible. The reason for this is that the cleaned data would act as a good, consistent source for other systems, and you would only have to do it once. You have cleaned data that you can re-use, rather than re-do for every place where you need to use the data.

Let’s say you have a data source, and you want to expose the data in different technologies, such as Power BI, Excel and Tableau. Many organisations have a ‘cottage industry’ style of enterprise architecture, where they have different departments using different technologies. It is difficult to align data and analytics across the business, since the interpretation of the data may be implemented in a manner that is technology-specific rather than business-focused. If you take a ‘cottage industry’ approach, you would have to repeat your data preparation steps across different technologies.


When we come to AzureML, the data preparation perspective isn’t forgotten, but it isn’t a strong data preparation tool like Paxata or Datameer, for example. It’s the democratization of data for the masses, yes, and I see the value it brings to businesses. It’s meant for machine learning and data science, so you should expect to use it for those purposes. It’s not a standalone data preparation tool, although it does help you partway.

The data preparation facilities in AzureML can be found here. If you have to clean up the data in AzureML, my futurology ‘dream’ scenario for AzureML is that Microsoft have weighty data preparation as a task, like other tasks in AzureML. You could click on the task, and then have roll-your-own data preparation pop up in the browser (all browser based) provided by Microsoft or perhaps have Paxata or Datameer pop out as a service, hosted in Azure as part of your Azure portal services. Then, you would go back to AzureML, all in the browser. In the meantime, you would be better trying to follow the principle of cleaning it up close to the source.

Don’t be downhearted if AzureML isn’t giving you the data preparation that you need. Look back to the underlying data, and see what you can do. The answer might be as simple as writing a view in SQL Server. AzureML is for operations and machine learning further downstream. If you are having serious data preparation issues, then perhaps you are not ready for the modelling phase of CRISP-DM so you may want to take some time to think about those issues.
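To illustrate the ‘clean it once, close to the source’ idea, here is a minimal sketch of a cleaning view. It uses Python’s built-in sqlite3 module as a stand-in so that it runs anywhere; it is not SQL Server, and the table, view and column names are hypothetical. The same pattern (trim, standardise, de-duplicate, filter) would carry over to a SQL Server view, with syntax adjusted as needed.

```python
import sqlite3

# In-memory database standing in for the real source system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_customers (id INTEGER, name TEXT, country TEXT);
    INSERT INTO raw_customers VALUES
        (1, '  Alice ', 'uk'),
        (2, 'Bob',      'UK'),
        (2, 'Bob',      'UK'),
        (3, NULL,       'us');

    -- The view trims whitespace, standardises case, drops rows with no
    -- name and removes exact duplicates, so every downstream consumer
    -- sees the same cleaned data.
    CREATE VIEW clean_customers AS
    SELECT DISTINCT id, TRIM(name) AS name, UPPER(country) AS country
    FROM raw_customers
    WHERE name IS NOT NULL;
""")

rows = conn.execute(
    "SELECT id, name, country FROM clean_customers ORDER BY id"
).fetchall()
print(rows)
```

Because the cleaning lives in the view, Power BI, Excel, Tableau and AzureML could all read from `clean_customers` rather than each repeating the same preparation steps.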