Azure Cosmos DB for the rest of us: 5 part blog series

As Business Intelligence and Data Science professionals, we like nothing better than the excitement of new ways to store data. So there was a lot of excitement over Azure Cosmos DB when it was announced at Build 2017.

Azure Cosmos DB can be described as the ‘everything everywhere’ database: multi-model, all kinds of consistency, and so on. And that’s what many organisations want… something that’s close to one source of the truth, or at least one source for the data. But does that mean it’s the right source? How can the BI or Data Science consumer understand it? They are often the ones closest to the sign-off authority, and they can help articulate the need for it.

I was interviewed recently for TechTarget and it became clear that the language and terminology can make Azure Cosmos DB’s utility harder to understand if you are new to it. I read the announcements and I thought… what does it actually mean, in the real world, to Business Intelligence, analytics and Data Science professionals? Hence this digestible blog series, aimed at explaining it in plain English for the people who will be the ones to consider using it. When you read the material, it pretty much says that Azure Cosmos DB does everything. However, it won’t do anything if it isn’t understood or made relevant.

Over the five days, I’ll pick out some of the underlying technology and why it’s useful in its different guises. In today’s post, I’ll pick out some of the terminology and explain what it actually means. Over the next four days, I’ll talk about the different flavours of database that are contained in Azure Cosmos DB, aimed at BI, Analytics and Data Science professionals. I’ll talk about some of the pieces that you can make use of in Azure Cosmos DB, such as:

  • Key-Value
  • Document Databases
  • Graph
  • Columnar / Column-Oriented Databases

Hopefully, by the end of the series, you’ll be as excited by the opportunities of Azure Cosmos DB as I am. If not, that’s ok – it’s possible that the technology isn’t for you, and inaction is an action in itself.

So let’s get started. Let’s look at the Azure Cosmos DB definition, taken from Microsoft’s site:

Azure Cosmos DB is the first globally-distributed data service that lets you to elastically scale throughput and storage across any number of geographical regions while guaranteeing low latency, high availability and consistency – backed by the most comprehensive SLAs in the industry. Azure Cosmos DB is built to power today’s IoT and mobile apps, and tomorrow’s AI-hungry future.



Ok. Let’s go through that again, at normal person pace.

globally-distributed – distribution of computation close to the geographic location of the data and the users. It goes beyond interconnection of servers as in the ‘olden days’ of legacy architectures. In this definition, the distribution of workloads within the architectures must be visible, adjustable, and automated.

What does this mean for you?
It means you have the capacity to use the cloud facility closest to you. This is important for legal and practical reasons, such as data privacy laws in your region, for example.

elastically scale throughput – this means that computing resources can be scaled up and down easily by Azure. Azure will adapt to workload changes by provisioning and de-provisioning resources as required, so that the available resources match the current demand as closely as possible. If your requirement spikes for some reason, then capacity will rise to meet demand. Throughput refers to the number of units of information being processed in a given time, and this processing capacity does not need to be static.

What does this mean for you?
Think of your monthly reporting. Many organisations will run financial reports for the month end. This is a ‘spike’ in requirement, which you only need 12 times a year. You don’t necessarily want, or need, to buy servers and network resource specifically for this purpose; in fact, it may be overloading your existing resources. This is where Azure steps in. You could, for example, have VMs that wake up once a month, run your reports, and then go to sleep again.
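
To make the month-end example concrete, here is a minimal Python sketch of what ‘elastic’ means: capacity follows demand instead of being fixed at the peak. The function name `provision`, the demand figures and the 20% headroom are all illustrative assumptions; this is not an Azure API and not how Azure decides what to provision.

```python
def provision(demand_units, headroom_pct=20, minimum=1):
    """Return the capacity to provision for a given demand.

    Hypothetical rule for illustration only: demand plus some headroom,
    rounded up, and never below a minimum.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-demand_units * (100 + headroom_pct) // 100)
    return max(minimum, needed)

# Made-up daily demand: quiet for most of the month, a spike at month end.
daily_demand = [10] * 27 + [100, 100, 100]

# Elastic: capacity is re-sized every day to match that day's demand.
elastic = [provision(d) for d in daily_demand]

# Fixed: capacity is bought once, sized for the busiest day.
fixed = [provision(max(daily_demand))] * len(daily_demand)

print("elastic capacity-days:", sum(elastic))
print("fixed capacity-days:  ", sum(fixed))
```

In this toy month, the fixed approach pays for peak capacity on every day, while the elastic approach only pays for it on the three days that need it.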

elastically scale storage – your application can size its storage on demand, worldwide, in step with its throughput. Azure Cosmos DB is fine-grained enough that you can even scale at second and minute granularities. You can accommodate unexpected spikes in your workloads, or size downwards as required. This is a change from previous architectures, where the database has often been the least scalable component. Often, the phrase “scaling the database” means a project in itself.

What does this mean for you?
A data storage tier of an elastic application might add and remove data storage due to cost and performance requirements. For example, it could vary the number of Virtual Machines in use – virtual machines ‘on tap’, if you will! Azure can monitor your elastic applications for you.

low latency – latency is the delay between a client request, probably a request made by you at your computer, and a cloud service provider’s response to that request.

What does this mean for you?
The closer the data is to you, the shorter that delay. Because Azure Cosmos DB is globally distributed, your requests can be answered from a region near you, so your queries and reports come back sooner.

high availability – this sounds depressing but it’s very necessary. It assumes that there are points of failure at every component of a system, and that these points of failure will fail at some point. High availability is preparing for that eventuality, by building in strategies for coping with failure, using automated processes to recover from it. Fault-tolerant systems designed for high availability are achievable in the cloud.

What does this mean for you?
It means keeping the lights on, and your business running.
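
As a rough illustration of ‘automated processes to recover from failure’, the Python sketch below tries each copy of the data in turn and returns the first answer it gets. The region names, the fake `broken_region` and `healthy_region` functions and the sample figure are all made up for the example; a real service handles this for you behind the scenes.

```python
def read_with_failover(replicas, key):
    """Try each replica in turn and return the first successful answer."""
    errors = []
    for name, fetch in replicas:
        try:
            return name, fetch(key)
        except ConnectionError as exc:
            # Remember the failure and move on to the next replica.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))

def broken_region(key):
    """Stand-in for a region that is currently down."""
    raise ConnectionError("region unavailable")

def healthy_region(key):
    """Stand-in for a region that is up and holds a copy of the data."""
    return {"monthly_sales": 1250}[key]

replicas = [("primary", broken_region), ("secondary", healthy_region)]
served_by, value = read_with_failover(replicas, "monthly_sales")
print(served_by, value)
```

The caller never sees the failed region; it simply gets its answer from the next healthy one.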

consistency – different entities (nodes) have their own copy of some data object, and the copies may not always be the same. This is a big topic and you can research further for yourself; this is the tip of the iceberg – or speck of dust in the Cosmos? There are different types of consistency.

Eventual Consistency – this is the situation where conflicts can arise. However, nodes communicate their changes to each other to resolve those conflicts. In time, each node will agree upon the final value.

Strong Consistency – all nodes agree on the new or updated value. Here, all updates are visible to all clients simultaneously, which introduces a requirement for blocking in update operations.

What does this mean for you?
Let’s take the case of an online shopping basket. Your purchases may be up to date on some nodes… but not all of them. The others need to catch up in order to resolve the conflict. This may not be noticeable by you or the purchaser. This would be eventual consistency. In strong consistency, you want the data to ‘agree’ – for example, your monthly reporting. Your consistency level depends on your requirement.
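
The shopping basket example can be sketched in a few lines of Python. This is a deliberately simplified toy, assuming a single writer and a ‘newest version wins’ rule; the `Replica` class and the region names are hypothetical and say nothing about how Azure Cosmos DB replicates data internally.

```python
class Replica:
    """One node holding its own copy of a shopping basket."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.basket = []

    def write(self, item):
        # A new item creates a new version of the basket on this node.
        self.version += 1
        self.basket = self.basket + [item]

    def sync_from(self, other):
        # Adopt the other node's copy only if it is newer than ours.
        if other.version > self.version:
            self.version = other.version
            self.basket = list(other.basket)

replicas = [Replica("europe"), Replica("us"), Replica("asia")]

# Eventual consistency: the purchase lands on one node first...
replicas[0].write("book")
before = [r.basket for r in replicas]

# ...and the other nodes catch up a little later.
for r in replicas:
    r.sync_from(replicas[0])
after = [r.basket for r in replicas]

print("before sync:", before)
print("after sync: ", after)
```

Between the write and the sync, a reader could see an empty basket on two of the three nodes. Under strong consistency, the write would not be acknowledged until every node held the new value, which is where the blocking comes from.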


How does this relate to Azure Cosmos DB?

Business value will be created in the applications and reorganizations enabled by Azure. You don’t have to worry so much about the Cloud infrastructure itself. For example, when considering tuning for throughput, Azure Cosmos DB allows you to easily increase or decrease the amount of reserved throughput available to your application. Also, since it is globally distributed, Azure Cosmos DB will replicate your data wherever your users are. For Business Intelligence and Data Science consumers, that’s incredibly useful.

You can think more about your applications and workloads. Often, developers don’t want to think about database structures and they can rely on ORM tools to write SQL for them. This is really giving developers something that they want anyway: a very forgiving place to store data.

You can choose what consistency you require. With Azure Cosmos DB, developers do not have to settle for the extreme consistency choices that I described earlier – strong vs. eventual consistency. Instead, Azure Cosmos DB offers some ‘grey’ in there by offering 5 well-defined consistency choices:


Credit: Microsoft


Consistency Levels and guarantees

  • Strong: Linearizability
  • Bounded Staleness: Consistent Prefix. Reads lag behind writes by k prefixes or t interval
  • Session: Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads
  • Consistent Prefix: Updates returned are some prefix of all the updates, with no gaps
  • Eventual: Out of order reads
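
To get a feel for one of the ‘grey’ levels, here is a toy Python model of Bounded Staleness: a reader may be behind, but never by more than k writes, and what it sees is always a prefix of the writes with no gaps. The function `bounded_stale_read` and its parameters are assumptions made for this illustration; it models the guarantee only, not how Azure Cosmos DB enforces it.

```python
def bounded_stale_read(write_log, replica_position, k):
    """Return the prefix of writes a lagging replica is allowed to serve.

    write_log: every write so far, in order.
    replica_position: how many of those writes the replica has applied.
    k: the maximum number of writes the replica may lag behind.
    """
    lag = len(write_log) - replica_position
    if lag > k:
        # Too stale: catch up until the lag is back within the bound.
        replica_position = len(write_log) - k
    # Slicing from the start gives a prefix: nothing skipped, nothing reordered.
    return write_log[:replica_position]

writes = ["a", "b", "c", "d", "e"]

far_behind = bounded_stale_read(writes, replica_position=1, k=2)
slightly_behind = bounded_stale_read(writes, replica_position=4, k=2)

print(far_behind)
print(slightly_behind)
```

Setting k to 0 in this model forces every read to see every write, which is the strong end of the scale; a very large k behaves more like the eventual end, except that reads still arrive in order.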


As we progress through this series, we will add more to this question. But for now, over to you!

Your homework!

Here are some videos on Azure Cosmos DB for you to view. You can learn more about the research implemented in Azure Cosmos DB by watching this video from Turing Award-winning Microsoft Researcher, distributed systems giant and an inspiration, Dr. Leslie Lamport.

Next steps!

Tomorrow, we will talk more about key-value databases and how they are manifested in Azure Cosmos DB. Stand by for more Azure Cosmos DB goodness!

A spoonful at a time: Dealing with the Imposter Syndrome with Spoon Theory

My friend Julie Holmes posted about a great article on the Imposter Syndrome recently. I’ve decided to share one aspect of how I am working to manage and improve myself professionally. I’m a work in progress, for sure, with a few sticking plasters in place.

What is Imposter Syndrome? Most people describe it as a feeling of being fraudulent; you’ll never be as good as other people think that you are. Mine is slightly different. It has occurred to me that I’m muddling along, trying new things before other people, and that’s why I run into issues.

“Don’t be the first to do something. Be second.” – David Bowie

For my version of Imposter Syndrome, I experience this as having two streams of thought. One thought talks about the things I need to do; my MIT (Most Important Tasks) list (not a ToDo list, which is like a wishlist!) – thank you Gordon Tredgold for that inspiration. The other stream is almost like a film script, and the working title is: you’ll never be good enough. Fear is writing that script, and it means that your decisions are inspired by fear rather than data or evidence. Fear is limiting, so how do you find some sort of even keel?

I am learning to tame my rampant inner imposter syndrome with a variation of spoon theory. If you haven’t read about spoon theory, I suggest you do that first. It was set out by Christine Miserandino in 2003, in her essay “The Spoon Theory” on her blog But You Don’t Look Sick.
I’ve decided that I have only so many ‘spoons’ (or insert other nouns!) to give out during the day, on things that I care about. If it is worth a spoon, then I can write it down on my mind sweeper journal as part of my Bullet Journal efforts at productivity. For more ideas on that, head over to the wonderful Boho Berry’s site.
So when I hear something imposterish – usually within about five seconds of waking up, when I start to think of all the things I need to do – I have started to ask myself if that thought is really worth a spoon or not. It’s a conscious effort to ‘undo’ the train of thought, but I’ve found that this trick is helping me to re-evaluate my train of thought before it goes charging through the day.
I’m brought down to earth enough, as it is.

Recently I got a LinkedIn invitation from someone who has never spoken to me in person. However, what they probably don’t even remember is that they wrote a personal series of criticisms about me, in feedback on one of my presentations during my early career. Honestly, it nearly finished my speaking career because I didn’t think I could ever get back up on the stage again. I was flattened.

Speakers take feedback forms enormously seriously. I received the feedback, and contacted the event organisers to apologise profusely that they had received all of this commentary. The writer had put a lot of effort into it, and it was about a page.

The event organisers pointed out to me that the rest of my comments had been excellent, I was well above average and they would be pleased to receive submissions in the future. I was hugely relieved. There were two takeaway points here. Firstly, I hadn’t noted the good feedback I had received, just the scorching paragraphs which were not constructive. Secondly, it showed me that, no matter how much you try, you can’t please everyone. There was nothing in the scorching feedback that I could take away and I could not do anything constructive with it at all. So, I started to grow a thicker skin, and I got back up on the stage. Initially, I had kept the feedback in a Word document on my desktop for six months – just to remind me how close I came to giving everything up. I didn’t accept the LinkedIn invite, because I don’t know them, and I didn’t want to open a door of communication to invite further comments – I’d seen enough already.

I did read the feedback again and I realised that I give myself enough criticism, without absorbing other people’s as well. It was time to grow more skin, and then I grasped onto the spoon idea.
If you think I’m crazy, that’s ok 🙂 I’ve been greatly inspired by Jim Carrey, recently, of all people – his thoughts on motivation, and his life experiences are shared with such humility, honesty and love for the human race that I’ve seen behind the persona. Jim Carrey talks about finding peace and letting go of our inner critic; and if you can watch his insightful YouTube videos, you’ll take a lot from them, I’m sure. I strongly suspect that he will never see this, but if he does, thank you, dear man, from this playful heart.
But it is helping me to be more confident and spend my ‘mental space’ more productively on the things that deserve it.

I hope that helps someone – if you have Imposter Syndrome, grab yourself some mental spoons and spoon your way through it. It won’t go away, but it will help you to be easier on yourself.


Discussing Why not How – Digital Transformation for the CEO at MS Data Insights Summit

I’m delighted that I have been selected to speak at Microsoft Data Insights Summit in Seattle. The topic is Digital Transformation with Power BI for the CEO.

In my session, we will look at essential metrics to measure the business health, and the key metrics that C-level executives find crucial to understand the business – and to look at the Why.

The phrase ‘what gets measured, gets managed‘ is attributed to Peter Drucker. However, it has been used elsewhere e.g. Gordon Bethune, the former CEO of Continental Airlines used it in his 1998 book. It is sometimes stated as if you can’t measure it, you can’t improve it, sometimes attributed to Peter Drucker or Lord Kelvin. (sidenote: it is a contraction of Lord Kelvin’s article, and you can find the whole piece here. As a graduate of the University of Glasgow, I’m honoured to mention him here.)


It is usually taken to mean that you need to measure things to improve them. I think it is more subtle than that; I think that it means ‘you measure things that you care about‘.

How does the business know what they care about? It’s all about the Why.

If you were to create a NASA Voyager Golden Record to indicate what numbers describe your business, what would they be?

Image: NASA/JPL Voyager ‘Golden Record’ (The Sounds of Earth Record Cover). Credit: NASA/JPL, Public Domain

What’s different about this session? There are plenty of ‘how to’ guides on Power BI on the internet – in fact, the community has produced a ton of insightful and interesting ways to use Power BI. There is a lot of literature on beginner guides, custom visuals, reporting and so on.

What’s different about the Digital Transformation with Power BI for the CEO session is that we will discuss the Why as well as the How. I’m going to help you to create your Golden Record to describe your business, using Power BI.

Anyone can talk about How – how to write a report, how to create custom visuals and so on.

Executives don’t listen to the How. They want to know Why. They aren’t listening to the ‘how’ because it probably isn’t the best use of their time; they hire people to know the ‘how’ and the ‘what’.

Good businesses know what they want to do and how they do it.

Great businesses move past that. They ask the more important question – WHY?

Why is it useful to them?

Why will it change their business?

Why will Power BI help to make their businesses more inventive, pioneering and successful than others?

Why are they able to repeat their success again and again? What metrics really matter in a business?

Inspirational leaders such as Steve Jobs and Martin Luther King started with Why. Listening is a crucial skill. I have started to ask customers what has drawn them to me – why did I win the business? I need constructive feedback so I can continue to win in the future, and learn from my blind spots where I don’t do so well. If you are a consultant, you should try it. It’s humbling, of course, but it means I learn from their perspectives. Since I work for myself, I don’t get peer reviews.

When I asked one customer, she thought very carefully and said that they had had a consultant turn up for the day, and he was pretty arrogant, and he leaned heavily on his technical prowess. As she watched him walk out of the door, he had a spring in his step, which told her that he believed that he had got the work. That in itself, she said, told her that he had completely misread them all. He had been so fixated on bludgeoning them with his technical prowess that he wasn’t really listening. They didn’t get back to him – which is a pity, because he will not have learned anything, really.

On the other hand, she said, I had listened and understood, and ‘left the ego at the front door’. She felt that I would embed well with their teams and that I was unlikely to inflame existing politics within different teams, whereas the first consultant would simply exacerbate existing divisions. The ‘slogan’ she gave to me (at my request) was ‘making an impact’ – she felt I would make an impact – and I’ve adopted that as my mental mantra ever since. You know who you are – and thank you for all your faith in me!

Back to the story. Was he more technical than me? Depends how you measure it; I don’t know. However, you can be the most technical person in the world but if people are walking away from you, then you won’t deliver anything. It’s more than just How.

There will be plenty of sessions that tell you ‘how’ and that’s fine – they aren’t necessarily speaking to the C-level suite audience. Jen’s session will help to give you that language, the language of Why, so that you can communicate past the How and onto the Why.

You might give the Executive a brilliant exposé of the technical innards of Power BI – but that doesn’t really give them what they need. It doesn’t show that you really understand their business and what they are trying to achieve; fundamentally, that means they can’t be confident that you are going to help them to get there.

A C-suite person will know how best to drive their business and attach a dollar value to their time – and that probably isn’t spending time on writing reports using any technology. That’s where the BI professional comes in – to free the C-level suite from writing their own reports.

Focusing on the Why is what will drive the business forward. Then, it will drive focus, accountability, simplicity and transparency – all essential ingredients for driving the organisation to success.

See you there! Let’s help you to help your organisation to create your organisation’s Golden Record to describe your business, using Power BI.

Doing the Do: the best resource for learning new Business Intelligence and Data Science technology

As a consultant, I get parachuted into difficult problems every day. Often, I figure it out because I have to, and I want to. Usually, nobody else can do it other than me – they are all keeping the fires lit. I get to do the thorny problems that get left burning quietly. I love the challenge of these successes!

How do you get started? The online and offline courses, books, MOOCs, papers, blogs and the forums help, of course. I regularly use several resources for learning but my number one source of learning is:

Doing the ‘do’ – working on practical projects, professional or private

Nothing beats hands-on experience. 

How do you get on the project ladder? Without experience, you can’t get onto a project. So you end up in this difficult situation where you can’t get started without experience, and you can’t get experience without getting started.

Volunteer your time in the workplace – or out of it. It could be a professional project or your ‘data science citizen’ project that you care about. Your boss wants her data? Define the business need, and identify what she actually wants. If it helps, prototype to elicit the real need. Volunteer to try and get the data for her. Take a sample and just get started with descriptive statistics. Look at the simple things first.
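
If you are wondering what ‘take a sample and just get started with descriptive statistics’ looks like in practice, here is a minimal sketch using only the Python standard library. The sales figures are randomly generated stand-ins, purely an assumption for the example; swap in your own data.

```python
import random
import statistics

# Stand-in data: 1,000 made-up order values. Replace with your real data.
random.seed(42)  # fixed seed so the example is repeatable
orders = [random.randint(50, 500) for _ in range(1000)]

# Take a simple random sample rather than wading through everything at once.
sample = random.sample(orders, 100)

# Look at the simple things first.
summary = {
    "count": len(sample),
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "stdev": statistics.stdev(sample),
    "min": min(sample),
    "max": max(sample),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Even a summary this small starts conversations: is the mean far from the median, is the spread wider than the business expects, are the minimum and maximum believable?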

Not sure of the business question? Try the AzureML Cheat Sheet for help.


Working with data means that you will be challenged with real situations and you will read and learn more, because you have to do it in order to deliver.

In my latest AzureML course with Opsgility, I take this practical, business-centred approach for AzureML. I show you how to take data, difficult business questions and practical problems, and I show you how to create a successful outcome; even if that outcome is a failed model, it still makes you revise the fundamental business question. It’s a safe environment to get experience.

So, if this is you – what’s the sequence? There are a few sequences or frameworks to try:

  • TDSP (Microsoft)
  • KDD

The ‘headline’ of each framework is given below, as a reference point, so you can see for yourself that they are very different. The main thing is to simply get started.

Team Data Science Process (Microsoft)

It’s important not to get too wrapped up on comparing models; this could be analysis paralysis, and that’s not going to help.

I’d suggest you start with the TDSP because of the fine resources, and take it from there.

I’d be interested in your approaches, so please do leave comments below.

Good luck!

See you at Techorama?


Why should you go to Techorama?

Techorama is a yearly international technology conference which takes place at Metropolis Antwerp. We welcome about 1500 attendees, a healthy mix of developers, IT Professionals, Data Professionals and SharePoint professionals.

I’m delighted to announce I’m speaking, and I’d like to take this opportunity to thank the Techorama team for all of their hard work and effort in putting on a great show.


First off, there will be a keynote by Scott Guthrie, EVP of Cloud + Enterprise, Microsoft Corporation – now this is BIG NEWS.

Scott Guthrie, EVP of Cloud + Enterprise, Microsoft Corporation will be keynoting at Techorama 2017 (May 23). In his keynote, “Azure, The Intelligent Cloud”, Scott will open the event with a strategic vision on the Microsoft cloud.

Scott Guthrie will also give another breakout session on May 23 which will be a Q & A session. Come with your questions!




The event itself will offer top-notch content for Developers, IT Professionals, Data Professionals and SharePoint Professionals: 11 parallel breakout sessions with top speakers from all over the world, all experts in their field, plus meaningful networking opportunities with partners and like-minded people.

There will also be a unique conference experience in a movie theatre with lots of surprises!

What will I be talking about? You can find out more here at my dedicated Techorama page.

Data Visualisation Lies and How to Spot them

During the acrimonious US election, both sides used a combination of cherry-picked polls and misleading data visualization to paint different pictures with data. In this session, we will use a range of Microsoft Power BI and SSRS technologies in order to examine how people can mislead with data and how to fix it. We will also look at best practices with data visualisation. We will examine the data with Microsoft SSRS and Power BI so that you can see the differences and similarities in these reporting tools when selecting your own Data Visualisation toolkit.

Whether you are a Trump supporter, a Clinton supporter or you don’t really care, join this session to spot data lies better in order to make up your own mind.

We hope to welcome you at Techorama 2017!