Useful Data Sources for Demos, Learning and Examples

One question that pops up from time to time is where to find sample datasets for self-learning, creating training materials or just playing with data. I love this question: I learn by actively trying things out too. I love the stories in the data, and this is a great way to find the stories that bring the data to life and offer real impact.


Since I deliver real projects with customer impact and my projects are confidential, I can’t show any real customer data during my presentations, so I have three approaches:
  • I use sample data, and I have a signed NDA.
  • I ask the customer for their data, anonymised, and I have a signed NDA.
  • I use their live data, and I have a signed NDA.
If the customer elects the first option, then I use sample data from below.
To help you get started, I’ve cobbled together some pointers here, and I hope it’s useful. Please feel free to leave more ideas in the comments.


The latest edition of Entrepreneur has an insightful article on open source (frameworks vs libraries) and it has some good pointers to datasets at the bottom of the page. I’ve also pasted them here for you:
Bernard Marr has an updated list of datasets on Forbes. I’m not going to steal Marr’s list, so I recommend that you head over to his page, where you’ll find sixty-plus options.

Data Source Direct Connectivity to R, Python, Ruby and Stata

R has a number of packages that connect to public datasets, e.g. the World Bank’s data, which allows connectivity from R, Python, Ruby and Stata. I used this for my recent demos at the Power BI event in the Netherlands, and it worked sweetly. So you’d write your script to call the package, embed it in Power BI, and it will go and get the data for you. I then create the chart in R and put it into the Power BI workbook.
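As a rough sketch of the "script that goes and gets the data" idea, here is a Python version rather than the R script I used in the demo. It assumes the World Bank's public v2 REST API and its `[metadata, records]` JSON layout; the function names (`build_url`, `parse_response`, `fetch_indicator`) are my own illustrative choices, not part of any package.

```python
import json
from urllib.request import urlopen

# Base URL of the World Bank's public v2 REST API (assumed here).
BASE_URL = "https://api.worldbank.org/v2"


def build_url(country: str, indicator: str, per_page: int = 100) -> str:
    """Build the request URL for one country and one indicator."""
    return (
        f"{BASE_URL}/country/{country}/indicator/{indicator}"
        f"?format=json&per_page={per_page}"
    )


def parse_response(payload: list) -> dict:
    """Turn the API's [metadata, records] payload into {date: value}.

    Records whose value is missing (null in the JSON) are skipped.
    An error payload with no records yields an empty dict.
    """
    records = payload[1] if len(payload) > 1 and payload[1] else []
    return {
        rec["date"]: rec["value"]
        for rec in records
        if rec["value"] is not None
    }


def fetch_indicator(country: str, indicator: str) -> dict:
    """Download one indicator series (needs Internet access)."""
    with urlopen(build_url(country, indicator)) as response:
        return parse_response(json.load(response))
```

Calling `fetch_indicator("NL", "SP.POP.TOTL")` would request total population for the Netherlands; the resulting dictionary can then be handed to whatever charting step you prefer.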


Quandl offers financial data, and it has a hook so that R can connect directly to it as well.


Kaggle is owned by Google, presumably so that Google can promote TensorFlow. Since people share code based on Kaggle datasets, it’s very easy to pick up code, copy it, change it, and see how it works. However, this can be an issue, since you can’t be sure that the code is correct.

Final Note

If you’re teaching or presenting using this data and/or sample code, you can be pretty sure that your training delegates have access to the Internet too, so you need to be sure that you credit people properly.
Training is not most of what I do, although I deliver training now and again. I am a consultant first and foremost. I track my time with Desktime and Trello to measure exactly how I spend it, and training does not account for a high percentage; project delivery takes the majority of my time.
Although I’m a guest lecturer for the MBA program at the University of Hertfordshire, and I’m going to be a guest lecturer on their MSc Business Analysis and Consultancy course, I do not consider myself a trainer. I am a consultant who sometimes does training as part of a larger project. I haven’t gone down the MCT route because I regard training as part of a bigger consultancy route. I never stop learning, and I don’t expect anyone else to stop learning, either.


Nominating MVPs; growing the tech community via the MVP Program

Over the weekend, I nominated another two people for the MVP Award. It’s possible that they will not be awarded, of course; I have no influence on the process. I like to nominate people; if they aren’t put forward for an Award, then they will never get it, but I always hope that they do. Why not put someone forward? It only takes a few moments and you could change someone’s life!

What does it really mean to be an MVP? I’ve had this privilege for the last seven years, and this is just a list of my opinions. I don’t represent the Microsoft MVP Program or anyone else here.

You will not get business out of it. Again, my personal opinion: I don’t believe that the MVP Award is given out for paid activities. I could be wrong, but I believe it is awarded only for unpaid community contributions. If you are trying to build a business on being an MVP, or you think it will help you to build a business, then you haven’t understood the Award. It’s hard to understand where the line is drawn, however. I think of it as the ‘Father Ted’ rule, referring to the series’ running joke in which Father Ted continually defends himself over money and the other characters simply do not believe him. If an activity is open to interpretation and you have to keep justifying it to other people, or to yourself, perhaps it isn’t falling on the MVP side of the fence. In that case, speak with the MVP Lead to get clarification and advice, as well as some direction on areas where you could contribute in order to get or keep the MVP Award. The MVP Lead can keep you right.

The MVP Award is a gift that can be taken away at any time. So why not share it with other people? I believe that the true mark of a leader is that they give power away, and take care of what they leave behind. You can nominate other people at the MVP Site.

Be technically outstanding. The MVP Award can be a label that people will try to use against you. I do see this in the workplace from time to time, where people can see you as being put on a pedestal, and before they have even met you in person, they are intent on knocking you off and knocking you down. My response to this is simply that I need to be ten times as good in order to get to the same place. So, I work incredibly hard in order to make sure that happens. It can feel like you’re the Red Queen in Through the Looking-Glass, running a race to keep up. The end result is that, once you’re an MVP, you can’t rest on your laurels. You have to keep running. Also, note that the MVP Award doesn’t always mean as much to other people as it does to you. I visited an organization this week, actually, that had never heard of it and couldn’t care less; they were only interested in what I could offer them. Fortunately I stood up to the test!


Forget about number one, and have a higher vision in mind. You take a risk by being ‘seen’. You have to prepare yourself for greatness, if you want to be great at anything. This involves risk, which is the risk of being seen. You have to work at balancing a need for acceptance, which can make you invisible, versus the risks of making yourself seen. Being seen can make you vulnerable, and my way through it is by being authentic. People aren’t always going to like you. This is a tough one; it’s important to rise above it when people criticize you, and it’s important not to join in criticism of other people, too. I think you have to strive to be the person and the leader that you’d like to be. Don’t get pulled down. I’ve had some really terrible things said about me, and I just ignore it. It’s not weakness or stupidity if you don’t fight back: it’s about letting people show themselves, and having faith that others will see it.  Hard as it might be to swallow, you have to strive to show people a better way. This attitude can feel very out-of-place in the world of social media where everyone’s opinions are regarded as equivalent, and it comes down to ‘who shouts the loudest’. You have to strive to be better than that. It’s one of the risks and vulnerabilities of being seen.

Share your passions for technology. So, pass the Microsoft exams, blog, produce videos, or whatever content is your passion. You’ll learn more by sharing, trust me.

I don’t know if I will make it to 8 years as an MVP. I will find out in July. I have had a blast and I am grateful to be part of it. I show it by nominating others; so why don’t you do the same thing?



Past and Future of Self-Service Business Intelligence

I was very pleased to appear on the Izenda website along with five other Business Intelligence experts, discussing the past, present and future of self-service Business Intelligence. I was delighted and honoured to appear with luminaries such as Wayne Eckerson, John Myers, Kevin Smith, Rich Ghiossi, and Ron Powell.



Self-service Business Intelligence is a much larger topic than you might think, and it’s clear that some organizations that market themselves as ‘self-service’ aren’t really meeting the criteria. I recommend that you head over to the post and read it all. I’m interested in the idea of self-service analytics as well as self-service business intelligence, and I do think that will become increasingly relevant as the industry matures.

Thank you to Izenda for having me along. Please let me know what you think; I look forward to your comments.



Cloud computing as a leveler and an enabler for Diversity and Inclusion

I had the honour and pleasure of meeting a young person with autism recently who is interested in learning about Azure and wanted some advice on extending his knowledge.
It was a great reminder that we can’t always see that people have conditions such as autism. The same goes for disabilities, particularly those that you can’t see; examples include epilepsy or even Chronic Fatigue Syndrome.




I love cloud because it’s a great leveler for people who want to step into technology. It means that these personal quirks, or differences, or ranges of abilities can be sidestepped since we don’t need to all fit the brogrammer model in order to be great at cloud computing. Since we can do so many things remotely, it means that people can have flexibility to work in ways that suit them.

In my career, I couldn’t lift a piece of Cisco kit to rack it, because I was not strong enough. With cloud, it’s not a problem. The literally heavy lift-and-shift is already done. It really comes down to a willingness to learn and practice. I can also learn in a way that suits me, and that was the main topic of conversation with the autistic youth that I had the pleasure to meet.

I believe that people should be given a chance. Diversity gives us the opportunity to become more thoughtful, empathetic human beings. In this world, there is nothing wrong with wanting more of that humanness.

Data-Driven or Insights-Driven? Data Analytics vs Data Science

I had an interesting conversation with one of my customers. Through my company Data Relish, I have been leading the Data Science program for some time now, and I was using Team Data Science Process as a backbone to my leadership. I feel I’m fighting the good fight for data, and I like to involve others through the process. It’s great to watch people grow, and get real insights and digital transformation improvements based on these insights.



Data science projects are hard, though, and it’s all about expectations. In this case, my customer was curious to know why the current data science project was taking longer than he expected. Shouldn’t they just exclude the business understanding part of the data science journey? Couldn’t the analytics just clean themselves, or couldn’t every piece of data that was a problem just be cut out?

Being data-driven is all very well, but we need to be open to the insights from business expertise, too.

When the conversation continued, it became clear that a different data organization had been involved in conversations at some point. Apparently, another organization had told my customer that they needed Data Analytics rather than Data Science, and that the two were mutually exclusive. Data Analytics would give them the insights without involving much, if any, business knowledge, effort, or time. What my customer understood from them was that they didn’t need to match data, clean it and so on; data analytics simply meant analysing columns and rows of data in order to see what relationships and patterns could be found in the data. In essence, the customer should divorce business knowledge from the data, and the data should be analyzed in isolation. The business and the data were regarded as mutually exclusive, and the business side should be silenced in order to let the data speak. Due to these conversations, the customer was concerned about the length of time the project was taking, and wanted to go down the ‘data analytics’ route, mix up columns, skip data cleaning and matching sources, and he was absolutely certain that insights would fall out of the data.

To summarise, there were a few things behind the conversation:

  • business people are concerned about the time taken to do a data science project. They are essentially misled by their experience of Excel; they believe it should be as straightforward and quick as generating a chart in Excel.

  • business people can be easily misdirected by the findings of the data science process if they are not critical about the results themselves. It seems to be enough that a data science project was done, not that it was right. The fact that it is a data science project at all is somehow ‘good enough’.

  • business people can be easily swayed by the terminology. One person said that they were going into decision science, but couldn’t articulate properly what it was, in comparison to data science. That’s another blog for another day, but it’s clear that the terminology is being bandied around and people are not always understanding, defining or delineating what the terms actually mean.

  • business people can equate certainty with doing statistics; they may say that they don’t expect 100% findings, but, in practice, that can go out of the window when the project is underway.

The thing is, this isn’t the first time I’ve had this conversation. I think that being data driven is somewhat misleading, although I do admit to using the term myself; it is very hashtaggable, after all. I think a better phrase is insights driven. If we remove the business interpretation and just throw in data, we can’t be sure if the findings are reasonable. As I responded during this conversation, if we put garbage in, we get garbage out. This is a stock phrase in business intelligence and data warehousing, and it also applies to data science. There needs to be a balance; we can be open to new ideas. Our business subject matter expertise can help to shortcut the project by working with the data, not against the data. It helps to avoid the potential of going down rabbit holes because the data said so. The insights from the business can help to make the stories in the data clearer, whilst being open to new insights from the data.

In other words, data and the business should not be mutually exclusive.
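To make the ‘garbage in, garbage out’ point concrete, here is a small Python sketch of what happens when matching and cleaning are skipped. The customer names, the two source lists and the helper names (`normalise_key`, `match_rate`) are all invented for illustration; they are not from the project.

```python
def normalise_key(name: str) -> str:
    """Lower-case a name, trim it and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def match_rate(left: list[str], right: list[str], clean: bool) -> float:
    """Share of names in `left` that also appear in `right`."""
    if not left:
        return 0.0
    if clean:
        left = [normalise_key(n) for n in left]
        right = [normalise_key(n) for n in right]
    lookup = set(right)
    return sum(n in lookup for n in left) / len(left)


# Two hypothetical sources describing the same three customers.
crm = ["Acme Ltd", "Bright  Foods", "Contoso"]
billing = ["acme ltd ", "Bright Foods", "Contoso"]

raw_rate = match_rate(crm, billing, clean=False)
clean_rate = match_rate(crm, billing, clean=True)

print(f"matched without cleaning: {raw_rate:.0%}")
print(f"matched after cleaning:   {clean_rate:.0%}")
```

Without cleaning, only one customer in three lines up across the two sources, so any pattern ‘found’ in the joined data would rest on a third of the customers. Knowing that these are the same customers is exactly the kind of business knowledge that the data alone cannot supply.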

How did it end? We proceeded as normal, and as per the Data Science plan I’d put in place. Fortunately, there were strong voices from the business, who wanted to be included at all stages. I think that we are getting farther, faster, as a unified team, all moving in the same direction. We need to question. Data Science is like April Fools’ Day, every day; don’t believe everything you read. Otherwise, we will never see the wood for the trees.