Useful Data Sources for Demos, Learning and Examples

One question that pops up from time to time is where to find sample datasets for self-learning, creating training materials or just playing with data. I love this question: I learn by actively trying things out too. I love the stories in the data, and this is a great way to find the stories that bring the data to life and offer real impact.


Since I deliver real projects with customer impact, and those projects are confidential, I can’t show real customer data in my presentations, so I have three approaches:
  • I use sample data, and have a signed NDA.
  • I ask the customer for their data, anonymised, and have a signed NDA.
  • I use their live data, and have a signed NDA.
If the customer elects the first option, then I use sample data from the sources below.
To help you get started, I’ve cobbled together some pointers here, and I hope they’re useful. Please feel free to leave more ideas in the comments.

Entrepreneur

The latest edition of Entrepreneur has an insightful article on open source (frameworks vs libraries), and it has some good pointers to datasets at the bottom of the page (https://www.entrepreneur.com/article/310965). I’ve also pasted them here for you:
Bernard Marr has an updated list of datasets here, on Forbes. I’m not going to steal Marr’s list, so I recommend that you head over to his page, where you’ll find sixty-plus options.

Data Source Direct Connectivity to R, Python, Ruby and Stata

R has a number of packages that connect to public datasets, e.g. the World Bank’s DataBank, which allows connectivity from R, Python, Ruby and Stata. I used this for my recent demos at the Power BI event in the Netherlands, and it worked sweetly. So you’d write your script to call the package, embed it in Power BI, and it will go and get the data for you. I then create the chart in R, and put it into the Power BI workbook.
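As a rough illustration of what such a script might look like in Python (one of the languages mentioned above), here is a minimal sketch. It assumes the World Bank's public v2 REST API, which returns JSON as a two-element list of metadata followed by rows, and it assumes an annual indicator so that each row's date is a plain year. The function names are hypothetical and chosen only for this example; this is not the script used in the demos.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the World Bank's public v2 REST endpoint.
BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(country, indicator, start_year, end_year, per_page=100):
    """Build the request URL for one country and one indicator."""
    query = urlencode({
        "format": "json",
        "date": f"{start_year}:{end_year}",
        "per_page": per_page,
    })
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_indicator_payload(payload):
    """Turn a decoded response into sorted (year, value) pairs.

    Assumes the payload is [metadata, rows]; anything else (for example an
    error message) yields an empty list. Rows with no value are skipped.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return []
    rows = [
        (int(row["date"]), row["value"])
        for row in payload[1]
        if row.get("value") is not None
    ]
    return sorted(rows)


def fetch_indicator(country, indicator, start_year, end_year):
    """Download and parse one indicator series (needs Internet access)."""
    url = build_indicator_url(country, indicator, start_year, end_year)
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return parse_indicator_payload(payload)
```

Calling `fetch_indicator("NL", "SP.POP.TOTL", 2010, 2015)` should return the total population series for the Netherlands as a list of (year, value) pairs, which you could then hand to whatever charting tool you prefer. Keeping the parsing separate from the download makes it easy to check the logic against a saved response without going online.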

Quandl

Quandl offers financial data, and it has a hook so that R can connect directly to it as well.

Kaggle

Kaggle is owned by Google, presumably so that Google can promote TensorFlow. Since people share code based on Kaggle datasets, it’s very easy to pick some code, copy it, change it, and see how it works. However, this can be an issue, since you can’t be sure that the code is correct.

Final Note

If you’re teaching or presenting using this data and/or sample code, you can be pretty sure that your training delegates have got access to the Internet too, so you need to be sure that you credit people properly.
Training is not the bulk of my work, although I do deliver training now and again. I am a consultant first and foremost. I track my time with DeskTime and Trello to measure exactly how I spend it, and training does not account for a high percentage; project delivery takes the majority of my time.
Although I’m a guest lecturer for the MBA program at the University of Hertfordshire, and I’m going to be a guest lecturer on their MSc Business Analysis and Consultancy course, I do not consider myself a trainer. I am a consultant who sometimes does training as part of a larger project. I haven’t gone down the MCT route because I regard training as part of a bigger consultancy route. I never stop learning, and I don’t expect anyone else to stop learning, either.


See you at Techorama?


Why should you go to Techorama?

Techorama is a yearly international technology conference which takes place at Metropolis Antwerp. We welcome about 1500 attendees, a healthy mix of developers, IT Professionals, Data Professionals and SharePoint professionals.

I’m delighted to announce I’m speaking, and I’d like to take this opportunity to thank the Techorama team for all of their hard work and effort in putting on a great show.


First off, there will be a keynote by Scott Guthrie, EVP of Cloud + Enterprise, Microsoft Corporation – now this is BIG NEWS.

Scott Guthrie, EVP of Cloud + Enterprise, Microsoft Corporation will be keynoting at Techorama 2017 (May 23). In his keynote, “Azure, The Intelligent Cloud”, Scott will open the event with a strategic vision on the Microsoft cloud.

Scott Guthrie will also give a breakout Q&A session on May 23. Come with your questions!

The event itself will offer top-notch content for Developers, IT Professionals, Data Professionals and SharePoint Professionals: 11 parallel breakout sessions with top speakers from all over the world, all experts in their field, plus meaningful networking opportunities with partners and like-minded people.

There will also be a unique conference experience in a movie theatre with lots of surprises!

What will I be talking about? You can find out more here at my dedicated Techorama page.

Data Visualisation Lies and How to Spot them

During the acrimonious US election, both sides used a combination of cherry-picked polls and misleading data visualization to paint different pictures with data. In this session, we will use a range of Microsoft Power BI and SSRS technologies in order to examine how people can mislead with data and how to fix it. We will also look at best practices with data visualisation. We will examine the data with Microsoft SSRS and Power BI so that you can see the differences and similarities in these reporting tools when selecting your own Data Visualisation toolkit.

Whether you are a Trump supporter, a Clinton supporter or you don’t really care, join this session to spot data lies better in order to make up your own mind.

We hope to welcome you at Techorama 2017!

 

PASS Business Analytics Day, Jan 11, Chicago


PASS’ first Business Analytics Day will be held in Chicago on January 11, 2017. You can choose one of two full-day, in-depth sessions for $595: In-Database Analytics with R and SQL Server 2016 and Mastering Power BI Solutions.

These are unique learning opportunities to get more advanced in R or data visualization with Power BI. And as with other PASS events, the goal is to allow you to walk away with real-world analytics knowledge that you can use immediately!

PASS Business Analytics Day

You have two great choices: In-Database Analytics with R and SQL Server 2016 and Mastering Power BI Solutions.

In-Database Analytics with R and SQL Server 2016

With Microsoft SQL Server 2016, data scientists can run in-database analytics using R. This is a “best of both worlds” scenario: delegate database management to SQL Server whilst you create analytics and visualisations in R and Power BI. In this session, we will cover the overall architecture of SQL R Services and go over some best practices. We will look at best practices in analytics and visualisations with a focus on R, and then we delve more in-depth into some practical common use-cases.

Speakers:
David Smith, R Community Lead at Revolution Analytics, a Microsoft Company
Seth Mottaghinejad, Data Scientist, Microsoft

Mastering Power BI Solutions

In this Power BI hands-on Workshop, you will master the “power” of Power BI. Learn to use self-service and enterprise-scale Power BI capabilities; gain valuable skills to integrate, wrangle, shape and visualize data for analysis. Beginning and intermediate level users will learn to address data and reporting challenges with advanced design techniques.

Speaker:
Paul Turley, Mentor with SolidQ, BI Architect, and Microsoft Data Platform MVP

Date: January 11, 2017

Location: Microsoft Technology Center, #200 – 200 East Randolph Drive, Chicago, IL.

We hope you’ll join us!

PASS BA Visual Data Storytelling precon session with Mico Yuk

CEO of BI Brainz | Author | Global Keynote Speaker | BI Influencer | Trainer | Blogger | BI Executive Advisor | @micoyuk

I am super excited that Mico Yuk is joining us again at PASS BA Conference!

Mico Yuk is well known in the Business Intelligence ecosystem as a community leader, BI influencer, controversial blogger, and the founder of the highly rated BI Coaching Series, the BI Dashboard Formula (aka BIDF). Headquartered in Atlanta, GA, her team of senior coaches and consultants work with Executives to transform their BI teams to meet the challenges in the new era of BI through a series of coaching, training, and consulting services. Mico’s most recent accomplishments include being named one of the Top 50 Analytics Bloggers by SAP and being rated a #1 global keynote speaker at a number of global BI conferences.

A computer engineer by degree, she has been designing and implementing enterprise dashboards for major corporate clients since 2006 and is considered to be one of the top data visualization experts in the world.

First as a consultant and now through her company, she has helped to implement executive dashboards and reporting using the SAP BusinessObjects platform for customers such as Allstate, Pfizer, Aviva Canada, McKesson, Ryder Logistics, Digicel Jamaica, QatarGas, St. Jude Medical, Walgreens, Chiquita, LG, the US Air Force, Medtronic, SAP Global Marketing, Amtrak, Fresh Direct, Bank of America, and Nestle, to name a few.

To find out more about Mico, please visit http://micoyuk.com.

Visual Storytelling – How to Tell a “Compelling” Data Story That Matters to Your Users

This business-oriented, hands-on session will provide the foundation necessary to make your data visualizations more intelligent, actionable, and useful! Whether you are a beginner or a data visualization veteran, this session will guide you on telling more compelling stories with your data, from storyboarding fundamentals to more advanced techniques such as how to add smart context and visual cues. Attendees will learn:

• how to create a simple four-part visual storyboard on paper in minutes, not weeks
• why visual storytelling is more effective than traditional reporting
• the one element 98% of data visualizations are missing and how it is negatively affecting user adoption

Format: Half-Day Classroom (Afternoon)

Register here

 


 

My Kibana presentation slide deck from SQLBits

Here are the slides from my first session at SQLBits XIV, held at ExCeL London in March 2015.

I presented with Allan Mitchell on the topic of Elasticsearch and Kibana. I did the Kibana section and talked around the topic of data visualisation, and the slides are here. I whizzed through these first, before giving a demo. I’ll post the SQLBits video as soon as I have it.

Visualising Data with R: HTMLWidgets for R

I’m interested in analyzing and visualizing data with R, as you know. Thanks to Jen Underwood, who pointed me at this cool series of widgets, called HTMLWidgets, that allows you to do some interesting visualisations with R. I must admit I haven’t touched JavaScript for years, ever since I helped to build websites for well-known entertainment sites, but I’m glad I can pick it up again. What does it let you do?

  • Use JavaScript visualization libraries at the R console, just like plots
  • Embed widgets in R Markdown documents and Shiny web applications
  • Develop new widgets using a framework that seamlessly bridges R and JavaScript

Why don’t you head on over to the site and have a look? I’ll be playing with it, and finally finishing my R blog series, as soon as I can. I’ve been busy helping to organize the PASS BA Conference, which is taking place in April 2015 in Santa Clara. Therefore, I hope you’ll excuse the radio silence. It’s on my to-do list, which never goes down, ever!