Data Visualisation with Hadoop, Hive, Power BI and Excel 2013 – Slides from SQLPass Summit and SQLSaturday Bulgaria

I presented this session at SQLPass Summit 2013 and at SQLSaturday Bulgaria.

The topic focuses on some data visualisation theory, an overview of Big Data and finalises the Microsoft distribution of Hadoop. I will try to record the demo as part of a PASS Business Intelligence Virtual Chapter online webinar at some point, so please watch this space.

I hope you enjoy and I look forward to your feedback.

Eating the Elephant: Totally free videos showing an introduction to Visualising Big Data for Business Intelligence Professionals

Continuing my ‘eating the Elephant, one bite at a time’ series, which focuses on Microsoft Business Intelligence and Hadoop, I’ve put together a series of totally free videos, to help people who are interested in visualising Big Data using familiar tools in Microsoft. The purpose is to take data from various data sources, including SQL Server, HDInsight (Microsoft’s distro of Hadoop) and Excel, and visualise the data via PowerPivot and Excel.

Self Service Bi – and Big Data – a Business Intelligence person’s dream! Well, it is for me!

As I say in these videos, Excel is the world’s favourite software for Business Intelligence, and it must surely rank as one of the most favourite software applications of all time. Excel is used (and abused!) more than any other software I’ve seen.

I hope that you will enjoy the videos and I look forward to your feedback. You can access them on YouTube here

Please note: in the process of practising for my Big Data precon at SQLPass Summit in Charlotte on 15th October, I reused the material from a fantastic blog post by Cindy Gross of the SQLCat team and I’d like to thank Cindy and her team for writing this material.

The blog post is here and I recommend that you go through it – I just videoed it, but the material belongs to them so I’d like to make sure that they get credit for the blog post, so that’s why I’m emphasising that I’m calling it out.

Please note that this isn’t material from my actual precon – it’s simply a way of me to work my way through preparing for the precon I’m presenting jointly with Allan Mitchell (SQL Server MVP). I have simply put it in video format in order to practice my delivery, and then it struck me that people might find this useful. If so, look out for more videos in future!

I hope it helps.
Kind Regards,

Eating the elephant, one bite at a time: Loading data using Hive

In the previous ‘Eating the elephant’ blogs, we’ve talked about tables and their implementation. Now, we will look at one of the ways to get data into a table. There are different ways to do this, but here we will look only at getting data into an external Hive table using HiveQL, which is the Hive query language. Sounds similar to SQL, doesn’t it? Well, that’s because a lot of it looks and smells similar. When you look at this example here, you will see what I mean!

Here is the Hive syntax to create an external table. In this post, we will look at simple examples, and perhaps look at partitioned tables on another occasion.

First things first; you need to create an external table to hold your data. Before you do, check the contents of the folder /user/hue/UNICEF if it exists. If not, don’t worry, we have the rest of the syntax below:

Create your table

CountryName     STRING,
AvgRankPosition     FLOAT,
MaterialWellBeing     FLOAT,
HealthAndSafety     FLOAT,
EducationalWellBeing     FLOAT,
FamilyPeersRelationships     FLOAT,
BehavioursAndRisks     FLOAT,
SubjectiveWellBeing     FLOAT)
LOCATION '/user/hue/UNICEF';
Note that the directory has now been created: /user/hue/UNICEF If it was not there already. 
You may now see it in the File Explorer, and here is an example below:


The next step is to upload some data to folders. For the purpose of showing what Hive does when you load up data files, let’s place a file into a different folder, called /user/hue/IncomeInequality/ and the file will be called ‘UNICEFreportcard.csv’ The syntax to load the file statically is here:

LOAD DATA INPATH ‘/user/hue/IncomeInequality/UNICEFreportcard.csv’ OVERWRITE INTO TABLE `unicefreportcard`;

You can see this here, and if it is not clear, you can click on the link to go to Flickr to see it in more detail:


Once you have executed the query, you can check that the data is in the table. This is easy to do, using the HUE interface. Simply to go the Beeswax UI, which you can see as the yellow icon at the top left hand side of the HUE interface, and then select the ‘Tables’ menu item from the grey bar just under the green bar. You can select the UNICEFreportcard table, and then choose ‘Sample’. You should see data, as in the example screenshot below:


When the data is imported, where does it get stored for the external table?

The UNICEFreportcard file was originally in the IncomeInequality folder, but now the original UNICEFreportcard.csv file is moved from the IncomeInequality folder. Where has it gone? It has been moved to the UNICEF folder, which was specified as the location when we created the external table.

When we drop the table, the data file is not deleted. We can still see the CSV file in the UNICEF folder. This is straightforward to test; simply drop the table using the button, and then use File Explorer to go to UNICEF folder and you will see the CSV. Here it is:


To test the separation of the data and the table metadata, simply re-run the query to create the table as before. Now, when you view a sample of the data, you see the data as you did before. When you dropped the table originally, you simply removed the table metadata; the data itself was stored. The reason for this is due to the fact that this table is an External table, and Hive preserves a separation between the data and the table metadata.

If the table was an internal table, if you drop the table, the data will be deleted as well – so be careful!

To summarise, in this post,we’ve looked at one way of getting data into Hive. Our next steps will be to do something with the data, now that it is in Hive. So let’s move on to do some simple querying before we proceed to start to visualise the data.

I hope that helps!