Using Excel as a data source for Tableau – Shaping your Data

The purpose of this blog is to help you to see the difference that the underlying data structure has in determining the visual resulting visual output. For me, the underlying data is all-important; if it isn’t clean or well-structured, then this needs to be resolved first. After that, even the neat pivoted and un-pivoted data structures can have a real impact on the visual display of the data.

Excel, particularly Excel 2010, has some great new data visualisations on board, and it looks as if more exciting data visualisation .NET stuff is coming out of Microsoft research, so watch that space!

In the meantime, if you are using Tableau, some neat visualisations exist for you to use. In case you didn’t know, it can be possible to maximise your analysis of Excel data structures by preparing it using the Tableau Shaper add-in. The add-in prepares the data for optimal Tableau analysis by placing the values into one column, thus ‘unpivoting’ the data. Excel, particularly Excel 2010, is very good at analysing data in row orientation. If you’ve tried to use the sparklines in Excel 2010, you will see how easy it is. On the other hand, Tableau can need a little help in viewing data in ‘row’ orientation, and this is served by the Tableau Excel add-in.

To help you, here is a dashboard which shows you the preferred fiction reading habits of Americans. The data is taken from Harris International, who, in addition to providing fascinating data for American citizens, also serve up UK data for the popular UK Channel 4 series 8 out of 10 Cats

Here is an example; in this dashboard, the top data has been ‘shaped’ and unpivoted, and the bottom section has remained in the original ‘pivoted’ format. To go to Tableau Public, click on the image to get a bigger and better interactive view: 


Here, the top data has been unpivoted into a single value column, so it is easier for Tableau to analyse and compare. Here is an example of the unpivoted data:


In the case of the bottom dashboard, Tableau sees the row data as different values, even though they both add up to percentages. Here is an example of the pivoted data:


I hope that helps you to visualise how the underlying data structures are important in determining the ultimate visual output. If your data is not clean, you need to go back and clean it, since it will become evident very quickly!

Bullet Charts in SQL Server Reporting Services 2008 R2: ‘How To’ Video

Recently someone told me that they were struggling to produce a Stephen Few inspired bullet chart using Microsoft SQL Server 2008 R2. This is perfectly understandable, because, for some reason, bullet charts are considered a ‘gauge’ in Reporting Services. This is ironic because, if you’ve read Stephen Few’s material, then you will already be aware that bullet charts were designed to be more meaningful in displaying data than gauges and pie charts.
In order to help individuals to produce bullet charts rather than gauges in SQL Server Reporting Services 2008 R2, I’ve produced a quick video which I hope will help. Before we dive into the video, let’s have a look at the end result:
Bullet Graph Video Result
The purpose of this chart is simply an example, which shows the sales data for David Hume and John Napier. The sales ‘target’ is the little vertical line that you see in both charts. So, for David, the line is set at 75%, and for John, the target is set at 70%.
The ‘actual’ sales is represented by the thick blue horizontal line. So, for David, he sold 80% possible sales. John, on the other hand, sold 100% possible sales.
The bullet chart is nice simply because it allows you to quickly compare target values with actual values, and it also allows you to quickly look down and compare ‘between’ values as well. Here, it is quite easy to compare David’s actual and target sales with John’s actual and target sales.  Bullet charts take up less room and express meaningful information, so we can compress more information into a given space than we would get from a gauge.
Normally, in a bullet chart, you may consider removing the percentages or other values along the quantitative scale. However, in order to try and be as clear as possible, I have left them in.
I hope you enjoy the video, which I’ve placed on YouTube and you may also see here:

I look forward to your comments.

Add to Technorati Favorites

Man Booker Prize Winners since 1969

As you may have seen, the Man Booker prize has been won by Howard Jacobson with The Finkler Question. If you’re a bookworm like me, and like to read all of the short-listed entries and previous Man Booker Prize winners, then you may be interested to have a look at my ‘Man Booker Prize’ dashboard, which is hosted by Tableau Public. Please click on the image if you would like to go to the website.


This dashboard may look unusual since I use a bar chart, running left to right, to represent sales and number of prizes won by publishers. This is known as a ‘column chart’ in Reporting Services. This makes use of our ‘left to right’ reading pattern in the West, in order to make the quantity easier to judge. This also facilitates comparison between the various prize winners. 

The Guardian Datablog team have collated Nielsen BookScan‘s sales figures of all 43 winners of the title since its inception in 1969. Nielsen’s data starts from 1998 onwards, so it is not possible to directly compare pre-1998 sales with post-1998 sales. To acknowledge this fact, I have split out the visualisation so that post-1998 and pre-1998 winners are displayed on different graphs. In order to give you an idea of the overall pattern, however, I have ensured that the visual axis of both column charts are exactly the same; so they both start at zero, and end at 10 million. Ensuring that the visual axis are identical is helpful in the comparison process, which is an essential part of helping people to cognitively integrate the visual display (Few, 2009)
The value of sales is double-encoded, as is my normal way of expressing values. One is the length of the bar, which is one of Colin Ware’s pre-attentive attributes. Another way of encoding the value is bar colour; the stronger and more intense the colour, the visual system assumes that the values are higher  (Few, 2009).
It is also possible to filter the information by publisher.  Try and see who is the publisher with the highest number of winners. Is this the same as the publisher who has the highest sales amount? What does this tell us? I’ll let you have fun, by interacting with the graph here

Nobel Peace Prize Winners visualisation using Tableau

It’s possible to interact directly with Tableau workbooks using Tableau Public, which is probably best described as a ‘Youtube’ for data.  This means that viewers are no longer passive consumers of data visualisations. Instead, data consumers are turned into active, interactive data consumers who can visually navigate their way, slicing intuitively through their data sets in order to make strategic decisions based on data and fact.
I’ve made use of Tableau Public here in order to try and ‘practise what I preach’ with respect to data. In other words, that data visualisations should be fluid enough for navigation, and robust enough to withstand scrutiny. Click on the image in order to go to the relevant page at Tableau Public: 

This particular image shows the sex of the Nobel Peace Prize winner as male, female or ‘N’, which represents an institution. If you go to the Tableau Public site, you can play with the worksheet and see where the male/female/institutions have earned the Nobel Peace Prize. 
As in the case of a previous blog, most of the countries had up-to-date names apart from the following:

  • Burma – is now Myanmar
  • Tibet – contentious
  • Russia – should be the ‘Russian Federation’
  • Palestine – contentious

If Tableau does not recognise the country, then it assigns it as a Null value and it appears in the ocean. This occurs with Tibet. There is a debate over whether Tibet is independent of China, and I am afraid I cannot answer that here. However, since the Dalai Lama is a peaceful man, in my opinion, I could not find it in my heart to represent him as anywhere else other than Tibet; so I’m afraid I’ve left it as a Null value in order to highlight it.
If this was Microsoft SQL Server 2008 R2 Reporting Services, we could make the most of the geographic and geometrical points stored in SQL Server 2008 R2; this would remove the obstacle around the political associations with naming countries since we would simply have an absolute point in a map. More on this on later blogs!

Tableau Public is free, so what are you waiting for? Get vizzing!

Why don’t people like bullet charts? Part 2

This is the second blog in a series which focuses on the occasional difficulties around user acceptance of bullet charts. In my experience, bullet charts are a bit ‘marmite’ – some people ‘get’ them completely, but other people struggle and do not ‘get’ them. In my own opinion, I am a bullet chart fan, since I think that they are a good way of expressing data concisely in a way that’s compatible with our current understanding of human perception. 

In order to try and help people make the mental leap towards understanding and using bullet charts, each blog will focus on a particular reason that I’ve heard from users who are not comfortable with bullet charts. I hope to balance each blog by attempting to provide a resolution to try and help users to understand bullet charts a little bit better.

In this blog, I would like to address one particular reason that people have said which means that they don’t ‘get’ bullet charts – they don’t always grasp the meaning of the linear scale used as a background to the bullet graph. In the Bullet Graph Design Spec, the linear scale is described as ‘From two to five ranges along the quantitative scale to declare the featured measure’s qualitative state (optional)’ (Few, 2010). What does this actually mean? Here’s an analogy, which might help to clarify what this means.

Let’s take the case where you would like to check the quality of your password strength. In other words, you would like to check how weak, medium or strong is your password. Microsoft provide a free online password checker, which provides a visual guide of the strength of your password. Here is an example image which denotes the quality of a weak password:

Microsoft Password Checker Weak Quality

Here is the feedback image provided when the quality of the password is considered to be fair:

Microsoft Password Checker Amber Quality

Finally, here is the visual feedback provided if the password is judged to be strong:

Microsoft Password Checker Strong Quality

The farther right along the coloured bar travels, then the password is considered to be higher quality as it proceeds from the left to the right. This is familiar, since this is also used for progress bars during software installs, for example.

The Microsoft free online password checker also double-encodes the quality of the password by using traffic light colouring i.e. red denotes danger, amber denotes fair, and green is ‘go’ or acceptable. As an example, if we wanted to ‘label’ the aforementioned password strength example with percentages, the values 0%-40% may denote a ‘red’ or weak password; 41 – 69% may indicate an ‘amber’ or ‘fair’ password, and values over 70% – 80% may specify a ‘strong’ password; values 81% or over may indicate the ‘best strength’ password.

What has this got to do with bullet charts, I hear you ask? Well, the linear scale at the background of the bullet chart also helps to denote quality in two ways:

 – The ‘left to right’ mechanism, familiar from progress bars or the above example, is also deployed in bullet charts. The qualitative state of the measured metric is considered to increase as the measure travels from the left to the right.

 – The colour of the bullet chart helps to denote quality by helping to ‘label’ the quantity. As in the aforementioned example where values can be represented by a ‘traffic light’ colour, the bullet chart can also use colour to indicate value, thereby enhancing the ‘qualitative’ understanding of the chart. Here is an example bullet chart from Reporting Services 2008 R2, which can help to illustrate this point:


We can see the values in the darker grey colour are used to denote metrics that fall into the range 0 – 60%, the mid-grey is used to identify 61% – 80%, and the lightest grey is used to denote 81% and over.  Stephen Few makes the point that colour-blind people may find the ‘traffic light’ colours confusing. To get around this issue, which affects approximately 10% of males, Few suggests that ranges of intensities of the same hue are easier for people to visually digest, with the darker colour representing ‘poor’ quality, progressing through to lighter intensity to represent ‘good’ quality.

In the last example, I’ve used darker blue to denote 0% – 60%, and lighter blue from 61 – 80%, and finally the lightest blue from 81% until 100%. The colour helps the viewer to identify the quality of the result e.g. poor, medium or high.

The comparative measure (the little vertical blue line) indicates the overall ‘success’ and could be called the ‘target’. In a real life example, this may indicate whether a particular department met a recycling target, or made a certain percentage of total projected sales, for example.

We can see that the top bullet chart slightly exceeded the comparative measure, or target, which can be found at 75% or thereabouts. We can see the performance measure of the top bullet chart (the thick blue horizontal line) ends at 80%, and ends at 100% in the bottom bullet chart. The top bullet chart has a performance measure that does not travel as far as the bottom bullet chart, because it is a lower percentage; this ‘left to right’ aspect makes it easier to read, and the linear colours help to attach a ‘quality’ label to the data i.e. poor, medium, good.

Even if only an ‘at a glance’ measure is needed, the colour helps users to add a qualitative label to the values. Since the bullet charts are located one on top of the other, this leads to easy comparison; in the above example, the top chart didn’t exceed the target as much as the bottom chart, and the bottom chart succeeded in obtaining 100% as the performance measure.

To summarise, I hope that this helps to make the value of the linear scale more clear to users; it isn’t obvious to everyone, and not everyone picks it up intuitively. However, when it is explained, I think that there is value in emphasising the comparative aspects of the bullet graph. The colour and ‘left to right’ aspects of the bullet graph can assist viewers to make qualitative understandings of the data that they see before them, and the neatness of the bullet chart location can help to facilitate comparisons between bullet charts.

I hope I’ve got you wondering how you can create bullet charts? More on this to follow!

Visualisation of Nobel Peace Prize Winners

Every year since 1901, the Nobel Prize is an international award dedicated to outstanding achievements to humanity. There are prizes in physics, chemistry, physiology or medicine, literature. It is administered by the Nobel Foundation in Stockholm, Sweden.

The most commonly recognised award is the award for peace, and this year, the Nobel Peace Prize winner has been announced as Chinese dissident Liu Xiaobo. In 1968, Sveriges Riksbank established The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, founder of the Nobel Prize. Personally, I love this award since it allows us, as humans, to stand and take stock of what we do to one another, and gives us a chance to put peace in the frame as a discussion point for the world.

The ever-brilliant Guardian Data Store has released data for every peace prize winner. I have taken this data, and visualised it to show the number of prize winners per country.  Here is the image, or click here for a larger image:

Tableau Nobel Prize Winners

To do this, I used Tableau. This tool is excellent because it recognises country names. However, when visualising data, it is important to be careful if you are just relying on country names alone. For the Nobel Peace Prize winners list, this proved problematic because country names can change over time, and Tableau seemed to recognise the most recent country list. For example, Baroness Bertha Sophie Felicita Von Suttner won the award in 1905, and the country was then known as Austria-Hungary; now, it is known as the Czech Republic.

Also, countries can be known by more than one name, which can be difficult for software to negotiate.  For example, Burma is officially known as the Union of Myanmar, and East Timor is officially known as Timor-Leste.In this case, Tableau didn’t recognise ‘Great Britain’ but did recognise ‘United Kingdom’. So, you have to be very careful of your data; the software can only do so much for you, and you need to double check. Fortunately, Tableau has a ‘View Data’ option which allows users to double-check. A simple tip in Reporting Services is to add in a table which shows your base data, and then remove it before you complete the report. This will allow you to double-check that the report is really showing that which you would like it to show.

Additionally, country names can be deeply contentious; just look at the Middle East for some examples there. No software in the world is going to be able to resolve this; you have to be clear about whether you are using one country name or another. For this visualisation, I replaced ‘old’ names with the most up-to-date country names, and replaced the informal names given in the original data  (e.g. Burma) with the official name (Union of Myanmar).

To summarise, if you are using geographically-based display if data, be careful f you are relying on country names.  Be sure to check whether the country name is picked up properly. Tableau and Reporting Services can both use Longitude and Latitude, and Reporting Services 2008 R2 can use SQL Server 2008 Geography and Geometry data types for absolute certainty. More on this at a later point!

Add to Technorati Favorites

Why don’t people like bullet charts? Part 1

From speaking with users and report writers,

 I have found that there are a number of aspects of bullet charts which people might find a bit off-putting. On the other hand, there are users and report writers who love bullet charts and ‘ra ra’ them as far as possible, using them to replace pie charts and gauges.

This blog is the first in a series, which aims to give airtime to

the reasons that I’ve heard given by business users and report writers for disliking bullet charts. I will blog about one reason at a time, since I have heard a few different perspectives and it is always worth exploring different points of view. My own opinion is that

bullet charts are a good way of displaying data that allows you to make the most out of the ‘real estate’ on your dashboard or report. This was the intention of Stephen Few when he devised the bullet chart, and I have found them useful in a number of projects for different customers. 

However, when I’ve been working with business users and report writers, I have found that, for some people, it can be a struggle to understand bullets. For them, it is the easier option to stay in the ‘comfort zone’, sticking to more familiar charts such as pies. 

I think it is worthwhile exploring the reasons that people give for disliking bullet charts. I would like to address these reasons individually as part of a series, so I can try to find ways to help users to make the transition from not understanding bullet charts, to making an informed decision regarding whether bullet charts are right for them in particular circumstances.

Here, I would like to talk a bit about just one of these issues, which is the role of colour in bullet charts. There are other reasons, and these will be covered in subsequent blogs. 

One reason that people can get put off bullet charts is, I think, simply the colours of the samples that you find on the internet. For example, I have had feedback from some report business users that the colour of the sample bullet charts is a bit heavy on the eyes, and they do not like it.  Here is an example from wikipedia:


In our modern world with colour printers, users often expect ‘nice’ colours rather than black and grey. I completely understand that Few is trying to make the bullet chart as simple as possible for the human visual system to assimilate, and to also take into account the impact of colour-blindness. However, from speaking with people who see bullet charts for the first time, I think that the heavy black line is not visually pleasing.

In Reporting Services 2008 R2, a bullet chart control is available, which is highly customisable. This means that you are not restricted to using heavy black and grayscale colour schemes. I’ll show you to do this in a future blog, but in the meantime, here is another example of a bullet chart, which is the basic bullet chart provided in SSRS 2008 R2, with no customisations or data:


The lighter grey effect is less harsh than a lot of the bullet chart samples that you can find on the internet. Here is another sample bullet chart. It does not have any data since I don’t want to distract you from the main theme of the impact of colour on producing visually-pleasing bullet charts. Here, we use different intensities of blues this time:


This sample was produced using SSRS 2008 R2, and the blue colours were chosen using Color Brewer, which is aimed at helping you to choose the most appropriate colours for your particular visualisation. It’s produced by Cynthia Brewer at Penn State University, and it’s been such a great find that I had to pass on the tip!

There will be more on bullet charts in the future, so watch this space! To summarise, I think that the sample bullet charts can be a bit off-putting because they make use of black and grey. In SSRS 2008 R2, however, it’s easy to change the colour, which has quite a big impact on producing visually-pleasing bullet charts in a way that is not at the expense of user understanding.

Add to Technorati Favorites