Let’s take an example, created in SSRS, which shows a line graph with 500+ rows. In case you are interested, this line graph shows the results of an analysis of the breakdown of samples of olive oil, the thesis from the team at Iowa State University being that olive oil differs in its constituents depending on the geographical part of Italy where the olives come from.
We can see here, from this sample of nearly 5000 rows, that the olives from the South Apulia area are higher in oleic acid; the rows are distinguished by their colour e.g. ‘South – South Apulia’ rows are marked by brown. SSRS was just made for dealing with large data sets, and turning them into visualisations.
Excel 2010, in its plainest “non-PowerPivot” form, struggled to show the 500+ different lines on the line graph. We know that PowerPivot is able to deal with millions of rows of data, but it wasn’t used here; instead, I stayed with the scenario of being a user, who only has Excel installed.
Thus, it was more difficult to get Excel to display hundreds and hundreds of lines of data as in a ‘stress test’ scenario.
The graph looks different since it is simply a subset of the total 500+ rows of data in the original sample.
We still get an idea of the overall pattern, but the detailed, clean look of the SSRS report is better for displaying the whole story, in addition to displaying all of the data. The finer control, afforded by SSRS, was a powerful tool which helped to produce a clean, visual report which displayed all of the data from a larger dataset.
But that’s ok: SSRS and PowerPivot, mobilised by SQL Server and Sharepoint, are there for the purposes of helping users to mobilise large datasets, and help users to make sense of their data. Excel, on its own, offers great accessibility; and who doesn’t love Excel?
One of the key issues in Business Intelligence is the display of data concisely, appropriately and correctly. In particular, Reporting Services (SSRS) and Excel, in the wrong hands, can lead to incorrect data visualisations since it’s very easy to produce charts, graphs and so on, without thinking about the data in question, and the business question that you’re trying to answer. Both SSRS and Excel is that they allow users a generous amount of control over the report itself. In particular, SSRS provides so many controls and options in the ‘Properties’ window of each element of the report that it might be overwhelming for novice users. However, Excel’s ‘ease-of-use’ and SSRS ‘fine-grained control over reports’ can lead to the production of visual displays which are not suited to the data in question, or in some cases, just wrong! Thus, reporting does not always constitute a part of business intelligence; if it does not help to move the business forward, then it can’t be said to be real business intelligence.
Excel, particularly Excel 2010, has some great new data visualisations on board, and it looks as if more exciting data visualisation .NET stuff is coming out of Microsoft research, so watch that space!
In the meantime, if you are using Tableau, some neat visualisations exist for you to use. In case you didn’t know, it can be possible to maximise your analysis of Excel data structures by preparing it using the Tableau Shaper add-in. The add-in prepares the data for optimal Tableau analysis by placing the values into one column, thus ‘unpivoting’ the data. Excel, particularly Excel 2010, is very good at analysing data in row orientation. If you’ve tried to use the sparklines in Excel 2010, you will see how easy it is. On the other hand, Tableau can need a little help in viewing data in ‘row’ orientation, and this is served by the Tableau Excel add-in.
To help you, here is a dashboard which shows you the preferred fiction reading habits of Americans. The data is taken from Harris International, who, in addition to providing fascinating data for American citizens, also serve up UK data for the popular UK Channel 4 series 8 out of 10 Cats.
In the case of the bottom dashboard, Tableau sees the row data as different values, even though they both add up to percentages. Here is an example of the pivoted data:
I hope that helps you to visualise how the underlying data structures are important in determining the ultimate visual output. If your data is not clean, you need to go back and clean it, since it will become evident very quickly!
I look forward to your comments.
This is the second blog in a series which focuses on the occasional difficulties around user acceptance of bullet charts. In my experience, bullet charts are a bit ‘marmite’ – some people ‘get’ them completely, but other people struggle and do not ‘get’ them. In my own opinion, I am a bullet chart fan, since I think that they are a good way of expressing data concisely in a way that’s compatible with our current understanding of human perception.
In order to try and help people make the mental leap towards understanding and using bullet charts, each blog will focus on a particular reason that I’ve heard from users who are not comfortable with bullet charts. I hope to balance each blog by attempting to provide a resolution to try and help users to understand bullet charts a little bit better.
In this blog, I would like to address one particular reason that people have said which means that they don’t ‘get’ bullet charts – they don’t always grasp the meaning of the linear scale used as a background to the bullet graph. In the Bullet Graph Design Spec, the linear scale is described as ‘From two to five ranges along the quantitative scale to declare the featured measure’s qualitative state (optional)’ (Few, 2010). What does this actually mean? Here’s an analogy, which might help to clarify what this means.
Let’s take the case where you would like to check the quality of your password strength. In other words, you would like to check how weak, medium or strong is your password. Microsoft provide a free online password checker, which provides a visual guide of the strength of your password. Here is an example image which denotes the quality of a weak password:
Here is the feedback image provided when the quality of the password is considered to be fair:
Finally, here is the visual feedback provided if the password is judged to be strong:
The farther right along the coloured bar travels, then the password is considered to be higher quality as it proceeds from the left to the right. This is familiar, since this is also used for progress bars during software installs, for example.
The Microsoft free online password checker also double-encodes the quality of the password by using traffic light colouring i.e. red denotes danger, amber denotes fair, and green is ‘go’ or acceptable. As an example, if we wanted to ‘label’ the aforementioned password strength example with percentages, the values 0%-40% may denote a ‘red’ or weak password; 41 – 69% may indicate an ‘amber’ or ‘fair’ password, and values over 70% – 80% may specify a ‘strong’ password; values 81% or over may indicate the ‘best strength’ password.
What has this got to do with bullet charts, I hear you ask? Well, the linear scale at the background of the bullet chart also helps to denote quality in two ways:
– The ‘left to right’ mechanism, familiar from progress bars or the above example, is also deployed in bullet charts. The qualitative state of the measured metric is considered to increase as the measure travels from the left to the right.
– The colour of the bullet chart helps to denote quality by helping to ‘label’ the quantity. As in the aforementioned example where values can be represented by a ‘traffic light’ colour, the bullet chart can also use colour to indicate value, thereby enhancing the ‘qualitative’ understanding of the chart. Here is an example bullet chart from Reporting Services 2008 R2, which can help to illustrate this point:
We can see the values in the darker grey colour are used to denote metrics that fall into the range 0 – 60%, the mid-grey is used to identify 61% – 80%, and the lightest grey is used to denote 81% and over. Stephen Few makes the point that colour-blind people may find the ‘traffic light’ colours confusing. To get around this issue, which affects approximately 10% of males, Few suggests that ranges of intensities of the same hue are easier for people to visually digest, with the darker colour representing ‘poor’ quality, progressing through to lighter intensity to represent ‘good’ quality.
In the last example, I’ve used darker blue to denote 0% – 60%, and lighter blue from 61 – 80%, and finally the lightest blue from 81% until 100%. The colour helps the viewer to identify the quality of the result e.g. poor, medium or high.
The comparative measure (the little vertical blue line) indicates the overall ‘success’ and could be called the ‘target’. In a real life example, this may indicate whether a particular department met a recycling target, or made a certain percentage of total projected sales, for example.
We can see that the top bullet chart slightly exceeded the comparative measure, or target, which can be found at 75% or thereabouts. We can see the performance measure of the top bullet chart (the thick blue horizontal line) ends at 80%, and ends at 100% in the bottom bullet chart. The top bullet chart has a performance measure that does not travel as far as the bottom bullet chart, because it is a lower percentage; this ‘left to right’ aspect makes it easier to read, and the linear colours help to attach a ‘quality’ label to the data i.e. poor, medium, good.
Even if only an ‘at a glance’ measure is needed, the colour helps users to add a qualitative label to the values. Since the bullet charts are located one on top of the other, this leads to easy comparison; in the above example, the top chart didn’t exceed the target as much as the bottom chart, and the bottom chart succeeded in obtaining 100% as the performance measure.
To summarise, I hope that this helps to make the value of the linear scale more clear to users; it isn’t obvious to everyone, and not everyone picks it up intuitively. However, when it is explained, I think that there is value in emphasising the comparative aspects of the bullet graph. The colour and ‘left to right’ aspects of the bullet graph can assist viewers to make qualitative understandings of the data that they see before them, and the neatness of the bullet chart location can help to facilitate comparisons between bullet charts.
I hope I’ve got you wondering how you can create bullet charts? More on this to follow!
Every year since 1901, the Nobel Prize is an international award dedicated to outstanding achievements to humanity. There are prizes in physics, chemistry, physiology or medicine, literature. It is administered by the Nobel Foundation in Stockholm, Sweden.
The ever-brilliant Guardian Data Store has released data for every peace prize winner. I have taken this data, and visualised it to show the number of prize winners per country. Here is the image, or click here for a larger image:
To do this, I used Tableau. This tool is excellent because it recognises country names. However, when visualising data, it is important to be careful if you are just relying on country names alone. For the Nobel Peace Prize winners list, this proved problematic because country names can change over time, and Tableau seemed to recognise the most recent country list. For example, Baroness Bertha Sophie Felicita Von Suttner won the award in 1905, and the country was then known as Austria-Hungary; now, it is known as the Czech Republic.
Also, countries can be known by more than one name, which can be difficult for software to negotiate. For example, Burma is officially known as the Union of Myanmar, and East Timor is officially known as Timor-Leste.In this case, Tableau didn’t recognise ‘Great Britain’ but did recognise ‘United Kingdom’. So, you have to be very careful of your data; the software can only do so much for you, and you need to double check. Fortunately, Tableau has a ‘View Data’ option which allows users to double-check. A simple tip in Reporting Services is to add in a table which shows your base data, and then remove it before you complete the report. This will allow you to double-check that the report is really showing that which you would like it to show.
Additionally, country names can be deeply contentious; just look at the Middle East for some examples there. No software in the world is going to be able to resolve this; you have to be clear about whether you are using one country name or another. For this visualisation, I replaced ‘old’ names with the most up-to-date country names, and replaced the informal names given in the original data (e.g. Burma) with the official name (Union of Myanmar).
To summarise, if you are using geographically-based display if data, be careful f you are relying on country names. Be sure to check whether the country name is picked up properly. Tableau and Reporting Services can both use Longitude and Latitude, and Reporting Services 2008 R2 can use SQL Server 2008 Geography and Geometry data types for absolute certainty. More on this at a later point!