Data Visualisation and SQL Server Blethering in July

In the Scots language, a ‘blether’ is someone who likes the sound of their own voice. I will be speaking at four events in July, so I’m sure I qualify as a blether! In case anyone’s interested, here are the details of the events:

I will be speaking at the Tableau UK User Group on 5th July. Stephen Few is presenting, and he’s the best presenter I have ever seen (and I’ve attended lots of presentations). I’ve volunteered to go straight after Stephen which is brave since he’ll have set the standard for presentations extremely high, and I won’t compare! To Register, visit our Tableau User Group site.

I am speaking at a SQLPass Performance Webinar on the visualisation of the data contained in the Microsoft Management Data Warehouse using SQL Server Reporting Services 2008 R2. This takes place on July 5th, after the Tableau event, and the registration details can be found here. This event sponsored by Confio

I will be speaking on Data Visualisation and SQL Server Reporting Services at the South Wales SQL Server UG July 14th Cardiff. To get more details, and to register, please visit the SQL Server User Group site.

On 21st July, I am also speaking at the Manchester SQL Server User Group but the details aren’t up on the site yet. The topic will be the data visualisation of the data in the Management Data Warehouse again, and the link to the site is here

Wish me luck!
Jen x

Ledface Crowd Funding Initiative

I recently blogged about Collective Intelligence and its application to the SQL Server Community in particular. I mentioned one initiative by Ledface, who are trying to build an online community that will leverage cultural intelligence to answer questions from individuals. Ledface has a very specific goal: to enable people to tap into collective intelligence to answer everyday queries and problems. The Ledface team regard it as new kind of social network in which people interact with each other indirectly, sharing knowledge through Ledface. 
The Ledface team are relying on crowdfunding to get their efforts off the ground. I’ve put the link below. Please note, however, that I’m not officially associated with Ledface in any way, and if you do decide to crowdfund them, then that’s between you and the Ledface team.

Ledface interests me because its emphasis is on social networks to answer questions, rather than a straightforward Google search. It would be good if it is successful, and I wish the Ledface team the best of luck!

http://www.indiegogo.com/project/widget/30873

Photo of Stephen Few
The UK Tableau User Group committee are delighted to announce the Keynote speaker for our next event is Stephen Few. If you would like to register for this event, please do visit the registration site here

Stephen writes the quarterly Visual Business Intelligence Newsletter, speaks and teaches internationally, and provides design consulting. In 2004 he wrote the first comprehensive and practical guide to business graphics entitled Show Me the Numbers, in 2006 he wrote the first and only guide to the visual design of dashboards, entitled Information Dashboard Design, and in 2009 he wrote the first introduction for non-statisticians to visual data analysis, entitled Now You See It. Personally, I think that both of these books are fantastic, and I always recommend them at SQL Server User Groups and to my customers.

Please note that only a maximum of two (2) delegates will be admitted per organisation. Should any organisation wish for further delegates to attend an email can be sent to uktug@uktug.com – however, I suspect that there will be no places left but you can try your luck! Any places allocated from the waiting list will be confirmed just prior to the event.

Full details of the Agenda will be published on http://www.uktug.com and the LinkedIn UK Tableau Users Group. 

How can reading a report be like reading a comic? Trellis Charts

How can reading a report be like reading a comic? Comics have a particular accepted structure, where each panel tells its own story, but the whole number of panels together tell a complete story. In a comic, the individual panels combine together to form a narrative. If you are interested in reading more about the structure of comics, then I recommend that you read ‘Understanding Comics: The Invisible Art’ by Scott McCloud. I first saw the idea of relating reporting to comics in the brilliant Dataspora blog, and I was interested to relate this concept to SQL Server Reporting Services. The purpose of this blog will show you how easy it is to create a Trellis report in reporting services, followed by some data visualisation principles and some thoughts on interpreting the data, and finally some references for further reading. 

I have been presenting this material at some of the SQL Server User Groups in the UK, and thought it might be useful to give the information out in blog format too. 

In the sphere of reporting, it is also possible to construct a panel of reports in order to form a narrative of what the data is telling the data consumer. The term for a panel of reports is a trellis chart (Cleveland, 1997) is a ‘chart of charts’ – it is a multi-panel of small, similar charts. The design of a trellis is hallmarked by the arrangement of the small charts in a lattice or grid fashion, whereby the charts are arranged into rows, columns or even pages.

Each chart in the trellis is similar; for example, all the charts in the trellis should be the same style, for example, scatterplot, bar charts and so on. However, each chart differs in terms of the category variable that is displayed. This means that the trellis structure can help the data consumer to understand the interactions between the variables that produce the result.

Trellis charts are useful for displaying data that is multivariate and/or has large data sets.  If a chart has a lot of bars and lines, then it can become difficult for the user to understand. By splitting the data out by category, Trellis charts can help the data consumer to understand the structure of the data by removing the need for 3D representation, or by having too many bars and lines.  Here is an example of a trellis chart. If you can’t see it properly, then please click here to go to flickr:

CompletedTrellisExample



The above trellis chart shows the BP Statistical Review of World Energy 2010, and shows the expected resources for oil production in Thousand Million gallons per day. I have provided a cut down version of the CSV file here, for your reference. 

This blog will explain how to create the above trellis chart using SQL Server Reporting Services 2008 R2. The reason we use this version of SQL Server is because it has the Matrix component, as we will see as we progress through this blog.

First things first: a data source! We will need a table that is populated with data, and a stored procedure to extract the data. I have given you samples below.

1.      Create and Populate the Table
If you need a script to create a table, here is one for you:

CREATE TABLE [dbo].[WorldRegionBPProvedReserves](
       [Year] [float] NULL,
       [Region] [nvarchar](255) NULL,
       [Country] [nvarchar](255) NULL,
       [Thousand Million Barrels] [float] NULL
) ON [PRIMARY]

You can import the CSV file into the table using the SSIS wizard in SSMS, just to make life easier for you.

2.      Extract the data using a Stored Procedure

Once the data has been imported, you’ll need a stored procedure to extract the data from the database. Here is a sample stored procedure for you:

/****** Object:  StoredProcedure [dbo].[usp_MV_WorldRegionBPProvedReserves]    ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[usp_MV_WorldRegionBPProvedReserves]
–exec [dbo].[usp_MV_WorldRegionBPProvedReserves]
AS
SELECT [Year]
      ,[Region]
      ,[Country]
      ,CAST([Thousand Million Barrels] AS INT) AS [Thousand Million Barrels]
  FROM [SQLBits].[dbo].[WorldRegionBPProvedReserves]
 WHERE
 Country IN
 (
‘Canada’,‘Iran’,‘Kazakhstan’,‘Kuwait’,‘Mexico’,‘Other Europe & Eurasia’,‘Russia’,‘Saudi Arabia’,‘United States of America’
 )
 
GO

Once you’ve created the stored procedure, you should test that it works so far by executing it:

EXEC [dbo].[usp_MV_WorldRegionBPProvedReserves]

Et Voila! You should see the data. If you don’t see the data, then you will need to resolve this issue before you try to move forward.

3.      Create a new Project, Data Source and Dataset

Once you are confident that the stored procedure is returning data, it is time to create a new project, with a corresponding data source and a dataset. If you need information on how to do this, then please refer to the Microsoft tutorials for clear explanations on how to create a new project, data sources and datasets.
Once these have been created, add a new report to the project, and give it an appropriate name.

4.      Using the Matrix Component to display and group

Use the Matrix component; simply drag it from the toolbox, and pop it onto the canvas. I like to name my components, so I am going to call this one Matrix_WorldRegionBPProvedReserves.
Take the chart component, and drag it to the matrix cell. Select a 2D Bar Chart.
Then, the bar chart will need to be populated with data. We will display the data by region and by year.

The grouping section of the SSRS report should appear as follows:

1. Grouping of Rows and Columns



In other words, the rows should be grouped by years, and the columns should be grouped by columns.

5.      Populating each chart with data
The next step is to populate each chart with data. For the values, we want to display the number of barrels measured in thousand million units. For the category grouping, we want to select ‘Country’. Here is a sample picture:

2. Populating the chart with data


6.      Data Visualisation – making the trellis consistent

Remove the legend from each chart – it gives the data more room to ‘breathe’, and is unnecessary repetition. If it helps, make it clear in the title, or create a new legend outside of the individual charts. Removing the unnecessary repetition means that the data/ink ratio is maximised, since unnecessary redundant information is removed.

Remove the chart title from each chart – again, this is unnecessary repetition. The chart title takes up space, and it can be easily placed outside the trellis structure.

Remove the axis title – again, this is repeat information. In this case, it is obvious that the axis is a country, and there is no need to add it in.

The Axis scale needs to be made consistent – if not, it will confuse the users. If this is done automatically, it will vary from chart to chart. However, it is important that the axis sizes all match correctly, or the user will get incorrect information. To adjust the axis size, right click on the axis and set the maximum value to be the same for all charts.

7.      Interpreting the Report

One of my favourite quotes from ‘Now you See it’ by Stephen Few is that ‘comparison is the beating heart of analysis’. This is one of my mantras for data visualisation. It’s clear from the trellis example that, if the charts were all different sizes, it would be very difficult to compare. However, by ensuring that the X-axis of the individual charts all match, then it is comparison is facilitated. Put another way, if the X-axis for each chart does not match, then the data navigator may misinterpret the data.

When we preview the report, it should appear as follows. If you can’t see it properly, then please click here to go to flickr:

CompletedTrellisExample



The data is interesting. If you look at the data for Mexico, for example, then you can see that the expected resources have decreased from 1989 to 2009. Since the trellis chart allows us to compare up and down easily, we can see changes over time. We can also see a similar trend for ‘Other Europe and Eurasia’, which is a combination of the resources for some smaller European countries. 

We can also compare ‘across’ the table as well as ‘down’ the table. For example, Russia, Iran and Canada have increased their expected resources from 1989 to 2009.

This chart could be further improved by adding in colour to show the maximum value, for example, or by adding in more countries to look for further patterns. To summarise, the trellis chart can be a useful narrative for displaying data that is multidimensional in nature, and means that 3D is not necessary. By using the analogy of reading a comic, it is hoped that the trellis chart can be more easily understood and introduced to people who may not be familiar with the concept.

References
Trellis Display from Bell LabsVisualizing Data by William S. Cleveland (1993)Envisioning Information by Edward R. Tufte (1990). This provides a clear explanation of the ‘Small Multiples’ concept, as well as the ‘data ink’ ratio.

Tableau and PowerPivot Presentation

As some of you may know, I recently presented at the Tableau European User Conference which was held in Amsterdam in May 2011. The topic was ‘Using Tableau with PowerPivot’, in which I explored the idea of PowerPivot as a data source and Tableau as the presentation layer. I wrote a blog on the topic of Tableau and PowerPivot recently, and the purpose of this post is to simply share the slides.

I thought that the slides might be useful, so I have posted them for you here.

I hope that you enjoy, and please do let me know your comments.

SQL Denali Connect Item Dashboard

Dashboards are for everyone, but it is important to provide the correct type of dashboard for the various sets of people who are consuming the data. The dashboards are both designed differently, and function differently. There are three main types of dashboard:
Dashboard Type
Purpose
Audience
Components
Operational
Monitor
Operational
Dashboard
Scorecard
Faceted Analytical
Tracking Analysis
Analysts
Time-Series
Reporting
Data Mining
Multi-dimensional
Strategic
Measure Execution, Forecasting
Senior Management
Strategy maps
Initiative Management
There is a definite difference between screens that monitor the state of things and screens that are used for data analysis.  In an article imaginatively entitled Three Blind Men and an Elephant, Stephen Few discusses the concept of a Faceted Analytical Display (FAD), where the data navigator actively consumes the data. One of my favourite quotes from Stephen Few’s Now You See It book is ‘Comparison is the beating heart of analysis’. Translated into the concept of a Faceted Analytical Display, the dashboard has multiple concurrent views of a common data set so that comparisons can be made more easily. 
Normally, when we are creating a dashboard, we need to understand two key items:
        What does the data consumer need to know?
        What does the data consumer need to act upon?
Normally, a dashboard should be underpinned by key strategies and goals, in order to align the organization towards a common series of goals. The dashboard then displays the measurement of the performance of the organization towards the goals.
What matters is that you have the means to expand your analytical reach by viewing the same data in different ways.  In a Faceted Analytical Display, the same dataset is common to all charts, so the viewer is looking at the same dataset in different ways. In order to clarify what a Faceted Analytical Display actually is, this dashboard example presents information from the Microsoft Connect database. If you click on the image, it will take you through to Tableau Public, where the dashboard is hosted:

Connect Items Dashboard

In case you are not familiar, the Microsoft Connect program provides a forum for users to provide feedback to Microsoft in order to improve the quality of Microsoft products. By giving users a say, it allows users to impact the direction of Microsoft products. At the time of writing, Microsoft products are currently accepting bugs in 65 different products, and accepting suggestions in 50 products. The range includes a diversity of Products including Microsoft Office, Bing Map App SDK, and Project Trident. Before we proceed, I’d like to thank the following people for their help in obtaining the data:  Aaron Nelson (Twitter), Nic Cain (Twitter) and Rob Farley (Twitter).
The purpose of the dashboard was to allow people to examine some of the Connect information in more detail, and to expose any patterns in the data.  The dashboard is constituted of four different parts, and a sample can be seen below:
To summarise, the four dashboards cover four different areas:
Number of Denali Connect Items – this answers the question ‘How many Denali Connect Items were opened by year?’
Breakdown of Denali Connect Item Types by Product – this answers the question ‘How many Items could be classified by Product?’
Closed Connect Calls – This answers the question ‘Why were the Connect Calls Closed?’
Resolved Connect Calls – This answers the question ‘Why were the Connect Calls Resolved?’

Each chart in the dashboard is based on the same dataset. Overall, the dashboard was designed in order to facilitate interaction.
It is possible to analyse the data in terms of the item status dimension i.e. active, closed or resolved items, or grouped together by the ‘All’ status. This is done by selecting the appropriate item from the top right hand side box called ‘Split by Item Status’.
The top two graphs are line charts; normally we might expect this categorical data to be a column chart. However, I felt that the identification of patterns using the additional radio button feature might be enhanced by using a line chart since the simplicity might be more revealing. I will be interested to see what the user feedback is!
I have added an overall title for the dashboard, as well as adding in individual titles and captions for each of the four charts. The additional text has been added in order to clarify the ‘business question’ for each chart. Whilst text is not usually necessary, I felt that this dashboard would be consumed by people whom I would never be likely to meet, who wouldn’t be familiar with the data source, and to whom I wouldn’t be able to explain anything verbally. Therefore, I felt that the additional title and caption would be extremely useful.
The dashboard is made up of four individual graphs, which are discussed below:
Number of Denali Connect Items
This chart aims to show a trend of the number of SQL Server Denali connect items that have been opened, over time. An example of the chart is given below:

Number of Connect Items Count

It is clear that the number of ‘Active’ Calls is much higher than the Closed and Resolved calls. Thus, the actual data point shows the number of new calls. The current status of those Connect items is denoted by the colour of the line. So, for example, so far in 2011, there are 566 calls opened which are currently noted as ‘Active’.
Breakdown of Denali Connect Item Types by Product
This chart shows a very rough classification of the Denali Connect items into product areas e.g. SSRS, SSAS and so on.

Breakdown by Product

In terms of classifying the Denali Connect Items, a simple keyword search serves to reveal patterns of organising the Connect Items into subject area e.g. SSRS, SSAS etc. The dashboard shows that SSRS is the most ‘popular’ area for recording new Connect items. As someone with a background in Artificial Intelligence who has delivered projects in natural language processing, very simple keyword searches are considered to be very rudimentary.  Due to time constraints, unfortunately I didn’t have time to create anything more complex than a simple analysis.
The main point is that the chart shows that the SSRS vocabulary is the most prevalent in the Connect items discussion. This is closely followed by SSMS.
Normally you might expect to see a bar chart or similar for this type of data, because it is categorical. I have used a line chart because it emphasised the patterns in the data. For example, SSMS is shown to have a high number of Active connect items, with the higher number of Closed and Resolved status items. I will be interested in the comments!
Closed and Resolved Status Items
The ‘Closed Connect Calls’ chart shows the breakdown of the Closed Connect calls where it’s possible to obtain some additional information about the closure itself. For example, if a Connect item is closed as ‘Non Reproducible’, then this is added here. Connect Items are not included here if there is no closure statement.

Closed Connect Items

The ‘Resolved Connect Calls’ chart is very similar to the ‘Closed Connect Calls’ chart, but shows the reason for the Connect items being given the ‘Resolved’ status. Here is an example:

Resolved Connect Items

The colour scheme remains consistent throughout e.g. if something is ‘Fixed’ for closed, then the same colour is ‘Fixed’ for Resolved too.
Further work will involve keeping the dashboard up-to-date. Unfortunately I don’t have a ‘closed’ date so I can’t work out the duration of the Connect item with certainty. Due to this, I have left this part of the analysis. I would have liked to add this in, and perhaps we will have access to the ‘Closed’ date in the future.

To summarise, the creation of a dashboard needs to start at a point where the purpose and function of the dashboard are identified so that the correct type of dashboard can be utilised. Its design should be in line with the cognitive and perceptual evidence on data visualisation. Finally, it’s good to expose data to allow users to analyse and work with their data. 

Tableau Blogger Competition

Tableau have set up a data visualisation competition. Jerome Cukier posted an entry, and Andrew Cogreave has commented here

I echo Andy’s comments – I think that Jerome has done a great job. Andy has already commented so I will refer you to his site for details. I would only add a few things:

– I would change the title background for each title from grey to White. This reduces the amount of ink used on the page, since the grey title background doesn’t add anything to data story.
– I might add a ‘business question’ to each chart. The ‘business’ question is a phrase that points out the meaning of the chart. This can help people to understand the purpose of the chart.
– overall, I would add a business question to the dashboard. This focusses the design of the dashboard, to formulate well.

To see an example of these points, take a look at my Microsoft Denali Connect dashboard for an example of these points.

I think that Jerome has done a great job!

Good luck to all tableau Competition entrants.