Project Crescent: When is it best applied?

From what I’ve read and understood, Project Crescent is best applied to ad-hoc, undirected analysis work rather than directed reports, displaying current operational data activity. This blog aims to understand more about Project Crescent, and when it is best applied to business questions.

From the front-end reporting perspective, Project Crescent is a major new Business Intelligence addition to Microsoft SQL Server Denali. It is primarily designed to assist business users, across an organisation, to ‘storyboard’ their data by facilitating unstructured analyses of the data, and to share it in the Microsoft Office applications that we are familiar with already. Project Crescent has its genesis in the SSRS team, and is focused on allowing business users to analyse, mine and manipulate their data ‘on the fly’. The solution will be browser-based, using Vertipaq and Silverlight to product rapid BI results by allowing users to interact with their data. Further, this analysis will be powered by the Business Intelligence Semantic Model (BISM), and will only be available in Sharepoint mode. This criteria would need to be met in order to use Project Crescent, so this may not work in every environment.


How does this perspective differ from reporting? Well, standard reporting takes business users down a particular structured route; for example, the parameters are pre-prepared in the report in drop-down format, and the dataset is already ‘set’ about the data that can be shown. This offers limited options for business users to interact with the data. On the other hand, Project Crescent will allow unstructured analysis of the data, which does not lead business users down a particular route. Rather than produce reports, the unstructured analysis does not offer a real separation between ‘design’ and ‘preview’. Instead, the users ‘play’ with their data as they go along, ‘thinking’ as fast as they ‘see’.

‘Unstructured’ type of data analysis is great with most data sources. For straightforward operational needs, the structured reporting facillities will answer the immediate, operational business question.  This is particularly the case when considering the display of data from an Operational Data Store (ODS). What does the ODS give you? The ODS basically helps support people who conduct business at the operational level, who need the information to react quickly. Essentially, it sits between the source systems and the data warehouse downstream, and the data usually has a short shelf-life. It is an optional aspect of the business intelligence architecture, and worth considering since many organisations have at least one ODS. There are many different definitions for the ODS, but here are some common chacteristics:

– it intakes data from disparate sources, cleanses and integrates them
– it is designed to relieve pressure from source systems
– stores data that isn’t yet loaded into the data warehouse, but is still required now

One example of this ODS deployment is in contact centre management. For example, the contact centre manager will need to know if there is a high abandoned call rate at the contact centre. As a result of this, they may need to temporarily direct calls to another team in order to meet the unusually high demand. To do this, however, they need to be aware that the abandoned call rate has reached a particular threshold. This reporting requirement is operational rather than strategic; a strategic decision may involve using a number of metrics to identify whether some longer-term re-organisation is required, but the ODS helps the business to run optimally here and now. 

This leads us to another question; how can we best display the data in the ODS, and display it? Data visualisation of ODS information has very specific requirements since it needs to show, at a glance, how the business is operating.  There are a number of different data visualisations which can assist in showing ‘at-a-glance’ information very effectively. There are plenty examples of data visualisations that can assist in the ‘here and now’ business question facilitation. For example, for information on bullet charts to display target information, please see an earlier blog. It is also straightforward to produce KPIs in Reporting Services, and if you’re interested, please see this post by Jessica Moss. and so on. In SQL Server Denali, these visualisations can be produced using SSRS, Sharepoint, Excel, and PowerPivot, not to mention a whole range of other third-party applications such as Tableau. There are more details about the Microsoft-specific technologies in the SQL Server Denali roadmap, which can be found here on TK Anand’s Blog

To summarise, from what I’ve read so far, Project Crescent is aimed more at ad-hoc analysis work rather than displaying current operational data activity. From the roadmap, I gather that Microsoft SQL Server Denali will have something to please the business users who require ‘unstructured’ data analyses, in addition to the high standard of functionality already available for those who require structured data analyses in response to a changing business environment. I look forward to its release!

SSRS & Excel; Visualising Large Datasets

What does Microsoft Reporting Services give you, that Excel doesn’t? Excel offers accessibility of data, and is, in my experience, the most popular BI tool that I encounter. On the other hand, Reporting Services is less accessible for people who don’t have strong Excel skills, but offers you a great deal of fine-grained control over the reporting output.

Let’s take an example, created in SSRS, which shows a line graph with 500+ rows. In case you are interested, this line graph shows the results of an analysis of the breakdown of samples of olive oil, the thesis from the team at Iowa State University being that olive oil differs in its constituents depending on the geographical part of Italy where the olives come from. 

SSRSParallelCoOrdinates

We can see here, from this sample of nearly 5000 rows, that the olives from the South Apulia area are higher in oleic acid; the rows are distinguished by their colour e.g. ‘South – South Apulia’ rows are marked by brown. SSRS was just made for dealing with large data sets, and turning them into visualisations.

Here is a subset of the total data, shown in Excel 2010, as follows:

ExcelParallelCoOrdinates

Excel 2010, in its plainest “non-PowerPivot” form, struggled to show the 500+ different lines on the line graph. We know that PowerPivot is able to deal with millions of rows of data, but it wasn’t used here; instead, I stayed with the scenario of being a user, who only has Excel installed.

Thus, it was more difficult to get Excel to display hundreds and hundreds of lines of data as in a ‘stress test’ scenario. 

The graph looks different since it is simply a subset of the total 500+ rows of data in the original sample. 

We still get an idea of the overall pattern, but the detailed, clean look of the SSRS report is better for displaying the whole story, in addition to displaying all of the data. The finer control, afforded by SSRS, was a powerful tool which helped to produce a clean, visual report which displayed all of the data from a larger dataset.

But that’s ok: SSRS and PowerPivot, mobilised by SQL Server and Sharepoint, are there for the purposes of helping users to mobilise large datasets, and help users to make sense of their data. Excel, on its own, offers great accessibility; and who doesn’t love Excel?

Add to Technorati Favorites

SSRS and Excel: Business Intelligence is more than just producing reports!

One of the key issues in Business Intelligence is the display of data concisely, appropriately and correctly. In particular, Reporting Services (SSRS) and Excel, in the wrong hands, can lead to incorrect data visualisations since it’s very easy to produce charts, graphs and so on, without thinking about the data in question, and the business question that you’re trying to answer. Both SSRS and Excel is that they allow users a generous amount of control over the report itself. In particular, SSRS provides so many controls and options in the ‘Properties’ window of each element of the report that it might be overwhelming for novice users. However, Excel’s ‘ease-of-use’ and SSRS ‘fine-grained control over reports’ can lead to the production of visual displays which are not suited to the data in question, or in some cases, just wrong! Thus, reporting does not always constitute a part of business intelligence; if it does not help to move the business forward, then it can’t be said to be real business intelligence.

Here is an example from my presentation at SQLBits 7, which is a Reporting Services chart which shows the top 5 most frequently appearing Doctor Who Villains since the series’ inception in 1963:

DoctorWhoLineGraph

Those of you who have had lots of caffeine will notice that the line graph is not a suitable method of displaying this data. Line Graphs are better for displaying continuous data, not distinct categorical data. If you are a fan of the Doctor Who series, you will know that the Daleks and the Cybermen are separate types of villains. If you are a real Doctor Who obsessive, then you will be wondering whether the count of Ice Warriors excludes the number of episodes where the Ice Warriors appeared in a benevolent capacity; this graph is specifically aimed at displaying a count of appearances rather than an analysis of the degree of ‘villain-ness’.

Since the villains are constituted of distinct categories, then the same data would be better placed as a bar chart in an SSRS report, which you can see below:

DoctorWhoBarChart

This SSRS column chart report is better because it emphasises the distinct categories being measured. Further, the ‘left to right’ reading methodology is easier for us here in the West. Also, it facilitates cognitive integration of the chart itself, by allowing sorting and easy comparison. This isn’t so straightforward in the line graph example given earlier. So, even though line graphs and column charts are very simple displays, it is easy to use them incorrectly.

To summarise, it is very easy to do visualisations which are not being representative of the data sample. Of course, the users have an option to amend the underlying data structure that they are looking for, so that they can obtain the particular visualisation that they would like to see. However, SSRS and Excel, in the wrong hands, can lead to reports that simply do not answer the business question, or help the business users to drive the business forward; that’s the heart of business intelligence.

Add to Technorati Favorites

Using Excel as a data source for Tableau – Shaping your Data

The purpose of this blog is to help you to see the difference that the underlying data structure has in determining the visual resulting visual output. For me, the underlying data is all-important; if it isn’t clean or well-structured, then this needs to be resolved first. After that, even the neat pivoted and un-pivoted data structures can have a real impact on the visual display of the data.

Excel, particularly Excel 2010, has some great new data visualisations on board, and it looks as if more exciting data visualisation .NET stuff is coming out of Microsoft research, so watch that space!

In the meantime, if you are using Tableau, some neat visualisations exist for you to use. In case you didn’t know, it can be possible to maximise your analysis of Excel data structures by preparing it using the Tableau Shaper add-in. The add-in prepares the data for optimal Tableau analysis by placing the values into one column, thus ‘unpivoting’ the data. Excel, particularly Excel 2010, is very good at analysing data in row orientation. If you’ve tried to use the sparklines in Excel 2010, you will see how easy it is. On the other hand, Tableau can need a little help in viewing data in ‘row’ orientation, and this is served by the Tableau Excel add-in.

To help you, here is a dashboard which shows you the preferred fiction reading habits of Americans. The data is taken from Harris International, who, in addition to providing fascinating data for American citizens, also serve up UK data for the popular UK Channel 4 series 8 out of 10 Cats

Here is an example; in this dashboard, the top data has been ‘shaped’ and unpivoted, and the bottom section has remained in the original ‘pivoted’ format. To go to Tableau Public, click on the image to get a bigger and better interactive view: 

US_Fiction_Reading_Preference_2010

Here, the top data has been unpivoted into a single value column, so it is easier for Tableau to analyse and compare. Here is an example of the unpivoted data:

ExcelUnpivoted

In the case of the bottom dashboard, Tableau sees the row data as different values, even though they both add up to percentages. Here is an example of the pivoted data:

ExcelPivot

I hope that helps you to visualise how the underlying data structures are important in determining the ultimate visual output. If your data is not clean, you need to go back and clean it, since it will become evident very quickly!

Bullet Charts in SQL Server Reporting Services 2008 R2: ‘How To’ Video

Recently someone told me that they were struggling to produce a Stephen Few inspired bullet chart using Microsoft SQL Server 2008 R2. This is perfectly understandable, because, for some reason, bullet charts are considered a ‘gauge’ in Reporting Services. This is ironic because, if you’ve read Stephen Few’s material, then you will already be aware that bullet charts were designed to be more meaningful in displaying data than gauges and pie charts.
In order to help individuals to produce bullet charts rather than gauges in SQL Server Reporting Services 2008 R2, I’ve produced a quick video which I hope will help. Before we dive into the video, let’s have a look at the end result:
Bullet Graph Video Result
The purpose of this chart is simply an example, which shows the sales data for David Hume and John Napier. The sales ‘target’ is the little vertical line that you see in both charts. So, for David, the line is set at 75%, and for John, the target is set at 70%.
The ‘actual’ sales is represented by the thick blue horizontal line. So, for David, he sold 80% possible sales. John, on the other hand, sold 100% possible sales.
The bullet chart is nice simply because it allows you to quickly compare target values with actual values, and it also allows you to quickly look down and compare ‘between’ values as well. Here, it is quite easy to compare David’s actual and target sales with John’s actual and target sales.  Bullet charts take up less room and express meaningful information, so we can compress more information into a given space than we would get from a gauge.
Normally, in a bullet chart, you may consider removing the percentages or other values along the quantitative scale. However, in order to try and be as clear as possible, I have left them in.
I hope you enjoy the video, which I’ve placed on YouTube and you may also see here:

I look forward to your comments.

Add to Technorati Favorites

Why don’t people like bullet charts? Part 2

This is the second blog in a series which focuses on the occasional difficulties around user acceptance of bullet charts. In my experience, bullet charts are a bit ‘marmite’ – some people ‘get’ them completely, but other people struggle and do not ‘get’ them. In my own opinion, I am a bullet chart fan, since I think that they are a good way of expressing data concisely in a way that’s compatible with our current understanding of human perception. 

In order to try and help people make the mental leap towards understanding and using bullet charts, each blog will focus on a particular reason that I’ve heard from users who are not comfortable with bullet charts. I hope to balance each blog by attempting to provide a resolution to try and help users to understand bullet charts a little bit better.

In this blog, I would like to address one particular reason that people have said which means that they don’t ‘get’ bullet charts – they don’t always grasp the meaning of the linear scale used as a background to the bullet graph. In the Bullet Graph Design Spec, the linear scale is described as ‘From two to five ranges along the quantitative scale to declare the featured measure’s qualitative state (optional)’ (Few, 2010). What does this actually mean? Here’s an analogy, which might help to clarify what this means.

Let’s take the case where you would like to check the quality of your password strength. In other words, you would like to check how weak, medium or strong is your password. Microsoft provide a free online password checker, which provides a visual guide of the strength of your password. Here is an example image which denotes the quality of a weak password:

Microsoft Password Checker Weak Quality

Here is the feedback image provided when the quality of the password is considered to be fair:

Microsoft Password Checker Amber Quality

Finally, here is the visual feedback provided if the password is judged to be strong:

Microsoft Password Checker Strong Quality

The farther right along the coloured bar travels, then the password is considered to be higher quality as it proceeds from the left to the right. This is familiar, since this is also used for progress bars during software installs, for example.

The Microsoft free online password checker also double-encodes the quality of the password by using traffic light colouring i.e. red denotes danger, amber denotes fair, and green is ‘go’ or acceptable. As an example, if we wanted to ‘label’ the aforementioned password strength example with percentages, the values 0%-40% may denote a ‘red’ or weak password; 41 – 69% may indicate an ‘amber’ or ‘fair’ password, and values over 70% – 80% may specify a ‘strong’ password; values 81% or over may indicate the ‘best strength’ password.

What has this got to do with bullet charts, I hear you ask? Well, the linear scale at the background of the bullet chart also helps to denote quality in two ways:

 – The ‘left to right’ mechanism, familiar from progress bars or the above example, is also deployed in bullet charts. The qualitative state of the measured metric is considered to increase as the measure travels from the left to the right.

 – The colour of the bullet chart helps to denote quality by helping to ‘label’ the quantity. As in the aforementioned example where values can be represented by a ‘traffic light’ colour, the bullet chart can also use colour to indicate value, thereby enhancing the ‘qualitative’ understanding of the chart. Here is an example bullet chart from Reporting Services 2008 R2, which can help to illustrate this point:

Result

We can see the values in the darker grey colour are used to denote metrics that fall into the range 0 – 60%, the mid-grey is used to identify 61% – 80%, and the lightest grey is used to denote 81% and over.  Stephen Few makes the point that colour-blind people may find the ‘traffic light’ colours confusing. To get around this issue, which affects approximately 10% of males, Few suggests that ranges of intensities of the same hue are easier for people to visually digest, with the darker colour representing ‘poor’ quality, progressing through to lighter intensity to represent ‘good’ quality.

In the last example, I’ve used darker blue to denote 0% – 60%, and lighter blue from 61 – 80%, and finally the lightest blue from 81% until 100%. The colour helps the viewer to identify the quality of the result e.g. poor, medium or high.

The comparative measure (the little vertical blue line) indicates the overall ‘success’ and could be called the ‘target’. In a real life example, this may indicate whether a particular department met a recycling target, or made a certain percentage of total projected sales, for example.

We can see that the top bullet chart slightly exceeded the comparative measure, or target, which can be found at 75% or thereabouts. We can see the performance measure of the top bullet chart (the thick blue horizontal line) ends at 80%, and ends at 100% in the bottom bullet chart. The top bullet chart has a performance measure that does not travel as far as the bottom bullet chart, because it is a lower percentage; this ‘left to right’ aspect makes it easier to read, and the linear colours help to attach a ‘quality’ label to the data i.e. poor, medium, good.

Even if only an ‘at a glance’ measure is needed, the colour helps users to add a qualitative label to the values. Since the bullet charts are located one on top of the other, this leads to easy comparison; in the above example, the top chart didn’t exceed the target as much as the bottom chart, and the bottom chart succeeded in obtaining 100% as the performance measure.

To summarise, I hope that this helps to make the value of the linear scale more clear to users; it isn’t obvious to everyone, and not everyone picks it up intuitively. However, when it is explained, I think that there is value in emphasising the comparative aspects of the bullet graph. The colour and ‘left to right’ aspects of the bullet graph can assist viewers to make qualitative understandings of the data that they see before them, and the neatness of the bullet chart location can help to facilitate comparisons between bullet charts.

I hope I’ve got you wondering how you can create bullet charts? More on this to follow!

Visualisation of Nobel Peace Prize Winners

Every year since 1901, the Nobel Prize is an international award dedicated to outstanding achievements to humanity. There are prizes in physics, chemistry, physiology or medicine, literature. It is administered by the Nobel Foundation in Stockholm, Sweden.

The most commonly recognised award is the award for peace, and this year, the Nobel Peace Prize winner has been announced as Chinese dissident Liu Xiaobo. In 1968, Sveriges Riksbank established The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, founder of the Nobel Prize. Personally, I love this award since it allows us, as humans, to stand and take stock of what we do to one another, and gives us a chance to put peace in the frame as a discussion point for the world.

The ever-brilliant Guardian Data Store has released data for every peace prize winner. I have taken this data, and visualised it to show the number of prize winners per country.  Here is the image, or click here for a larger image:

Tableau Nobel Prize Winners

To do this, I used Tableau. This tool is excellent because it recognises country names. However, when visualising data, it is important to be careful if you are just relying on country names alone. For the Nobel Peace Prize winners list, this proved problematic because country names can change over time, and Tableau seemed to recognise the most recent country list. For example, Baroness Bertha Sophie Felicita Von Suttner won the award in 1905, and the country was then known as Austria-Hungary; now, it is known as the Czech Republic.

Also, countries can be known by more than one name, which can be difficult for software to negotiate.  For example, Burma is officially known as the Union of Myanmar, and East Timor is officially known as Timor-Leste.In this case, Tableau didn’t recognise ‘Great Britain’ but did recognise ‘United Kingdom’. So, you have to be very careful of your data; the software can only do so much for you, and you need to double check. Fortunately, Tableau has a ‘View Data’ option which allows users to double-check. A simple tip in Reporting Services is to add in a table which shows your base data, and then remove it before you complete the report. This will allow you to double-check that the report is really showing that which you would like it to show.

Additionally, country names can be deeply contentious; just look at the Middle East for some examples there. No software in the world is going to be able to resolve this; you have to be clear about whether you are using one country name or another. For this visualisation, I replaced ‘old’ names with the most up-to-date country names, and replaced the informal names given in the original data  (e.g. Burma) with the official name (Union of Myanmar).

To summarise, if you are using geographically-based display if data, be careful f you are relying on country names.  Be sure to check whether the country name is picked up properly. Tableau and Reporting Services can both use Longitude and Latitude, and Reporting Services 2008 R2 can use SQL Server 2008 Geography and Geometry data types for absolute certainty. More on this at a later point!

Add to Technorati Favorites