Adventures in SQL: Coalesce versus ISNULL

Some of the differences between ISNULL and COALESCE are quite obvious. For example, a column created by ISNULL is not regarded as nullable, whereas a column created by COALESCE is regarded as nullable. Similarly, ISNULL takes exactly two arguments, whereas COALESCE can accept two or more. It is possible to nest ISNULL calls, however, so nesting can be used to meet a requirement to use ISNULL rather than COALESCE.
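To illustrate the nesting, here is a sketch of a three-way fallback written both ways (the table and column names are hypothetical, chosen only for the example):

-- Three-way fallback with COALESCE
SELECT COALESCE(MobilePhone, HomePhone, 'No phone recorded') AS ContactNumber
FROM dbo.Customers;

-- The equivalent using nested ISNULL calls
SELECT ISNULL(MobilePhone, ISNULL(HomePhone, 'No phone recorded')) AS ContactNumber
FROM dbo.Customers;

Both queries return the first non-NULL value from left to right; the nested version simply builds the same behaviour out of two-argument steps.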
There is one difference between ISNULL and COALESCE, however, that may not be so obvious. There is a restriction with ISNULL, where the data type of the first argument dictates the data type of the resulting column. Thus, if the second argument is longer than the data type of the first argument allows, it will be truncated.
Here is an example using the AddressLine2 column of the Person.Address table in the AdventureWorks2008 database, which has a datatype of nvarchar(60):

SELECT
    ISNULL(AddressLine2,
    '1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'
    ) AS ISNULL_Result
FROM [AdventureWorks2008].[Person].[Address];

SELECT
    COALESCE(AddressLine2,
    '1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'
    ) AS COALESCE_Result
FROM [AdventureWorks2008].[Person].[Address];

Using ISNULL, the resulting column is reduced to a length of 60. Using COALESCE, however, no such restriction is in place. Thus, in the above query, the length of the second argument is retained, resulting in a length of 100. This can be seen in the image below:

ISNULL versus COALESCE result

It is worth noting that, if COALESCE isn’t used properly, it can provide strange results. For example, if the datatypes are far apart, such as an integer and a datetime, COALESCE will return a datetime. So, the following SQL command:
SELECT COALESCE(100, GETDATE()) AS Coalesce_Result
This results in the following:
'1900-04-11 00:00:00.000'
Since datetime has been picked as the higher-priority data type for the result, the integer value of 100 has been converted to a datetime: 100 days after the base date of 1900-01-01. COALESCE is behaving exactly as it was designed to, which is to convert its arguments to the highest-priority data type among them. Just be careful of unexpected results if the data types are mixed: test, test and test again!
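If a particular result type is wanted, one way to sidestep the precedence rules is to cast the arguments to a common type explicitly. Here is a sketch of that approach, keeping the values as strings:

-- Cast explicitly so varchar, not datetime, wins the data type precedence contest
SELECT COALESCE(CAST(100 AS varchar(30)),
                CONVERT(varchar(30), GETDATE(), 120)) AS Coalesce_Result;
-- Returns the string '100' rather than a datetime

Because every argument is now a varchar, no implicit conversion to datetime takes place.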
To summarise, COALESCE can be a useful SQL tool to have in your back pocket, if it is used and tested properly.

Plastic Surgery Data Analysis Using PerformancePoint

One area of interest in healthcare data is plastic surgery, and the trends that appear in the operations that individuals are choosing to have done. In order to explore more about this phenomenon, the Guardian Datastore has provided an excellent set of data which has been used here as a basis for further analysis, and the display was created using Microsoft Office PerformancePoint and SQL Server.
As the examples show, sometimes it’s not enough just to have a set of numbers; illustrating the numbers with graphs can highlight patterns to the data consumer that weren’t previously visible.
In order to display some of the capabilities of PerformancePoint, the plastic surgery data was used here to create graphs and dashboards, with Microsoft SQL Server as the data source. The data was obtained from the Guardian Datastore as a comma-separated values document, or CSV. This was easily imported into a table in SQL Server using a straightforward wizard. The 'presentation layer' was PerformancePoint, which is part of the Office SharePoint suite of products. PerformancePoint Services is integrated into SharePoint, and can take in a variety of data sources, including Excel, standard ODBC connections, and SQL Server databases. It is particularly good with Analysis Services cubes, since these allow drill-down and click-through to display data in dashboards, graphs, or even 'traffic light' indicators. PerformancePoint aids analysis by allowing the user a lot of control over the appearance of the graphs, and, as is usual for Microsoft, there are lots of different ways to do the same thing. Whatever is easiest for the user, really! If you would like to know more about how everything hangs together, please see the PerformancePoint Services blog site here for a downloadable Visio or PDF diagram for easy reading.
As a sample, there is a PerformancePoint dashboard below, or click here for the large image:
Overall, the dashboard shows that plastic surgery is a growing area of healthcare, with increases shown in almost every type of surgery except the bottom two: facelifts and abdominoplasty (tummy tucks). In addition to an overall increase in the plastic surgery requested, the type of plastic surgery requested changed from 2008 to 2009. For example, there has been an increase of over 40% in brow lifts, and a rise of around a quarter in rhinoplasty, or 'nose jobs'. However, perhaps surprisingly, the 'tummy tuck' operation has shown a drop of nearly 10% in operations performed. This is highlighted by the following excerpt of the above graph, or click here for an enlarged version:

There is obviously still a difference in the types of surgery requested by males and females, and this can be seen from the graph below. For example, in females, breast augmentation was a popular choice, whilst the data shows that only 28 males have undergone breast augmentation surgery in 2009. The only choice nearing a 50-50 split was otoplasty, which is an operation to correct ear shape or size. This can be seen from the bottom half of the dashboard, which is reproduced here for ease of use:
One interesting finding is that male cosmetic surgery is on the increase; in fact, it increased by 21% in the last year. In particular, the Guardian reports that plastic surgery operations to reduce breast size in men (gynecomastia) have nearly doubled in the last year, and the British Association of Aesthetic Plastic Surgeons (BAAPS) found that operations to correct gynecomastia grew by 80% overall over the past five years.
To summarise, there are plenty of ways of visualising the nuggets of data held in data sources, and the Microsoft stack allows plenty of different SQL Server based ways for storing and visualising data in accordance with user requirements.

Conditional Split Transformation in SSIS 2008: Uses and Examples

One very useful Data Flow feature in SSIS 2008 is the 'Conditional Split' transformation. Sometimes new SSIS users have difficulty finding detail about the expressions used in the Conditional Split. The purpose of this blog is to provide some advice about the Conditional Split: how it works, when to use it, and some examples of the expression language. For people who are new (and not-so-new!) to SSIS, it can sometimes be difficult to decide which component to use, and it isn't always immediately clear why the Conditional Split component is useful.

How does the Conditional Split transform work? Briefly, as a Data Flow component, the Conditional Split is a decision maker. The component tests each row, and then decides what its destination should be. In other words, the Conditional Split tests each individual inbound row of data against a series of conditions, which have been defined by the package creator. Then, the Conditional Split component funnels each row into one of its outputs, depending on the result of the test against each condition.

In order to understand its purpose a bit better, here are some examples where it might be useful:

Directing 'bad' rows towards a separate place for further examination. A recent example is where timesheet data was imported from one source to a different destination. The source system permitted timesheets to be submitted without customer name, project name, or submission date; in contrast, the target system regards such timesheets as incorrect. In order to ensure that the import from source to target went as smoothly as possible, the records were filtered for NULLs in the client, project or submission date fields and directed to a separate location for review. I could then take these output files and ask the customer for default values for these fields, for inserting at a separate time.

Differentiating between 'new' and 'update' records. 'New' records can be inserted into the appropriate place; 'update' records can have fields updated in the existing data, rather than being inserted as new information. For example, a 'new' customer may need to be treated differently from an existing customer who simply needs an updated profile. This pattern is associated with 'Slowly Changing Dimensions' (SCDs).

When should the Conditional Split component be used, instead of the Slowly Changing Dimension component? This is an important point, and it depends on the quality of the data. Note: the SCD component does not handle NULLs. It is not possible to use the SSIS 2008 SCD component to handle data with NULLs in it, so, in that case, the Conditional Split component should be used instead.

Prior to creating the package, it is possible to test the data for NULLs by using the Data Profiling task in SSIS, and it is recommended here that this investigation is done before writing the package. If the data does contain NULLs, then it is recommended to use the Conditional Split component. Choosing the correct and most efficient component for a particular purpose is half the battle; in the words of Mary Poppins, 'well begun is half done'!

In order to set up the conditions, it is necessary to use the SSIS expression language, which applies to various transformations within SSIS. In the Conditional Split transformation, it is only possible to use expressions that produce a TRUE or FALSE result. So, for example, the following expressions would be legal (note that column names containing spaces must be wrapped in square brackets):
[Column 1] >= [Column 2]
[Column 1] < [Column 2]
ISNULL([Column 1])
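Tying this back to the timesheet example above, a condition along the following lines could route incomplete rows to a review output. This is a sketch; the column names ClientName, ProjectName and SubmissionDate are hypothetical stand-ins for whatever the source columns are actually called:

ISNULL(ClientName) || ISNULL(ProjectName) || ISNULL(SubmissionDate)

In the SSIS expression language, || is the logical OR operator, so this expression returns TRUE whenever any of the three columns is NULL; rows matching it go to the 'review' output, and everything else falls through to the default output.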

Implicitly, this means that the Conditional Split transform can have more than one output. Because of this, it is important to consider what the 'default' situation will be, and where the 'default' rows should be directed. Here is an example of the Conditional Split transformation, using the ISNULL expression as an example; please click here for a larger version:
Conditional Split Expression Example
To conclude, here are some valuable tips in using the Conditional Split transformation:

If the Conditional Split component isn't producing the expected results, it is possible to 'daisy chain' several of them in order to clarify the output. The 'daisy chain' method can help because Data Viewers can be attached to each flow in order to actually see what data is passing through. This mechanism can illuminate the reasoning behind the decisions made by the Conditional Split.

If the expressions do not produce TRUE or FALSE outcomes, then the Conditional Split component will not work properly. Examples of expressions in this category include assignment statements, e.g. @Counter = 1, and other valid expressions which do not return TRUE or FALSE, such as GETDATE().