Dimensional Modelling: application to the life of King Henry VIIIth

Here, I will discuss the various types of Slowly Changing Dimensions have been discussed using the example of the marital activities of King Henry VIIIth, along with various suggestions of how the SCD technology might be used to implement these technologies using Microsoft SSIS 2008.

When we model environments, how is it possible to record a piece of information when characteristics about that fact are subject to change? The current article hopes to clarify this issue, by using an example: King Henry VIIIth (28 June 1491 – 28 January 1547) and his marital history. King Henry the eighth is mostly remembered for having six wives, and then disposing of them in different ways. People often assume that he divorced five times and his sixth wife outlived him. However, that’s not strictly true; four marriages were annulled, and the remaining two ended in death. The specific details are listed here:

Consort

How did the Marriage end?

Catherine of Aragon

Annulled

Anne Boleyn

Annulled then beheaded

Jane Seymour

Died in childbirth

Anne of Cleves

Annulled

Catherine Howard

Annulled then beheaded

Catherine Parr

Outlived King Henry VIIIth

Considering that King Henry VIIIth is someone who had a complexity of different marital status at various times, how could this behaviour tracked in a data warehouse? In more general terms, how can dimensional modeling cope with changes such as these?

Data about facts are held in dimension tables. The ‘look up’ data is held in dimensional tables, and the actual facts themselves are held in a fact table. How can dimensional tables be configured to cope with data or ‘real life’ changes?

Dimensional modeling implements a methodology called ‘Slowly Changing Dimensions’ or SCD in order to accommodate ‘real life’ changes about facts. This article also briefly describes how SCD methodology is implemented in Microsoft SSIS 2008.

There are three main types of commonly used SCD: 1, 2, and 3. Each type of SCD tries to allow the dimensional designer to cope with changes.

The Type 1 methodology simply overwrites old data with new data. Historical data is not tracked at all. This can be used when data errors are being corrected, for example, if an attribute was mis-spelled. One criticism of this rationale for SCD is that it does not allow the state change reversal to the original state. In dimensional modelling, this means that the original information is lost. This destructive behaviour on the original record may not be suitable for many business environments, which may need to preserve the history of the state over time. In Microsoft SSIS, the Type 1 implementation of the Slowly Changing Dimension transformation directs these rows to an output named Changing Attributes Updates Output.

In our example of King Henry VIIIth, a Type 1 methodology would hold that, at the time of his death, he was married and that no other history exists. However, for many purposes, this might be insufficient. For example, what about the case where we would like to track the history regarding the King Henry VIIIth’s marital status?

In a Type 2 methodology, in terms of data warehouses, existing records cannot be changed. However, it is possible to allow an update to show that the record is current or expired. A change in an attribute results in the creation of new records, which have a primary key. Thus, a new record is created as a result of an attribute change as opposed to the update of an existing record. Since a new record is inserted each time a change is made, then the history is preserved. For example, a customer could have an address in Oxford in one year, and an address in Cambridge the next. Although the customer is only one person, according to the dimensional model the customer would have two different records, one for Oxford and the other for Cambridge.

Briefly, a new situation or record is created to correspond to the new state. In terms of the data warehouse, this means that we would have to create a key that ties the multiple instances together. This key is known as the Business key. In Type 2 SCD, each record shares the same Business Key, which ties the events together, even though they have a different Surrogate Key. In terms of future ‘worlds’, future events against a dimension record will be owned by the latest dimension record that changed; so a relation will exist. The fact record would be recorded against the latest dimension value, while historic records would link to the old previous record. Thus, a data update occurs, not a data overwrite.

In SSIS, the Type 2 implementation of Slowly Changing Dimension transformation directs these rows to two outputs: Historical Attribute Inserts Output and New Output. Fixed attribute changes indicate the column value must not change. The Slowly Changing Dimension transformation detects changes and can direct the rows with changes to an output named Fixed Attribute Output.

In our example of the marital complexities of King Henry VIIIth, the history would be tracked by a Type 2 Slowly Changing Dimension. The effective start and end dates would keep the history of the changes in marital history, which would be preserved.

In a Type 3 methodology, this involves a combination of the previous two interpretations. The Type 3 SCD can be expanded to include Type 2 functionality. At its lowest level, it is a case of duplicating attributes to have a ‘Current’ and ‘Previous’ version, and then using logic to set values where needed. In other words, the only outcome that is considered is the outcome that actually took place; this can be called the only true or real outcome.

This might be useful for tracking both the previous state and the existing state. For example, if the report consumer wanted to know the current status and the previous status, then this might be useful. However, this is rarely the case. In our example of King Henry VIIIth, Type 3 SCD would be used if we wanted to know his current state and his previous state. However, this isn’t necessary for most purposes, which is why Type 3 isn’t used often.

According to Type 3 SCD, the original value is overwritten and the new effective start date is added. The old record is changed, which is denoted by an updated to a column to specify the effective start date. We don’t have to worry about all changes. In fact, we are only concerned with the facts on which we are trying to report.

Microsoft SSIS does not support Type 3 changes because it involves changes to the dimension table. However, it is possible to capture the data values that are candidates for Type 3 changes by identifying columns with a fixed attribute update type.

Microsoft SSIS does permit the implementation of Type 1 and Type 2 SCD, but care needs to be taken with the implementation of Type 3 methodology. Microsoft SQL Server SSIS 2008, via package logging, does permit the additional logging of calls to the SCD transformation, which allows increased capacity to troubleshoot connections, commands and queries.

To conclude, the various types of Slowly Changing Dimensions have been discussed using the marital shenanigans of King Henry VIIIth, along with various suggestions of how the SCD technology might be used to implement these technologies.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s