How is it best to handle unaudited Fact Table Updates and Deletes?

It is more difficult to conduct updates and deletes on a fact table, than to simply insert new rows. If the source system handles the updates and deletes, then this is the preferable option since it is already in place. Unfortunately, however, this is not always the case. How is it possible to handle unaudited updates and deletes?
If the changed rows are flagged with a ‘Modified’ date column, then this means we can more easily identify the rows that have been changed. If this isn’t the case, how is it possible to go about identifying which rows have been changed? There are a few options:
Truncate and re-load the fact table at every load – this isn’t a good suggestion because some fact tables may be very frequently updated. This would not provide a good service to users, who rely on the information from the fact table.
Ask your friendly DBA to look at helping you e.g. triggers, replication, or adding a ‘Modified’ date to the source transaction data – this is a better idea. It is better to push the issue ‘upstream’ so that the data is loaded as simply into the new data warehouse as possible. This also allows the opportunity for re-use, which is always good. If you’d like to know more about using replication to find unaudited changes in the fact table, please see Duncan Sutcliffe’s excellent post here.
It’s better to distinguish between ‘soft’ deletes and ‘hard’ deletes.
• ‘Soft’ deletes means that the row is not actually deleted, but simply flagged as deleted.
• ‘Hard’ deletes – it’s gone!
Soft deletes are obviously better; an additional column can be added as a flag, and the flag is set to denote whether the record is deleted or not.
The Kimball methodology argues that updates and deletes should be handled as transactions. However, that isn’t possible in every case, and it’s probably better to apply this strategy for incremental data such as sales. Let’s take a non-incremental example, where a company has phone record data. The phone record may hold information initially such as the start time of the call. However, any ‘on hold’ time or ‘end time’ associated with the call cannot be recorded until the call ends. Thus, the update needs to be done ‘in situ’ or there will be lots of additional rows with empty columns, all for one call.
Although the structure can be difficult to decide, Kimball usually has some good words of wisdom; the trick is knowing when to apply it to your particular circumstance and user requirements, and I do recommend reading Kimball as often as possible!

Add to Technorati Favorites

SSIS and Oracle: challenge of ‘Unable to extend temp segment’ in Oracle

If SSIS is being used to retrieve large amounts of data from an Oracle database, the SSIS package may fail with the following error message:
ORA-01652: unable to extend temp segment by 256 in tablespace
This error basically means that Oracle could not extend the temporary segment for the required number of blocks in the user’s allocated tablespace.  There are a few solutions to this issue:
  • Increase the users temp tablespace
  • Although it’s an Oracle error, it may show up structural design issues with the package itself. Thus, the SSIS package could be changed so that, instead of one large query, the Oracle database is serving up smaller chunks of data at a time. This may mean ‘daisy-chaining’ a number of components, who each serve up smaller parts of the larger query that was initially being executed.
As always, it’s best to have a chat with your friendly Oracle dba owner of the source database for their thoughts.

Add to Technorati Favorites

Creating Oracle Tables using SSIS

It’s possible to use SSIS to create a new Oracle table for you. However, there are a few ‘gotchas’ that it’s necessary to look out for.
It’s possible to use the ‘OLE DB Destination’ adapter to insert data into Oracle.  Thus, if a new Oracle table is required, it’s possible to select the destination Oracle database, and use the ‘New’ button to generate code for you automatically. This will bring up a text editor, which may have some content similar to the following snippet:

“ETLLoadID” VARCHAR2(255),
“Column1” NUMERIC(10),
“Column2” NUMERIC(10)

This will create a table called “MyTable” for you, which initially sounds like good news. When the SSIS package runs, it will deliver the data into the destination Oracle table.
Don’t get comfort from those green SSIS boxes, however. It’s important to be careful with those quotation marks.
Thus, when you try to select from the Oracle table, using the following statement, it won’t work:
select ETLLoadID from MyTable
This does not work, and generates the following error message:
ORA-00904: Invalid Column Name
Then, the select statement does not work; it needs quotation marks around it.  This will work better:
select “ETLLoadID” from MyTable
In order to find out exactly how the table has been implemented, it’s possible to use the DESCRIBE command. This is like SP_HELP, and provides you with data you need for the table.
Instead of using SSIS to generate tables, I now create my own tables in Oracle since it gives me the control I need, and means that I don’t have to worry about quotation marks. If you need a useful link to help you with the syntax, here it is.
I hope that helps!

Add to Technorati Favorites