Indexing: Fact and Dimension guidelines

Indexing is part of the art of optimising the database structure. However, there are no real ‘hard and fast’ rules since it depends, ultimately, on query use.  Here are some suggestions below:

Dimension tables

  • Dimension tables should have a clustered primary key index. This assumes that there is a surrogate primary key as per Kimball’s advice.
  • Dimension tables may have an index on the natural business key.
  • Dimension tables may also have a single-index index on the foreign keys
  • Dimension tables may have non-key indexes added to hierarchical relationships.

Caveats

It will be important to gauge the size of the dimension, and its performance when adding indexes on the natural business key, and the foreign keys. It’s a question of size. If the dimensions are small, it may not be worthwhile; on the flip side, it is important to look at the query plan for optimisations.

Fact Tables

  • If the fact tables are partitioned, add a single-column index to the date key on the partition.
  • If the fact tables are not partitioned, the date key is a good place to start to add a single-column index. 

Fact Tables and Dimension Tables together

You can add a foreign-key reference between the fact tables and the dimensions; this will help to maintain referential integrity within the database itself. However, this can be expensive in terms of processing, since this will be checked every time the ETL load inserts a row into the fact table. If your fact table is large, and there is a substantial amount of data to be inserted, then this could take a long time.  In the fact/dimension world, this check is already done when the lookup of the fact table takes place – so you may not need to repeat it.

Recommendation: Testing and more testing! Evaluate your query plan and check out the INSERT part in particular.

Hope that helps someone!

Add to Technorati Favorites

Leave a Reply