Partitioning Summary: SQL Server 2005 vs SQL Server 2008

Sometimes, I hear some refusals to upgrade from SQL Server 2005 to SQL Server 2008 on the grounds that ‘2005 is working fine, so we don’t need to upgrade – I’ll keep this ticking over until it fails’.That might be suitable for some environments, but what happens in the case where the database is growing quickly? A bloated fact table, for example? The problems of a large fact table are the worst type: they are obvious to end users since it becomes unwieldy to load and unload data, and the queries just take too long. In today’s world of instant results, people need answers asap.

It is worth considering partitioning as a measure to help navigate your way through this issue, and this post aims to provide a brief summary of the benefits. Firstly introduced in SQL Server Enterprise Edition 2005 (not Standard), partitioning can help to make large tables quicker and easier to access. Essentially, table partitioning is a design feature that assists in ensuring that large tables are navigated, loaded and matured in a way that is optimised for performance whilst being easier to administer. Candidate tables for partitioning are those which contain lots of data, or the table is not performing, or it is becoming difficult to maintain.

In SQL Server, partitioning a large table means dividing the table and its indexes into smaller segments. This means that administrative operations can be applied on the smaller partition, rather than on the entire table. Further, the query optimizer can mobilize the query to navigate appropriate partitions of the table. To summarise, this means that partitioning can really help to optimize the table!

Table partitioning is horizontal – imagine a striped table, with groups of rows associated with each stripe – or partition. These strips, or partitions, can be spread across filegroups. Partitioning can be managed separately. For example, data sets can be split data into staging areas, taken offline for dedicated maintenance, and then re-added to partitioned tables – how good is that? Back to the main point, though – why is partitioning better in SQL Server 2008? Here is a summary of reasons:

Locking Resolution – SQL Server partitioning functionality works with the underlying SQL Server engine to introduce an intermediate stage of locking – locking is taken to the partition level, and then escalated the table level. This means that locks can be captured before they have a real impact.

Parallel Processing – SQL Server 2008 can use parallel processing for querying partitioned tables, rather than using just one processor for each partition. Thus, queries ran against partitioned tables are optimised to run faster and more efficiently, using the processor resources available. More information can be found here.  Also partitioned queries can be optimised by bitmap filters, which can be dynamically initiated by the SQL Server Query Optimiser. Note that the bitmap filtering is based on using a table that is partitioned on an integer, simulating a date.

Indexed views – in SQL Server 2008, can follow the partitioning scheme of the underlying tables. If it is aligned correctly, the indexed view does not need to be dropped before it is switched out of the partitioned table – as is the case with SQL Server 2005, where materialized data had to be dropped and created again at each switch. 

To summarise, moving from SQL Server 2005 to 2008 is worth considering, in order to take advantage of the new partitioning features. However, it’s always worth ensuring that it’s being implemented properly and for the right reasons, so it’s worthwhile doing your research first. Brent Ozar wrote a great blog on this a while ago, and it’s worth a look.
I hope that helps!

Indexing: Fact and Dimension guidelines

Indexing is part of the art of optimising the database structure. However, there are no real ‘hard and fast’ rules since it depends, ultimately, on query use.  Here are some suggestions below:

Dimension tables

  • Dimension tables should have a clustered primary key index. This assumes that there is a surrogate primary key as per Kimball’s advice.
  • Dimension tables may have an index on the natural business key.
  • Dimension tables may also have a single-index index on the foreign keys
  • Dimension tables may have non-key indexes added to hierarchical relationships.

Caveats

It will be important to gauge the size of the dimension, and its performance when adding indexes on the natural business key, and the foreign keys. It’s a question of size. If the dimensions are small, it may not be worthwhile; on the flip side, it is important to look at the query plan for optimisations.

Fact Tables

  • If the fact tables are partitioned, add a single-column index to the date key on the partition.
  • If the fact tables are not partitioned, the date key is a good place to start to add a single-column index. 

Fact Tables and Dimension Tables together

You can add a foreign-key reference between the fact tables and the dimensions; this will help to maintain referential integrity within the database itself. However, this can be expensive in terms of processing, since this will be checked every time the ETL load inserts a row into the fact table. If your fact table is large, and there is a substantial amount of data to be inserted, then this could take a long time.  In the fact/dimension world, this check is already done when the lookup of the fact table takes place – so you may not need to repeat it.

Recommendation: Testing and more testing! Evaluate your query plan and check out the INSERT part in particular.

Hope that helps someone!

Add to Technorati Favorites