Optimising Aggregations in Analysis Services 2008

In Analysis Services 2008, query and process performance can be optimised by looking at aggregations. I’ve distilled the main points from the SQL Server Best Practices guide in order to obtain the main points, and added in some other details. To add this post to Twitter, please click here.

An aggregate is a data structure that contains a pre-calculated summary of data. The aggregation size is smaller than the fact data it is based on, so it is expected that this will be quicker to query. Thus, aggregation design is an important method of optimising performance.

However, it is important to balance this benefit against the cost of updating the aggregations at processing time. Aggregations designs add metadata, and can thus impact on the performance. Thus it is sensible to have a balanced approach to aggregation designs per measure group; not too many! The number of potential aggregations is very high, so it would take a lot of space to contain all aggregations; thus, it is important to be selective when creating aggregations. It is not necessary to build lots of aggregations to get a query performance increase. A query can be resolved quickly since it can use existing aggregations, and combine them, to obtain the query result. It isn’t possible to aggregate everything e.g. country code. It is possible to set IsAggregatable to False, but this will affect all of the default members; so this step should be taken with care.

There are two main types of aggregations; rigid and flexible. Rigid aggregations are preserved as a result of dimension update; flexible aggregations are re-calculated. It is possible to force flexible aggregations to be rebuilt by using ProcessAffectedObjects when processing the dimension, or you can use the ProcessIndex command.

Aggregations can be created manually, or it is possible to run the Aggregation Design Wizard from BIDS or SSMS. It uses an estimated count of the members of all attributes of dimensions, in the corresponding measure group. If the EstimatedCount property is not set, then SSAS will query the relational database; this may increase the time taken for the wizard to run, particularly if the dimensions are relatively large in terms of attributes.

It is possible to tag attributes as a candidate for aggregation. This is done using the AggregationUsage property. Before you do this, you should look at the hierarchy of attributes in order to identify whether the hierarchies can be optimised. If the dimension has natural hierarchies, then it is possible to include only one attribute from the same dimension to create the necessary aggregations; for example, you could just include Product SubCategory, and this would be rolled up to give ProductCategory to prevent ProductCategory from becoming redundant. If the hierarchy of attributes is not natural, then you may need more aggregations to cover off the potential space of queries and it isn’t possible to re-use the smaller atomic aggregations. Further, you would need to include more than one single attribute to obtain the aggregation; thus, more aggregations may be required in order to optimize performance.

The AggregationUsage property applies to attributes, and identifies the activity that the Aggregation Design wizard can take on each attribute. It can take one of four values:

Value

Description

Full

Every aggregation should include

(a) this attribute or

(b) a related attribute, which is lower down in the attribute chain.

Unrestricted

The aggregation designer will consider this attribute to see if it is eligible for aggregation.

None

No aggregation applied to this attribute.

Default

Default rule applied, based on type of attribute and dimension. There are four rules:

Default Constraint 1

Unrestricted

Default Constraint 2

None for Special Dimension Types – data mining, many-to-many

Default Constraint 3

Unrestricted for Natural Hierarchies – unrestricted unless the attributes are nonaggregatable, in which case this property is set to Full

Default Constraint 4

None

Tips:

  • Set the AggregateUsage property to Unrestricted only for the frequently-used attribute hierarchies: The Best Practices paper suggests five – ten Unrestricted attributes per dimension.
  • If the attribute is contained in most of the queries, then set AggregateUsage should be set to Full. This only applies to attributes where there are only a few members; otherwise, the wizard will take a long time to devise a design, and implement it.
  • If the attribute isn’t used often, then this should be set to None. This reduces the overall search space.

Aggregations are partition-specific, since they are data structures stored on disk. The AggregationDesign object can be shared by different partitions. It is also possible to use a base partition which does not contain any data, but does contain the aggregation design. When a new partition is required, a clone of the base can be taken, and then fully processed, and will contain the new data and corresponding aggregations.

Another Wizard which can help is the Usage-Based optimisation wizard, which can be run via SSMS. The wizard works by evaluating the queries in the query log, identifies the hundred queries that performed the slowest, and devises aggregations to resolve these. It works in a top-down fashion; it starts with grand totals, and moves down to lower grain aggregations. If the wizard is configured to design aggregations to give a 100% performance again, then the aggregations will be devised to avoid directly hitting the partition. The resulting aggregations may be identical to existing aggregations, so the user will need to double-check for duplicates. In order for it to work, it is necessary to set up server query logs:

QueryLogSampling

Every 10th query is logged, but this is configurable

QueryLogConnectionString

The SQL Server holds a query log for SSAS

QueryLogTableName

This holds the tablename for holding the log

CreateQueryLogTable

This creates a new table, or logs to the existing one

How is it possible to measure the optimisation value of aggregations? It is possible to use SQL Profiler to identify which aggregations are used, by making use of the event GetDataFromAggregation. Examine statistics and metadata about the cube, such as member counts or fact table counts. The design optimisation algorithm will use this information when suggesting aggregations. If there is no aggregation, then the storage engine goes to the detailed data and then executes the appropriate MDX.

In the profiler, each dimension is named by a vector, which identifies whether the attribute refers to the All attribute, or the attribute itself. The All level is identified by 0, and the attribute level is identified by 1. For example, the following subcube definition (0000, 0001, 0001) describes an aggregation for the following:

  • Product – All, All, All, All
  • Geography – All, All, All, State/Province
  • Order Date – All, All, All, Year

Each aggregation is identified by a subcube. This is a vector path, which is an amalgamation of the dimension vectors, which are comma-separated. The vectors are combined in dimension order, which can be found in the XMLA definition of the cube, or by looking under the Dimensions pane in the Cube Structure.

To summarise, in Analysis Services 2008, query and process performance can be optimised by looking at aggregations; there are many different features around Aggregations in 2008, so it is worthwhile having a look at them.

Leave a Reply