Improving Data Flow Performance in SSIS 2008

This information on improving Data Flow originally came from the Microsoft site, but it is intended here to distill the main points to make the Data Flow review more easy to conduct. There are two main ways of improving performance in Data Flows; one is amending the Data Flow property itself, and the other is to review the properties of individual components of the Data Flow.

Configuring Data Flow properties

Item

Definition

BufferTempStoragePath

locations for temporary storage of buffer data; can make this faster by overriding the default directories (TEMP/TMP) so it points to a faster hard drive, for example

BLOBTempStoragePath

columns that contain binary large object (BLOB) data; can make this faster by overriding the default directories (TEMP/TMP) so it points to a faster hard drive, for example

DefaultBufferSize

Default buffer size is 10Mb; this could be increased to 100Mb

DefaultBufferMaxRows

maximum number of rows in each buffer; the default is 10000, but could be increased

EngineThreads

This setting is a property of the Data Flow Task; simultaneous threads running in parallel. General rule is not to run more threads in parallel than the number of available processors

Parallel Execution

Use MaxConcurrentExecutables and EngineThreads

Run Tasks Simultaneously

MaxConcurrentExcecutables is a property of the package

Default = -1 (Physical or Logical Processors + 2)

RunInOptimizedMode

This actively removes unused components such as columns, output in order to increase performance.

Adjusting Buffer Size

Initial buffer size = (estimated the size of a single row of data x DefaultBufferMaxRows )

Initial buffer size > DefaultBufferSize

Number of rows reduced

Initial buffer size < Minimum Buffer Size

Number of rows increased

(Initial buffer size > Minimum buffer size) & (Initial buffer size < default buffer size)

(estimated row size x the value of DefaultBufferMaxRows)

  • Reduce row size by removing unnecessary columns and use appropriate data types
  • Use a smaller number of large buffers, rather than a larger number of small buffers; review DefaultBufferSize and DefaultBufferMaxRows while reviewing performance and the BufferSizeTuning event.
  • Be careful of increasing buffer size too much, in case paging starts to occur. Check the Buffers Spooled performance counter

Configuring the Data Flow Components

The following settings refer to the components of the Data Flow task, since it is also possible to optimise these individual elements.

Item

Description

Optimise Queries

Revise queries to retrieve only the required columns i.e. revise select * from … note that RunInOptimizedMode can remove unnecessary columns if required.

Avoid unnecessary Sort operations

Could perhaps revise Sort upstream

  • Set the IsSorted property on the output of an upstream data flow component to True.
  • Specify the sort key columns on which the data is sorted.
  • Reduce sort operations in number e.g. use multicast rather than two Sorts later on

Source

SQL statement performs better than ‘table or view’ retrieval

Transformation

Aggregate Transformations

Keys, KeysScale, CountDistinctKeys, and CountDistinctScale
properties can help in memory preallocation

One Aggregation is better than multiple transformations.

Fuzzy Lookup and Fuzzy Grouping

Lookup

Select only the required columns, not select *

Merge Join Transformations

MaxBuffersPerInput – fine-tune memory by indicating the max number of active buffers for each input; default is 5 but SSIS may temporarily increase this value

SCD

OLE DB that uses UPDATES are slower; it’s better to use Execute SQL with updates

Destinations

SQL Server destinations are better if you are loading to the same computer

Check whether the destination is slower, or the package – use Row Count as a temporary substitute to see what happens

To add this post to Twitter, please click here.Hope that helps a bit!
Jen x

Leave a Reply