This information on improving Data Flow originally came from the Microsoft site, but it is intended here to distill the main points to make the Data Flow review more easy to conduct. There are two main ways of improving performance in Data Flows; one is amending the Data Flow property itself, and the other is to review the properties of individual components of the Data Flow.
Configuring Data Flow properties
Item |
Definition |
||||||
BufferTempStoragePath |
locations for temporary storage of buffer data; can make this faster by overriding the default directories (TEMP/TMP) so it points to a faster hard drive, for example |
||||||
BLOBTempStoragePath |
columns that contain binary large object (BLOB) data; can make this faster by overriding the default directories (TEMP/TMP) so it points to a faster hard drive, for example |
||||||
DefaultBufferSize |
Default buffer size is 10Mb; this could be increased to 100Mb |
||||||
DefaultBufferMaxRows |
maximum number of rows in each buffer; the default is 10000, but could be increased |
||||||
EngineThreads |
This setting is a property of the Data Flow Task; simultaneous threads running in parallel. General rule is not to run more threads in parallel than the number of available processors |
||||||
Parallel Execution |
Use MaxConcurrentExecutables and EngineThreads |
||||||
Run Tasks Simultaneously |
MaxConcurrentExcecutables is a property of the package Default = -1 (Physical or Logical Processors + 2) |
||||||
RunInOptimizedMode |
This actively removes unused components such as columns, output in order to increase performance. |
||||||
Adjusting Buffer Size |
Initial buffer size = (estimated the size of a single row of data x DefaultBufferMaxRows )
|
Configuring the Data Flow Components
The following settings refer to the components of the Data Flow task, since it is also possible to optimise these individual elements.
Item |
Description |
||||||||||
Optimise Queries |
Revise queries to retrieve only the required columns i.e. revise select * from … note that RunInOptimizedMode can remove unnecessary columns if required. |
||||||||||
Avoid unnecessary Sort operations |
Could perhaps revise Sort upstream
|
||||||||||
Source |
SQL statement performs better than ‘table or view’ retrieval |
||||||||||
Transformation |
|
||||||||||
Destinations |
SQL Server destinations are better if you are loading to the same computer Check whether the destination is slower, or the package – use Row Count as a temporary substitute to see what happens |
To add this post to Twitter, please click here.Hope that helps a bit!
Jen x