Wednesday, November 16, 2011

Best Practices in SSIS from SQLCAT, continued

You should be cautious with SSIS logging. It carries a large I/O overhead and can slow application performance dramatically. Unless absolutely required, I would limit SSIS logging to the OnError, OnTaskFailed, and OnWarning events.
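
As a quick sanity check on logging volume, you can count entries per event in the SSIS log table. This is only a sketch and assumes the SQL Server log provider writing to dbo.sysssislog (the table is named dbo.sysdtslog90 on SQL Server 2005):

    -- Rough check of which events dominate the SSIS log volume
    SELECT  event,
            COUNT(*)        AS entries,
            MIN(starttime)  AS first_logged,
            MAX(starttime)  AS last_logged
    FROM    dbo.sysssislog
    GROUP BY event
    ORDER BY entries DESC;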


Always consider the ordering of outputs within, for example, a Conditional Split transformation, and specify the order in which the conditions are evaluated. Order is significant, because a row is sent to the output corresponding to the first condition that evaluates to true; the first outputs should therefore handle the common cases that apply to the most rows in the data flow, with the exceptions handled after them.
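
The same first-match rule drives a T-SQL CASE expression, which may make the ordering effect easier to picture; this is only an analogy, and the Orders table and status values below are hypothetical:

    -- Analogy for Conditional Split ordering: the first condition that
    -- evaluates to true wins, so the common case is listed first.
    SELECT  OrderID,
            CASE
                WHEN OrderStatus = 'Shipped'   THEN 'CommonOutput'     -- most rows
                WHEN OrderStatus = 'Cancelled' THEN 'ExceptionOutput'  -- rare rows
                ELSE 'DefaultOutput'
            END AS RoutedTo
    FROM    dbo.Orders;  -- hypothetical table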

Increase the values of the DefaultBufferSize and DefaultBufferMaxRows properties of Data Flow tasks. This reduces the number of buffers moving through the data flow. Be careful, however, that this does not lead to a situation where the data flow engine is swapping buffers to disk, as that would be counter-productive.
Implement parallel execution. Experiment with the EngineThreads property of Data Flow tasks and the MaxConcurrentExecutables property of packages to optimize throughput on SMP architectures. Test throughput and adjust the values accordingly.

Best Practices in SSIS from SQLCAT

SSIS

http://sqlcat.com/sqlcat/b/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx

The perfmon counter that is of primary interest to you is Process / % Processor Time (Total). Measure this counter for both sqlservr.exe and dtexec.exe. If SSIS is not able to drive close to 100% CPU load, this may be indicative of:

  • Application contention: For example, SQL Server is taking on more processor resources, making them unavailable to SSIS.
  • Hardware contention: A common scenario is that you have suboptimal disk I/O or not enough memory to handle the amount of data being processed.
  • Design limitation: The design of your SSIS package is not making use of parallelism, and/or the package uses too many single-threaded tasks.

Tips to improve

  • Use the NOLOCK or TABLOCK hints to remove locking overhead.
  • To optimize memory usage, SELECT only the columns you actually need. If you SELECT all columns from a table (e.g., SELECT * FROM), you will needlessly use memory and bandwidth to store and retrieve columns that do not get used.
  • If possible, perform your datetime conversions at your source or target databases, as it is more expensive to perform them within Integration Services (see the sketch after this list).
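
A minimal sketch of a source query that follows these tips; the table, columns, and conversion style are hypothetical and only illustrate the pattern:

    -- Illustrative source query: explicit column list, NOLOCK hint,
    -- and the datetime conversion pushed down to the source database
    SELECT  SalesID,
            CustomerID,
            Amount,
            CONVERT(datetime, OrderDateText, 112) AS OrderDate  -- convert at the source
    FROM    dbo.Sales WITH (NOLOCK)  -- removes locking overhead (accepts dirty reads)
    WHERE   OrderDateText >= '20111101';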

If Integration Services and SQL Server run on the same server, use the SQL Server destination instead of the OLE DB destination to improve performance.

Commit size 0 is fastest on heap bulk targets, because only one transaction is committed. If you cannot use 0, use the highest possible value of commit size to reduce the overhead of multiple-batch writing. Commit size = 0 is a bad idea if you are inserting into a B-tree, because all incoming rows must be sorted at once into the target B-tree, and if your memory is limited, you are likely to spill. Batch size = 0 is ideal for inserting into a heap. For an indexed destination, I recommend testing batch sizes between 100,000 and 1,000,000.
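
The same trade-off shows up with the BATCHSIZE option of a plain T-SQL BULK INSERT, which is roughly analogous to the destination's commit size; the tables and file path below are hypothetical:

    -- Omitting BATCHSIZE loads the whole file as a single batch/transaction,
    -- which mirrors commit size 0 and is fastest into a heap.
    BULK INSERT dbo.SalesStaging            -- hypothetical heap target
    FROM 'D:\loads\sales.dat'               -- hypothetical data file
    WITH (TABLOCK);

    -- For an indexed (B-tree) target, a bounded batch size limits the sort and
    -- memory pressure; test values between 100,000 and 1,000,000.
    BULK INSERT dbo.SalesIndexed            -- hypothetical indexed target
    FROM 'D:\loads\sales.dat'
    WITH (BATCHSIZE = 100000);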

A key network property is the packet size of your connection. By default this value is set to 4,096 bytes, which means a new network packet must be assembled for every 4 KB of data. As noted in SqlConnection.PacketSize Property in the .NET Framework Class Library, increasing the packet size will improve performance because fewer network read and write operations are required to transfer a large data set.
If your system is transactional in nature, with many small reads and writes, lowering the value will improve performance.

Since Integration Services is all about moving large amounts of data, you want to minimize the network overhead. This means that the value 32K (32,767 bytes) is the fastest option.
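
In practice this means setting Packet Size=32767 in the connection string (connection managers typically expose this as a Packet Size property) and then confirming what the server actually negotiated. A minimal check from T-SQL:

    -- Check the packet size actually in use for current connections
    -- (net_packet_size is reported in bytes)
    SELECT  session_id,
            net_transport,
            net_packet_size
    FROM    sys.dm_exec_connections;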

Do not sort within Integration Services unless it is absolutely necessary.
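
If a downstream component needs sorted input, one common alternative is to push the ORDER BY into the source query and then mark the source output as already sorted (the IsSorted and SortKeyPosition properties in the source's advanced editor) instead of adding a Sort transformation. A minimal sketch, with a hypothetical table:

    -- Sort at the source database instead of using an SSIS Sort transformation
    SELECT  CustomerID,
            OrderDate,
            Amount
    FROM    dbo.Sales       -- hypothetical table
    ORDER BY CustomerID;    -- SQL Server does the sorting before the data flow starts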

  • Try to perform your data flows in bulk mode instead of row by row. By doing this in bulk mode, you will minimize the number of entries that are added to the log file. This reduction will improve the underlying disk I/O for other inserts and will minimize the bottleneck created by writing to the log.
  • If you need to perform delete operations, organize your data in a way that lets you TRUNCATE the table instead of running a DELETE. The latter will place an entry in the log for each row deleted, whereas the former will simply remove all of the data in the table with a small log entry recording the fact that the TRUNCATE occurred. Contrary to popular belief, a TRUNCATE statement can participate in a transaction.
  • Use the SWITCH statement and partitioning. If partitions need to be moved around, you can use the SWITCH statement (to switch in a new partition or switch out the oldest partition), which is a minimally logged operation; both this and the TRUNCATE pattern are sketched after this list.
  • Be careful when using DML statements; if you mix DML statements in with your INSERT statements, minimal logging is suppressed.
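
A rough sketch of the TRUNCATE and partition-switch patterns; the fact, staging, and archive tables are hypothetical, and the archive table must be empty, identically structured, and on the same filegroup for the switch to succeed:

    -- Minimally logged alternatives to row-by-row DELETE

    -- Remove all rows from a staging table; this can still be rolled back
    -- if it runs inside an explicit transaction.
    TRUNCATE TABLE dbo.SalesStaging;

    -- Switch the oldest partition out of the fact table into an empty,
    -- identically structured archive table (a metadata-only operation).
    ALTER TABLE dbo.SalesFact
        SWITCH PARTITION 1 TO dbo.SalesFact_Archive;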