Wednesday, November 16, 2011
Best Practices in SSIS frm SQLCAT continued
Best Practices in SSIS frm SQLCAT
SSIS
The perfmon counter that is of primary interest to you is Process / % Processor Time (Total). Measure this counter for both sqlservr.exe and dtexec.exe. If SSIS is not able to drive close to 100% CPU load, this may be indicative of:
- Application contention: For example, SQL Server is taking on more processor resources, making them unavailable to SSIS.
- Hardware contention: A common scenario is that you have suboptimal disk I/O or not enough memory to handle the amount of data being processed.
- Design limitation: The design of your SSIS package is not making use of parallelism, and/or the package uses too many single-threaded tasks.
Tips to improve
- Use the NOLOCK or TABLOCK hints to remove locking overhead.
- To optimize memory usage, SELECT only the columns you actually need. If you SELECT all columns from a table (e.g., SELECT * FROM) you will needlessly use memory and bandwidth to store and retrieve columns that do not get used. .
- If possible, perform your datetime conversions at your source or target databases, as it is more expensive to perform within Integration Services.
If Integration Services and SQL Server run on the same server, use the SQL Server destination instead of the OLE DB destination to improve performance
Commit size 0 is fastest on heap bulk targets, because only one transaction is committed. If you cannot use 0, use the highest possible value of commit size to reduce the overhead of multiple-batch writing. Commit size = 0 is a bad idea if inserting into a Btree – because all incoming rows must be sorted at once into the target Btree—and if your memory is limited, you are likely to spill. Batchsize = 0 is ideal for inserting into a heap. For an indexed destination, I recommend testing between 100,000 and 1,000,000 as batch size.
A key network property is the packet size of your connection. By default this value is set to 4,096 bytes. This means a new network package must be assemble for every 4 KB of data. As noted in SqlConnection.PacketSize Property in the .NET Framework Class Library, increasing the packet size will improve performance because fewer network read and write operations are required to transfer a large data set.
If your system is transactional in nature, with many small data size read/writes, lowering the value will improve performance.
Since Integration Services is all about moving large amounts of data, you want to minimize the network overhead. This means that the value 32K (32767) is the fastest option
Do not sort within Integration Services unless it is absolutely necessary
- Try to perform your data flows in bulk mode instead of row by row. By doing this in bulk mode, you will minimize the number of entries that are added to the log file. This reduction will improve the underlying disk I/O for other inserts and will minimize the bottleneck created by writing to the log.
- If you need to perform delete operations, organize your data in a way so that you can TRUNCATE the table instead of running a DELETE. The latter will place an entry for each row deleted into the log. But the former will simply remove all of the data in the table with a small log entry representing the fact that the TRUNCATE occurred. In contrast with popular belief, a TRUNCATE statement can participate in a transaction.
- Use the SWITCH statement and partitioning. If partitions need to be moved around, you can use the SWITCH statement (to switch in a new partition or switch out the oldest partition), which is a minimally logged statement.
- Be careful when using DML statements; if you mix in DML statements within your INSERT statements, minimum logging is suppressed.