Fastest Data Loading using Bulk Load and Minimal Logging

Understanding Bulk Load Operations

There are two types of INSERTs in SQL server, first is singleton INSERT which inserts rows one by one and second type is BULK INSERT which feeds a continuous stream of data into a table, so this type of INSERT is much faster than singleton inserts.
Supported methods:

  • BULK INSERT – For CSV or fixed width files.
  • BCP – Like BULK INSERT, but can be run remotely.
  • SSIS – SQL Server Destination and OLEDB Destination
  • SELECT * INTO (you can’t control which file group the destination table goes to (before SQL Server 2016))
  • BULK API (Custom code in C#)

Understanding Minimally Logged Operations

Some operations in SQL Server requires to only log the fact that the operation occurred and it’s space allocations (extent allocations), not individual rows or pages affected by that operation, these types of operations are called Minimally Logged Operations. These operations are much faster when executed in BULK LOGGED recovery than in FULL recovery.
With minimal logging SQL Server will force data pages to be flushed to disk before the transaction commits. So if the IO system for data file is not very fast, or the data pages affected by the operation are not sequentially arranged, you might see worse performance with minimal logging.
Log file growth will likely be smaller, but LOG BACKUPS will not be smaller.

Example includes

  • BULK LOAD operations (when done right)

(Note: Minimally logged operation can participate in a transaction.)

Improving DATA LOAD performance using TABLOCK hint

1) Minimal Logging can make bulk load operations more efficient and minimize the risk that the transaction log will fill up. To minimally log a bulk load operation, the TABLOCK hint must be specified.
2) Another way to improve performance in your bulk load operations is to specify Table Locking for the duration of the data load. This is also achieved by including the TABLOCK hint with your SQL statement. This hint issues a bulk update (BU) lock on the table and this lock prevents operations that are not related to bulk loading data from accessing the table and allows other (BU) locks to co-exists together.
3) The TABLOCK hint is also useful if you want to perform Parallel Data Load into a single table. However, you should use TABLOCK for parallel operations only if the table is unindexed (is a HEAP). If indexed table is loaded with data using TABLOCK we will get exclusive (X) lock which will not allow any other transaction to do anything thus hindering parallelism.

Conditions for MINIMALLY LOGGING of BULK LOAD operations.

  • HEAP(both empty and non empty) – Data pages can be ML
  • Empty HEAP with NCI – Data pages and Index pages can be ML
  • Non Empty HEAP with NCI – Data pages can be ML and Index pages will be FL
  • Empty CI with NCI – Data pages and Index pages can be ML
  • Non Empty CI with NCI – Data pages and Index pages will be FL

In nut shell Loading into Heaps: Use TABLOCK as Minimally Logging, Parallel Load and Table Level Locking are all achieved with TABLOCK hint. While Loading data into Clustered Index: Do NOT use TABLOCK, use trace flag TF610 as Minimally Logging is achieved using TF610 and Parallel Load can be done for non overlapping input streams, see Bulk Loading data into HEAP versus CLUSTERED Table for more detail

Loading into Clustered Columnstore Index: Do NOT use TABLOCK, see Bulk Loading data into COLUMNSTORE table for more detail