Generally, the fastest way to load data into SQL Server table is by Bulk Loading data into HEAP using  http://rehabinflorida.net/ TABLOCK hint on destination table which facilitates minimally logging. For more details see Fastest Data Loading using Bulk Load and Minimal Logging

• The database must be in simple or bulk logged recovery model.
• The destination table must be without Clustered Index.
• There must be no NON-Clustered index on the destination table.

So if there are indexes present on the table then drop or disable indexes before load and then (re)create indexes after data load. Now we will see example of loading data into heap table from a flat file using BULK INSERT.

/* 1) Naive (Fully Logged) */
BULK INSERT HeapTable
FROM 'C:\path\to\file.txt'

/* 2) Minimal logged */
BULK INSERT HeapTable
FROM 'C:\path\to\file.txt'
WITH (TABLOCK);


To parallel load using T-SQL INSERT command we have to consider few points: Select part in the execution plan is parallel in both INSERT .. SELECT and SELECT * INTO but Insert part is parallel in SELECT * INTO  starting from SQL 2014 and above and Insert part is parallel in INSERT .. SELECT starting from SQL 2016 and above.

/* 3) Minimal logged and Parallel */
INSERT INTO HeapTable WITH(TABLOCK)
SELECT * FROM SomeTable


To parallel load using BCP, BULK INSERT, run multiple commands with multiple data files.

/* 3) Minimal logged and Parallel (run in different session) */
BULK INSERT HeapTable
FROM 'C:\path\to\file1.txt'
WITH (TABLOCK);

BULK INSERT HeapTable
FROM 'C:\path\to\file2.txt'
WITH (TABLOCK);


By default, only loading of empty heaps can be minimally logged using TABLOCK hint but Trace Flag 610 Allows minimally logged inserts into indexed tables for newly allocated pages. This trace flag was originally designed to save log writes not to improve INSERT performance but in some cases it does improves performance.
(Note: Non Clustered Index changes may not be minimally logged.)

/* 1) Naive (Fully Logged) */
BULK INSERT ClusterTable
FROM 'C:\path\to\file.txt';

/* 2) Enable TF 610 (Minimal logged) */
DBCC TRACEON (610, -1);

BULK INSERT ClusterTable
FROM 'C:\path\to\file.txt'


Use ORDER hint to eliminate sorting when inserting sorted input data into a clustered table.

/* 3) Now get rid of sorting completely (Still Minimal logged) */
BULK INSERT ClusterTable
FROM 'C:\path\to\file.txt'
WITH (ORDER (SortColumn ASC));


Do NOT use TABLOCK hint as Indexed tables do not support BU locks so the solution to load parallel is Use non overlapping input streams.

/* 4) Minimal logged and Parallel (run in different session) */
BULK INSERT ClusterTable
FROM 'C:\path\to\file1.txt'
WITH (ORDER (SortColumn ASC));

BULK INSERT ClusterTable
FROM 'C:\path\to\file2.txt'
WITH (ORDER (SortColumn ASC));


Above method should work but one thread gets stuck until other completes because of lock escalation to table level lock. So we are not able to get parallel load, to mitigate this use batching or partitioning as described below.

BATCHSIZE
It describes how many rows are committed at a time during the bulk operation. If BATCHSIZE value is left at its default, the entire bulk operation is committed as one, big transaction. However, if this value is greater than 0, a new transaction is created and committed every time the amount of rows specified has been sent to the server, so we can safely say Each batch is a separate transaction.

• Prevent TempDB spill: caused due to SORT when loading into an indexed table.
• Prevent Lock Escalation: caused when more than 5000 rows get inserted.
• Prevent Log File Growth: caused when entire transaction is committed as one big transaction.
/* 5) Minimal logged and Parallel and Batched (run in different session) */
BULK INSERT ClusterTable
FROM 'C:\path\to\file1.txt'
WITH (BATCHSIZE = 2000,ORDER (SortColumn ASC));

BULK INSERT ClusterTable
FROM 'C:\path\to\file2.txt'
WITH (BATCHSIZE = 2000,ORDER (SortColumn ASC));


PARTITIONING

Partitioning the clustered table also helps to perform parallel load data into table, there are two approaches to this as discussed below:

• Use multiple loads into different partitions in parallel.
• SET  LOCK_ESCALATION = AUTO  for partitioned table to enable partition level lock escalation instead of table level escalation.

Loading into temporary staging tables(HEAP) and then switch in data into PARTITIONED CLUSTERED table.

• Create empty staging tables.
• Apply bulk load optimization on each staging table and create SSIS task or BULK INSERT OR BCP Utility for loading each of the these staging table.
• Recreate clustered indexes in parallel again using any of the tasks suggested above.
• Switch back in to main table.
• Clean up staging tables.

Some points about SSIS Batch size: Integration Services does not handle batch sizes in the same way that the bulk load method does, Integration Services will by default create one batch per pipeline buffer. After the buffer is flushed, the batch is committed. It is possible to override this behavior by changing the value of the Maximum Insert Commit Size in the Data Destination.Below table summarizes the behavior of Maximum Insert Commit Size (MICS).

• MICS  > buffer size (Setting is ignored. One commit is issued for every buffer.)
• MICS = 0 (The entire batch is committed in one big batch. Behaves just like BATCHSIZE = 0.)
• MICS  < buffer size (Commit is issued every time MICS rows are sent.Commits are also issued at the end of each buffer.)