Wednesday, November 30, 2011

Part 2: SqlBulkCopy Class (MS SqlServer and .NET)

This blog post only exists to prevent expired linking. My blog has moved to: http://pabloaizpiri.com/

--------------------------------------------------------------------------------

At work we're making an effort to contribute technical knowledge to a centralized IT wiki. I like that. Writing encourages a good understanding of the technology and its benefits. I didn't think I'd have much to write, but I'm surprised how some things I take for granted as simple were new to others, and vice versa. This two-part series is from those entries.



Introduction
Normally, inserting rows into SQL Server is quick and easy, done through a simple INSERT SQL statement. This is fine when saving data to one, two, or even a few rows. However, when it is necessary to insert larger sets of data, this method becomes not only slow and clunky, but functionally inadequate. In this entry (part two of a two-part series) I want to write about the second option for inserting large sets of data: .NET's SqlBulkCopy class.
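For contrast, here is a minimal sketch of that row-by-row approach. The GetSQLConnection() helper, the DataSet (ds), and the table and column names are the same hypothetical ones used in the SqlBulkCopy sample later in this post:

' Row-by-row inserts: one round trip and one statement per row.
Using objConnection As System.Data.SqlClient.SqlConnection = GetSQLConnection()
    objConnection.Open()
    For Each row As DataRow In ds.Tables("MyData").Rows
        Using cmd As New System.Data.SqlClient.SqlCommand(
                "INSERT INTO dbo.tblSIWellList (ID, EPDate) VALUES (@ID, @EPDate)",
                objConnection)
            cmd.Parameters.AddWithValue("@ID", row("ID"))
            cmd.Parameters.AddWithValue("@EPDate", row("EPD"))
            cmd.ExecuteNonQuery()
        End Using
    Next
End Using

Every row pays the round-trip and statement overhead, which is exactly what TVPs (part one) and SqlBulkCopy avoid.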
The SqlBulkCopy Class
When it is necessary to insert more than about 1,000 rows of data, a TVP begins to reach the limits of its performance gain. If we are using TVPs only for inserts, we can move up and dramatically increase performance by using .NET's SqlBulkCopy class. In addition to providing the functionality for large inserts, the SqlBulkCopy class can also be used to copy large amounts of data between tables. With SqlBulkCopy we can deal with millions of rows if need be. Here is an amazing whitepaper on the SqlBulkCopy class' performance: http://www.sqlbi.com/LinkClick.aspx?fileticket=svahq1Mpp9A%3d&tabid=169&mid=375
Using the SqlBulkCopy class is fairly simple. The data you supply must match the columns on the destination table. If the column order or names differ a bit, that's okay: it can be handled with the SqlBulkCopy class' ColumnMappings property, which exists for exactly that purpose. Here's a .NET sample of using the SqlBulkCopy class to update a table named "tblSIWellList" from a table named "MyData" within a DataSet:
' GetSQLConnection() is a helper defined elsewhere that builds the connection.
Using objConnection As System.Data.SqlClient.SqlConnection = GetSQLConnection()
    objConnection.Open()

    Using bulkCopy As SqlBulkCopy = New SqlBulkCopy(objConnection)
        bulkCopy.DestinationTableName = "dbo.tblSIWellList"
        ' Map source column names to destination column names where they differ.
        With bulkCopy.ColumnMappings
            .Add(New SqlBulkCopyColumnMapping("ID", "ID"))
            .Add(New SqlBulkCopyColumnMapping("EPD", "EPDate"))
            .Add(New SqlBulkCopyColumnMapping("Comments", "Comments"))
            .Add(New SqlBulkCopyColumnMapping("Date", "Date"))
            .Add(New SqlBulkCopyColumnMapping("RTP_Date", "RTPDate"))
        End With
        ' ds is the DataSet that holds the "MyData" source table.
        bulkCopy.WriteToServer(ds.Tables("MyData"))
    End Using
End Using

Because the SqlBulkCopy class is designed to copy/insert a large number of rows, transactions can be handled in batches. It is possible to specify how large each batch is (e.g. 5,000 rows at a time), but by default a single transaction ("batch") is used for all rows. When committing transactions in batches, a failure only rolls back the transaction for the batch currently in flight; rows from batches that already committed are not rolled back.
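For illustration, here is a minimal sketch of batching, again assuming the hypothetical GetSQLConnection() helper and DataSet from the sample above. The BatchSize property sets the batch size, and SqlBulkCopyOptions.UseInternalTransaction tells SqlBulkCopy to wrap each batch in its own transaction:

Using objConnection As System.Data.SqlClient.SqlConnection = GetSQLConnection()
    objConnection.Open()

    ' Each 5,000-row batch commits in its own internal transaction.
    Using bulkCopy As New SqlBulkCopy(objConnection,
            SqlBulkCopyOptions.UseInternalTransaction, Nothing)
        bulkCopy.DestinationTableName = "dbo.tblSIWellList"
        bulkCopy.BatchSize = 5000
        bulkCopy.WriteToServer(ds.Tables("MyData"))
    End Using
End Using

If the copy fails partway through, only the batch in flight rolls back; earlier batches remain committed, so plan any cleanup accordingly.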

Considerations when using .NET's SqlBulkCopy class
There are a few gotchas to keep in mind when using the SqlBulkCopy class:
  • When the source and destination data types differ, SqlBulkCopy will attempt to convert to the destination data type where possible, but the conversion incurs a performance hit.
  • By default, identity (PK) values are assigned by the destination table; values from the source are not preserved.
  • By default, constraints are not checked and triggers are not fired, and row-level locks are used. Changing these settings may affect performance (see the sketch after this list).
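These defaults can be changed through SqlBulkCopyOptions flags passed to the SqlBulkCopy constructor. A minimal sketch, again assuming the hypothetical GetSQLConnection() helper and DataSet from the earlier samples:

Using objConnection As System.Data.SqlClient.SqlConnection = GetSQLConnection()
    objConnection.Open()

    ' KeepIdentity: preserve source identity (PK) values instead of regenerating them.
    ' CheckConstraints: enforce check constraints while copying.
    ' FireTriggers: fire insert triggers on the destination table.
    ' TableLock: take a bulk-update table lock instead of row-level locks.
    Dim options As SqlBulkCopyOptions =
        SqlBulkCopyOptions.KeepIdentity Or
        SqlBulkCopyOptions.CheckConstraints Or
        SqlBulkCopyOptions.FireTriggers Or
        SqlBulkCopyOptions.TableLock
    Using bulkCopy As New SqlBulkCopy(objConnection, options, Nothing)
        bulkCopy.DestinationTableName = "dbo.tblSIWellList"
        bulkCopy.WriteToServer(ds.Tables("MyData"))
    End Using
End Using

TableLock in particular is commonly recommended for very large loads, since one bulk-update lock is cheaper than thousands of row locks.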
