SQL Server – Sharing Experiences

If you are inserting a lot of rows in a big table one by one then it will take long time to do so given that the table has a number of indexes.

Try to rewrite the code so that you can do bulk insert. That will be faster.

For example, if you are inserting 2, 000 rows in a heavily indexed table, it might take 20 mins. If you can change the logic and get it done in a single bulk insert you can do it in say, 5 seconds.

If you cannot do so, then create a temp table equivalent to the one where the data should have been inserted. Once all the data are inserted into the temp table one by one, at the end at one go you can move them to the original table. You might need to use transaction and put in some more logic depending on the situations.

Another thing that you can do along with the above or alone is temporary disable the (non-clustered) indexes.

When you insert rows in a table, it is not only the data that MS SQL Server writes to disk but also the indexes on the table. The latter can take up a lot of time. So disable the indexes before you start inserting any data and rebuild it at the end of all insertions. That way you create the indexes at one go.

ALTER INDEX [Index_Name] ON Table DISABLE

ALTER INDEX [Index_Name] ON Table REBUILD

Index

Let’s have a look at the following query:

SELECT TransactionNumber
FROM TransactionTable
WHERE TransactionNumber NOT IN
(SELECT TransactionNumber
FROM ClosedTransactionTable)

The following query can be re-written by using JOIN as the following way:

SELECT TransactionNumber
FROM TransactionTable
LEFT JOIN 
(SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber 
FROM ClosedTransactionTable) AS T2
ON TransactionNumber = T2.DistinctTransactionNumber
WHERE T2.DistinctTransactionNumber IS NULL

Writing subqueries makes it easier to understand and at times works better than joins. However, at times a query using JOIN can execute in a few seconds instead few minutes for the same using IN.

Suppose the outer query is a JOIN of several tables including some large ones that produces several thousand or more rows. And the subquery would produce a maximum of several thousand rows every time you execute it. You know this because you have the domain knowledge of the business. In that case, if you can produce all the rows of the subquery by executing just once and create a HASH TABLE in the memory. Then using a HASH JOIN on it becomes quite fast. You can achieve this by giving the query hint to use HASH JOIN. The above query would then look like the below:

SELECT TransactionNumber
FROM TransactionTable
LEFT HASH JOIN 
(SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber 
FROM ClosedTransactionTable) AS T2
ON TransactionNumber = T2.DistinctTransactionNumber
WHERE T2.DistinctTransactionNumber IS NULL

This query would execute really fast. It is true in general for all kinds of subqueries in MS SQL Server.

Index

Tag: SQL Server

Optimizing Inserts

Replacing Subqueries With Join Might Drastically Boost Query Performance