Replacing Subqueries With Join Might Drastically Boost Query Performance

Let’s have a look at the following query:

SELECT TransactionNumber
FROM TransactionTable
WHERE TransactionNumber NOT IN
(SELECT TransactionNumber
FROM ClosedTransactionTable)

The following query can be re-written by using JOIN as the following way:

SELECT TransactionNumber
FROM TransactionTable
LEFT JOIN 
(SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber 
FROM ClosedTransactionTable) AS T2
ON TransactionNumber = T2.DistinctTransactionNumber
WHERE T2.DistinctTransactionNumber IS NULL

Writing subqueries makes it easier to understand and at times works better than joins. However, at times a query using JOIN can execute in a few seconds instead few minutes for the same using IN.

Suppose the outer query is a JOIN of several tables including some large ones that produces several thousand or more rows. And the subquery would produce a maximum of several thousand rows every time you execute it. You know this because you have the domain knowledge of the business. In that case, if you can produce all the rows of the subquery by executing just once and create a HASH TABLE in the memory. Then using a HASH JOIN on it becomes quite fast. You can achieve this by giving the query hint to use HASH JOIN. The above query would then look like the below:

SELECT TransactionNumber
FROM TransactionTable
LEFT HASH JOIN 
(SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber 
FROM ClosedTransactionTable) AS T2
ON TransactionNumber = T2.DistinctTransactionNumber
WHERE T2.DistinctTransactionNumber IS NULL

This query would execute really fast.  It is true in general for all kinds of subqueries in MS SQL Server.

Index

Author: Gopal Das

Data Scientist @ CrimsonLogic, Singapore BS in CSE from Khulna University ME in Internet Science & Engineering from Indian Institute of Science (IISc) Publications on Query Optimization in RDBMS in ACM SIGMOD, IEEE ICDE etc. Founding team member and VP Engineering of iTwin, a spinoff from A*STAR Software engineer/data scientist for 19 years Software, Database, ML Father of 3 (two @ NUS High and one is too little!) www.linkedin.com/in/dasgopal https://github.com/gopalcdas

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: