Let’s have a look at the following query:
SELECT TransactionNumber FROM TransactionTable WHERE TransactionNumber NOT IN (SELECT TransactionNumber FROM ClosedTransactionTable)
The following query can be re-written by using JOIN as the following way:
SELECT TransactionNumber FROM TransactionTable LEFT JOIN (SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber FROM ClosedTransactionTable) AS T2 ON TransactionNumber = T2.DistinctTransactionNumber WHERE T2.DistinctTransactionNumber IS NULL
Writing subqueries makes it easier to understand and at times works better than joins. However, at times a query using JOIN can execute in a few seconds instead few minutes for the same using IN.
Suppose the outer query is a JOIN of several tables including some large ones that produces several thousand or more rows. And the subquery would produce a maximum of several thousand rows every time you execute it. You know this because you have the domain knowledge of the business. In that case, if you can produce all the rows of the subquery by executing just once and create a HASH TABLE in the memory. Then using a HASH JOIN on it becomes quite fast. You can achieve this by giving the query hint to use HASH JOIN. The above query would then look like the below:
SELECT TransactionNumber FROM TransactionTable LEFT HASH JOIN (SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber FROM ClosedTransactionTable) AS T2 ON TransactionNumber = T2.DistinctTransactionNumber WHERE T2.DistinctTransactionNumber IS NULL
This query would execute really fast. It is true in general for all kinds of subqueries in MS SQL Server.