Skip to content
Advertisement

Why ‘ in ‘ is so much faster than ‘ = ‘ in SQL Select?

This select’s execution time is approximately 25 ~ 30 sec.

SELECT *
  FROM custinfo cs
 WHERE cs.idcust = (SELECT cust_id
                        FROM customers
                       WHERE id = 1230)

Execution plan for ‘ = ‘:

Execution plan for ' = '

But if I change ‘ = ‘ for ‘ in ‘, then it becomes so much faster about 0.040 ~ 0.060-sec average.

SELECT *
      FROM custinfo cs
     WHERE cs.idcust in (SELECT cust_id
                            FROM customers
                           WHERE id = 1230)

Execution plan for ‘ in ‘:

Execution plan for ' in '

And there have been opposite cases like this, where ‘ = ‘ was faster than ‘ in ‘.

Does anybody know the reason why simple syntax change makes this much difference in performance and execution time?

When is ‘ = ‘ is faster than ‘ in ‘ or vice versa? Are there some conditions for which to use in what cases?

  • I’m using dblink for my table. Maybe that’s what’s affecting my query?
  • Welp, guys, Here’s the thing… Now both of my queries run for about ~0.10 sec. So now I can’t find an Execution plan for my queries when they were running slow. And I have absolutely no idea why my queries performance changed in a day… Like, I can only guess the problem was with our servers, but, why did it only affect my 1 query, while the other runs normally? Still, here’s my execution plans:

‘ = ‘

' = ' Execution Plan

‘ in ‘ ' in ' Execution Plan

Advertisement

Answer

The “IN” means “there might be more that one row returned in this subquery, please check them all” whereas “=” means “there will be only one line returned from subquery” otherwise it would be an error.

Having that info the optimizer build different query plans. For “=”-query it executes subquery first and then filters the custInfo table out.

For the “IN” query optimizer performs a join operation as if you’ve written following query

SELECT *
  FROM custinfo cs
  JOIN customers c
    ON cs.idcust = c.cust_id
 WHERE c.id = 1230;

This is why execution time differs. It can take longer or not depending on you data selectivity, indexes, partitioning and so on

UPD. From the execution plans you’ve uploaded I see the following

  1. For the “=” query:
1.1. It competely scans the MT_OPERATION_OUT table (FULL TABLE SCAN), captures the result
1.2. Then it accesess another table on remote DB, presumably scans it too (REMOTE)
1.3. Filters data it got from remote.
  1. For the “IN” query:
2.1. It competely scans the MT_OPERATION_OUT table (FULL TABLE SCAN), captures the result
2.2. Sorts what it got on the previous step (SORT UNIQUE)
2.3. Then it accesess another table on remote DB, presumably scans it too (REMOTE)
2.4. Performs a join (NESTED LOOPS)

So to me it seems that for some reason the db needs more time to filter data from remote db that to join it using “nested loops” method.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement