Skip to content
Advertisement

I wonder using INNER JOIN and equality operator is faster or using IN when I try to filter data from a table by another table’s column

While I’m trying to get all accounts, which received message from foo@gmail.com. I want to know about the performance of two following query.

SELECT *
FROM account
WHERE
    account_id in (
        SELECT distinct account_id
        FROM message mes
        WHERE mes.sender = 'foo@gmail.com'
    )

or

SELECT distinct account.*
FROM account acc
    INNER JOIN message mes
        ON mes.account_id = acc._id
WHERE
    mes.sender = 'foo@gmail.com'

Thanks!

Advertisement

Answer

Always use the first query in such situations. Don’t use DISTINCT inside an IN clause subquery, though. Don’t try to tell the DBMS how to perform the IN. (Well, a good DBMS should simply ignore the DISTINCT and make up its own mind how to look up the rows.)

SELECT *
FROM account
WHERE account_id IN (SELECT account_id FROM message mes WHERE mes.sender = 'foo@gmail.com');

Ideally you’d have an index on message (sender, account_id). The message table itself wouldn’t even have to be read. The sender is looked up in the index, then all matching account IDs taken. With that you read the accounts and that’s it. Without an index this is probably much slower, but still: Read the message table once, pick up the distinct account IDs, then read the accounts. No big deal.

With a join instead you’d combine all messages with their accounts. This can be quite a big intermediate result, which must be sorted to get distinct rows. A costly operation. And even if the DBMS is great with joins and sorting and does this extremely fast, then it can still use this same approach for the simple query with the IN clause. It’s up to the DBMS to make a good plan and a perfect DBMS would come up with exactly the same plan for the two queries 🙂

My advice: Only join when you are interested in the joined result. In your case you are not. You are interested in accounts matching some criteria, so write the query accordingly. Don’t mutilate your queries, because of thinking the DBMS will use another slick approach then. It may even do this in the current DBMS version and stop doing this with the next update. Keep your queries as readable as possible. If you want accounts where a certain messages exists, use WHERE with EXISTS or IN. This is how SQL is supposed to be written.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement