While I’m trying to get all accounts, which received message from foo@gmail.com
. I want to know about the performance of two following query.
SELECT * FROM account WHERE account_id in ( SELECT distinct account_id FROM message mes WHERE mes.sender = 'foo@gmail.com' )
or
SELECT distinct account.* FROM account acc INNER JOIN message mes ON mes.account_id = acc._id WHERE mes.sender = 'foo@gmail.com'
Thanks!
Advertisement
Answer
Always use the first query in such situations. Don’t use DISTINCT
inside an IN
clause subquery, though. Don’t try to tell the DBMS how to perform the IN
. (Well, a good DBMS should simply ignore the DISTINCT
and make up its own mind how to look up the rows.)
SELECT * FROM account WHERE account_id IN (SELECT account_id FROM message mes WHERE mes.sender = 'foo@gmail.com');
Ideally you’d have an index on message (sender, account_id)
. The message table itself wouldn’t even have to be read. The sender is looked up in the index, then all matching account IDs taken. With that you read the accounts and that’s it. Without an index this is probably much slower, but still: Read the message table once, pick up the distinct account IDs, then read the accounts. No big deal.
With a join instead you’d combine all messages with their accounts. This can be quite a big intermediate result, which must be sorted to get distinct rows. A costly operation. And even if the DBMS is great with joins and sorting and does this extremely fast, then it can still use this same approach for the simple query with the IN
clause. It’s up to the DBMS to make a good plan and a perfect DBMS would come up with exactly the same plan for the two queries 🙂
My advice: Only join when you are interested in the joined result. In your case you are not. You are interested in accounts matching some criteria, so write the query accordingly. Don’t mutilate your queries, because of thinking the DBMS will use another slick approach then. It may even do this in the current DBMS version and stop doing this with the next update. Keep your queries as readable as possible. If you want accounts where a certain messages exists, use WHERE
with EXISTS
or IN
. This is how SQL is supposed to be written.