I have two SQL queries that provides the same output. My first intuition was to use this:
SELECT * FROM performance_dev.report_golden_results where id IN (SELECT max(id) as 'id' from performance_dev.report_golden_results group by platform_id, release_id, configuration_id)
Now, this took something like 70 secs to complete!
Searching for another solution I tried something similar:
SELECT * FROM performance_dev.report_golden_results e join (SELECT max(id) as 'id' from performance_dev.report_golden_results group by platform_id, release_id, configuration_id) s ON s.id = e.id;
Surprisingly, this took 0.05 secs to complete!!!
how come these two are so different?
thanks!
Advertisement
Answer
First thing which Might Cause the Time Lag is that MySQL uses ‘semi-join’ strategy for Subqueries.The Semi Join includes Following Steps :
If a subquery meets the preceding criteria, MySQL converts it to a semi-join and makes a cost-based choice from these strategies:
Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.
Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.
FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This “shortcuts” scanning and eliminates production of unnecessary rows.
LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery’s value group.
Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
But giving an explicit join reduces these efforts which might be the Reason.
I hope it helped!