I have a select query, that selects over 50k records from MySQL 5.5 database at once, and this amount is expected to grow. The query contains multiple subquery which is taking over 120s to execute.
Initially some of the sale_items
and stock
tables didn’t have more that the ID keys, so I added some more:
SELECT `p`.`id` AS `id`, `p`.`Name` AS `Name`, `p`.`Created` AS `Created`, `p`.`Image` AS `Image`, `s`.`company` AS `supplier`, `s`.`ID` AS `supplier_id`, `c`.`name` AS `category`, IFNULL((SELECT SUM(`stocks`.`Total_Quantity`) FROM `stocks` WHERE (`stocks`.`Product_ID` = `p`.`id`)), 0) AS `total_qty`, IFNULL((SELECT SUM(`sale_items`.`quantity`) FROM `sale_items` WHERE (`sale_items`.`product_id` = `p`.`id`)), 0) AS `total_sold`, IFNULL((SELECT SUM(`sale_items`.`quantity`) FROM `sale_items` WHERE ((`sale_items`.`product_id` = `p`.`id`) AND `sale_items`.`Sale_ID` IN (SELECT `refunds`.`Sale_ID` FROM `refunds`))), 0) AS `total_refund` FROM ((`products` `p` LEFT JOIN `cats` `c` ON ((`c`.`ID` = `p`.`cat_id`))) LEFT JOIN `suppliers` `s` ON ((`s`.`ID` = `p`.`supplier_id`)))
This is the explain result
+----+--------------------+------------+----------------+------------------------+------------------------+---------+--------------------------------- | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+------------+----------------+------------------------+------------------------+---------+--------------------------------- | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 20981 | | | 2 | DERIVED | p | ALL | NULL | NULL | NULL | NULL | 20934 | | | 2 | DERIVED | c | eq_ref | PRIMARY | PRIMARY | 4 | p.cat_id | 1 | | | 2 | DERIVED | s | eq_ref | PRIMARY | PRIMARY | 4 | p.supplier_id | 1 | | | 5 | DEPENDENT SUBQUERY | sale_items | ref | sales_items_product_id | sales_items_product_id | 5 | p.id | 33 | Using where | | 6 | DEPENDENT SUBQUERY | refunds | index_subquery | IDX_refunds_sale_id | IDX_refunds_sale_id | 5 | func | 1 | Using index; Using where | | 4 | DEPENDENT SUBQUERY | sale_items | ref | sales_items_product_id | sales_items_product_id | 5 | p.id | 33 | Using where | | 3 | DEPENDENT SUBQUERY | stocks | ref | IDX_stocks_product_id | IDX_stocks_product_id | 5 | p.id | 1 | Using where | +----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
I am expecting that the query takes less that 3s at most, but I can’t seem to figure out the best way to optimize this query.
Advertisement
Answer
The query looks fine to me. You select all data and aggregate some of it. This takes time. Your explain plan shows there are indexes on the IDs, which is good. And at a first glance there is not much we seem to be able to do here…
What you can do, though, is provide covering indexes, i.e. indexes that contain all columns you need from a table, so the data can be taken from the index directly.
create index idx1 on cats(id, name); create index idx2 on suppliers(id, company); create index idx3 on stocks(product_id, total_quantity); create index idx4 on sale_items(product_id, quantity, sale_id);
This can really boost your query.
What you can try About the query itself is to move the subqueries to the FROM
clause. MySQL’s optimizer is not great, so although it should get the same execution plan, it may well be that it favors the FROM
clause.
SELECT p.id, p.name, p.created, p.image, s.company as supplier, s.id AS supplier_id, c.name AS category, COALESCE(st.total, 0) AS total_qty, COALESCE(si.total, 0) AS total_sold, COALESCE(si.refund, 0) AS total_refund FROM products p LEFT JOIN cats c ON c.id = p.cat_id LEFT JOIN suppliers s ON s.id = p.supplier_id LEFT JOIN ( SELECT SUM(total_quantity) AS total FROM stocks GROUP BY product_id ) st ON st.product_id = p.id LEFT JOIN ( SELECT SUM(quantity) AS total, SUM(CASE WHEN sale_id IN (SELECT sale_id FROM refunds) THEN quantity END) as refund FROM sale_items GROUP BY product_id ) si ON si.product_id = p.id;
(If sale_id
is unique in refunds
, then you can even join it to sale_items
. Again: this should usually not make a difference, but in MySQL it may still. MySQL was once notorious for treating IN
clauses much worse than the FROM
clause. This may not be the case anymore, I don’t know. You can try – if refunds.sale_id
is unique).