How to optimize MySQL select query or make it fast…

I have a select query, that selects over 50k records from MySQL 5.5 database at once, and this amount is expected to grow. The query contains multiple subquery which is taking over 120s to execute.

Initially some of the sale_items and stock tables didn’t have more that the ID keys, so I added some more:

SELECT
  `p`.`id` AS `id`,
  `p`.`Name` AS `Name`,
  `p`.`Created` AS `Created`,
  `p`.`Image` AS `Image`,
  `s`.`company` AS `supplier`,
  `s`.`ID` AS `supplier_id`,
  `c`.`name` AS `category`,
  IFNULL((SELECT
    SUM(`stocks`.`Total_Quantity`)
  FROM `stocks`
  WHERE (`stocks`.`Product_ID` = `p`.`id`)), 0) AS `total_qty`,
  IFNULL((SELECT
    SUM(`sale_items`.`quantity`)
  FROM `sale_items`
  WHERE (`sale_items`.`product_id` = `p`.`id`)), 0) AS `total_sold`,
  IFNULL((SELECT
    SUM(`sale_items`.`quantity`)
  FROM `sale_items`
  WHERE ((`sale_items`.`product_id` = `p`.`id`) AND `sale_items`.`Sale_ID` IN (SELECT
    `refunds`.`Sale_ID`
  FROM `refunds`))), 0) AS `total_refund`
  FROM ((`products` `p`
    LEFT JOIN `cats` `c`
      ON ((`c`.`ID` = `p`.`cat_id`)))
    LEFT JOIN `suppliers` `s`
      ON ((`s`.`ID` = `p`.`supplier_id`)))

​x
 
SELECT  `p`.`id` AS `id`,  `p`.`Name` AS `Name`,  `p`.`Created` AS `Created`,  `p`.`Image` AS `Image`,  `s`.`company` AS `supplier`,  `s`.`ID` AS `supplier_id`,  `c`.`name` AS `category`,  IFNULL((SELECT    SUM(`stocks`.`Total_Quantity`)  FROM `stocks`  WHERE (`stocks`.`Product_ID` = `p`.`id`)), 0) AS `total_qty`,  IFNULL((SELECT    SUM(`sale_items`.`quantity`)  FROM `sale_items`  WHERE (`sale_items`.`product_id` = `p`.`id`)), 0) AS `total_sold`,  IFNULL((SELECT    SUM(`sale_items`.`quantity`)  FROM `sale_items`  WHERE ((`sale_items`.`product_id` = `p`.`id`) AND `sale_items`.`Sale_ID` IN (SELECT    `refunds`.`Sale_ID`  FROM `refunds`))), 0) AS `total_refund`  FROM ((`products` `p`    LEFT JOIN `cats` `c`      ON ((`c`.`ID` = `p`.`cat_id`)))    LEFT JOIN `suppliers` `s`      ON ((`s`.`ID` = `p`.`supplier_id`)))​

This is the explain result

+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
| id | select_type        | table      | type           | possible_keys          | key                    | key_len | ref     | rows  | Extra                    |
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
|  1 | PRIMARY            | <derived2> | ALL            | NULL                   | NULL                   | NULL    | NULL      | 20981 |                          |
|  2 | DERIVED            | p          | ALL            | NULL                   | NULL                   | NULL    | NULL      | 20934 |                          |
|  2 | DERIVED            | c          | eq_ref         | PRIMARY                | PRIMARY                | 4       | p.cat_id      |     1 |                          |
|  2 | DERIVED            | s          | eq_ref         | PRIMARY                | PRIMARY                | 4       | p.supplier_id |     1 |                          |
|  5 | DEPENDENT SUBQUERY | sale_items  | ref            | sales_items_product_id | sales_items_product_id | 5       | p.id   |    33 | Using where              |
|  6 | DEPENDENT SUBQUERY | refunds    | index_subquery | IDX_refunds_sale_id    | IDX_refunds_sale_id    | 5       | func    |     1 | Using index; Using where |
|  4 | DEPENDENT SUBQUERY | sale_items  | ref            | sales_items_product_id | sales_items_product_id | 5       | p.id   |    33 | Using where              |
|  3 | DEPENDENT SUBQUERY | stocks     | ref            | IDX_stocks_product_id  | IDX_stocks_product_id  | 5       | p.id    |     1 | Using where              |
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------

 
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------| id | select_type        | table      | type           | possible_keys          | key                    | key_len | ref     | rows  | Extra                    |+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------|  1 | PRIMARY            | <derived2> | ALL            | NULL                   | NULL                   | NULL    | NULL      | 20981 |                          ||  2 | DERIVED            | p          | ALL            | NULL                   | NULL                   | NULL    | NULL      | 20934 |                          ||  2 | DERIVED            | c          | eq_ref         | PRIMARY                | PRIMARY                | 4       | p.cat_id      |     1 |                          ||  2 | DERIVED            | s          | eq_ref         | PRIMARY                | PRIMARY                | 4       | p.supplier_id |     1 |                          ||  5 | DEPENDENT SUBQUERY | sale_items  | ref            | sales_items_product_id | sales_items_product_id | 5       | p.id   |    33 | Using where              ||  6 | DEPENDENT SUBQUERY | refunds    | index_subquery | IDX_refunds_sale_id    | IDX_refunds_sale_id    | 5       | func    |     1 | Using index; Using where ||  4 | DEPENDENT SUBQUERY | sale_items  | ref            | sales_items_product_id | sales_items_product_id | 5       | p.id   |    33 | Using where              ||  3 | DEPENDENT SUBQUERY | stocks     | ref            | IDX_stocks_product_id  | IDX_stocks_product_id  | 5       | p.id    |     1 | Using where              |+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------​

I am expecting that the query takes less that 3s at most, but I can’t seem to figure out the best way to optimize this query.

Answer

The query looks fine to me. You select all data and aggregate some of it. This takes time. Your explain plan shows there are indexes on the IDs, which is good. And at a first glance there is not much we seem to be able to do here…

What you can do, though, is provide covering indexes, i.e. indexes that contain all columns you need from a table, so the data can be taken from the index directly.

create index idx1 on cats(id, name);
create index idx2 on suppliers(id, company);
create index idx3 on stocks(product_id, total_quantity);
create index idx4 on sale_items(product_id, quantity, sale_id);

 
create index idx1 on cats(id, name);create index idx2 on suppliers(id, company);create index idx3 on stocks(product_id, total_quantity);create index idx4 on sale_items(product_id, quantity, sale_id);​

This can really boost your query.

What you can try About the query itself is to move the subqueries to the FROM clause. MySQL’s optimizer is not great, so although it should get the same execution plan, it may well be that it favors the FROM clause.

SELECT
  p.id,
  p.name,
  p.created,
  p.image,
  s.company as supplier,
  s.id AS supplier_id,
  c.name AS category,
  COALESCE(st.total, 0) AS total_qty,
  COALESCE(si.total, 0) AS total_sold,
  COALESCE(si.refund, 0) AS total_refund
FROM products p
LEFT JOIN cats c ON c.id = p.cat_id
LEFT JOIN suppliers s ON s.id = p.supplier_id
LEFT JOIN
(
  SELECT SUM(total_quantity) AS total
  FROM stocks 
  GROUP BY product_id
) st ON st.product_id = p.id
LEFT JOIN
(
  SELECT
    SUM(quantity) AS total,
    SUM(CASE WHEN sale_id IN (SELECT sale_id FROM refunds) THEN quantity END) as refund
  FROM sale_items
  GROUP BY product_id
) si ON si.product_id = p.id;

 
SELECT  p.id,  p.name,  p.created,  p.image,  s.company as supplier,  s.id AS supplier_id,  c.name AS category,  COALESCE(st.total, 0) AS total_qty,  COALESCE(si.total, 0) AS total_sold,  COALESCE(si.refund, 0) AS total_refundFROM products pLEFT JOIN cats c ON c.id = p.cat_idLEFT JOIN suppliers s ON s.id = p.supplier_idLEFT JOIN(  SELECT SUM(total_quantity) AS total  FROM stocks   GROUP BY product_id) st ON st.product_id = p.idLEFT JOIN(  SELECT    SUM(quantity) AS total,    SUM(CASE WHEN sale_id IN (SELECT sale_id FROM refunds) THEN quantity END) as refund  FROM sale_items  GROUP BY product_id) si ON si.product_id = p.id;​

(If sale_id is unique in refunds, then you can even join it to sale_items. Again: this should usually not make a difference, but in MySQL it may still. MySQL was once notorious for treating IN clauses much worse than the FROM clause. This may not be the case anymore, I don’t know. You can try – if refunds.sale_id is unique).

How to optimize MySQL select query or make it faster

Advertisement

Answer