I have a sales table with a sales column TCV, customerKey column which holds customerKey stored in customer table. Each row has an order_date and there are other columns irrelevant to this query.
I have to find sales for current period and another period grouped by customers for comparison. So I have this below query.
x
SELECT
*,
SUM(TCV) AS sales,
current_sales - SUM(TCV) AS difference
FROM
`sales`
LEFT JOIN
(SELECT
customerKey AS customerID,
SUM(TCV) AS current_sales
FROM
`sales`
WHERE `order_date` BETWEEN '2020-08-01'
AND '2020-08-31'
GROUP BY `sales`.`customerKey`) AS `current_sales`
ON `customerKey` = `customerID`
WHERE `order_date` BETWEEN '2020-09-01'
AND '2020-09-31'
GROUP BY `sales`.`customerKey`
I have this query and it runs very slowly, takes about 30 secs, but if I run the query without the join the result comes back in a second.
What could be the problem, is it structured wrong?
Advertisement
Answer
Rewrite it without join, it will perform better:
select s.*
, current_sales - sales as difference
from
( SELECT customerKey
, sum(CASE WHEN order_date BETWEEN '2020-08-01' AND '2020-08-31' then TCV else 0 end) current_sales
, sum(CASE WHEN order_date BETWEEN '2020-09-01' AND '2020-09-30' then TCV else 0 end) sales
FROM sales
WHERE order_date BETWEEN '2020-08-01' AND '2020-09-30'
GROUP
BY sales.customerKey
) s