I have a sales table with a sales column TCV, customerKey column which holds customerKey stored in customer table. Each row has an order_date and there are other columns irrelevant to this query.
I have to find sales for current period and another period grouped by customers for comparison. So I have this below query.
SELECT *, SUM(TCV) AS sales, current_sales - SUM(TCV) AS difference FROM `sales` LEFT JOIN (SELECT customerKey AS customerID, SUM(TCV) AS current_sales FROM `sales` WHERE `order_date` BETWEEN '2020-08-01' AND '2020-08-31' GROUP BY `sales`.`customerKey`) AS `current_sales` ON `customerKey` = `customerID` WHERE `order_date` BETWEEN '2020-09-01' AND '2020-09-31' GROUP BY `sales`.`customerKey`
I have this query and it runs very slowly, takes about 30 secs, but if I run the query without the join the result comes back in a second.
What could be the problem, is it structured wrong?
Advertisement
Answer
Rewrite it without join, it will perform better:
select s.* , current_sales - sales as difference from ( SELECT customerKey , sum(CASE WHEN order_date BETWEEN '2020-08-01' AND '2020-08-31' then TCV else 0 end) current_sales , sum(CASE WHEN order_date BETWEEN '2020-09-01' AND '2020-09-30' then TCV else 0 end) sales FROM sales WHERE order_date BETWEEN '2020-08-01' AND '2020-09-30' GROUP BY sales.customerKey ) s