I need to find the first and last Order for each Customer by OrderDate, and the name and SKU of the item with the highest business volume in each of those orders. For reference, the Customer table has >150k records, and Orders and OrderDetails (these are the Items) a lot more.
Note: Both Orders and their respective items should be in the same row as the Customer
Orders
OrderID OrderDate CustomerID BusinessVolumeTotal Subtotal 13212 '2021-09-06' 512312 500.00 25.60
OrderDetails
OrderID ItemCode ItemDescription BusinessVolume 13212 'SKW-BS' 'Some item' 450.00
In my first query I attempted to stick to joining instead of subqueries, which resulted in this
select distinct(c.CustomerID), c.FirstName + ' ' + c.LastName as Name, cs.CustomerStatusDescription as Status, ct.CustomerTypeDescription as Type, pv.Volume80 as G3, fo.OrderID,fo.OrderDate,fo.SubTotal,fo.Country, fod.ItemCode, fod.ItemDescription, fopt.PriceTypeID, lo.OrderID,lo.OrderDate,lo.SubTotal,lo.Country, lod.ItemCode, lod.ItemDescription, lopt.PriceTypeID from Customers c left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID left join PeriodVolumes pv on pv.CustomerID = c.CustomerID left join Orders fo on fo.CustomerID = c.CustomerID -- First Order left join Orders lo on lo.CustomerID = c.CustomerID -- Last Order left join OrderDetails fod on fod.OrderID = fo.OrderID left join OrderDetails lod on lod.OrderID = lo.OrderID left join PriceTypes fopt on fo.PriceTypeID = fopt.PriceTypeID left join PriceTypes lopt on lo.PriceTypeID = lopt.PriceTypeID where c.CustomerStatusID in (1,2) and c.CustomerTypeID in (2,3) and pv.PeriodTypeID = 2 /* First Order */ and fo.OrderID = (select top 1(OrderID) from Orders where CustomerID = c.CustomerID and OrderStatusID>=7 order by OrderDate ) and fod.ItemID = (select top 1(ItemID) from OrderDetails where OrderID = fo.OrderID order by BusinessVolume) /* Last Order */ and lo.OrderID = (select top 1(OrderID) from Orders where CustomerID = c.CustomerID and OrderStatusID>=7 order by OrderDate desc) and lod.ItemID = (select top 1(ItemID) from OrderDetails where OrderID = lo.OrderID order by BusinessVolume desc) and pv.PeriodID = (select PeriodID from Periods where PeriodTypeID=2 and StartDate <= @now and EndDate >= @now)
But this ends up executing for about 6-7 minutes. From the explain plan, it looks like most of it is taken up by a Key Lookup for Orders based on OrderStatusID >= 7.
So I tried to use window functions to achieve the same:
select distinct(c.CustomerID), c.FirstName + ' ' + c.LastName as Name, cs.CustomerStatusDescription as Status, ct.CustomerTypeDescription as Type, pv.Volume80 as G3, fal.* from Customers c left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID left join PeriodVolumes pv on pv.CustomerID = c.CustomerID left join( select CustomerID, max(case when MinDate = 1 then OrderID end) FirstOrderID, max(case when MinDate = 1 then OrderDate end) FirstOrderDate, max(case when MinDate = 1 then BusinessVolumeTotal end) FirstBVTotal, max(case when MinDate = 1 then PriceTypeDescription end) FirstPriceType, max(case when MinDate = 1 then ItemCode end) FirstItemCode, max(case when MinDate = 1 then ItemDescription end) FirstItemDescription, max(case when MaxDate = 1 then OrderID end) LastOrderID, max(case when MaxDate = 1 then OrderDate end) LastOrderDate, max(case when MaxDate = 1 then BusinessVolumeTotal end) LastBVTotal, max(case when MaxDate = 1 then PriceTypeDescription end) LastPriceType, max(case when MaxDate = 1 then ItemCode end) LastItemCode, max(case when MaxDate = 1 then ItemDescription end) LastItemDescription from ( select distinct o.CustomerID, o.OrderID, o.OrderDate, o.BusinessVolumeTotal, PT.PriceTypeDescription, RANK() over (partition by o.CustomerID order by OrderDate) as MinDate, RANK() over (partition by o.CustomerID order by OrderDate desc) as MaxDate, FIRST_VALUE(ItemCode) over (partition by od.OrderID order by BusinessVolume desc) as ItemCode, FIRST_VALUE(ItemDescription) over (partition by od.OrderID order by BusinessVolume desc) as ItemDescription from Orders o left join OrderDetails od on od.OrderID = o.OrderID left join PriceTypes PT on o.PriceTypeID = PT.PriceTypeID where o.OrderStatusID >= 7 ) fal group by CustomerID ) fal on c.CustomerId = fal.CustomerID where c.CustomerStatusID in (1,2) and c.CustomerTypeID in (2,3) and pv.PeriodTypeID = 2 /* CurrentG3 */ and pv.PeriodID = (select PeriodID from Periods where PeriodTypeID=2 and StartDate <= @now and EndDate >= @now)
Alas, this ended up executing even longer. I need a way to optimize this if possible.
Secondary query
I do also need a count and sum of volume per Order in the last 3, 6 and 12 months. I currently do this programatically as secondary queries after the original returns a result, and I forward the CustomerIDs, like this:
select count(OrderID) as Cnt, sum(BusinessVolumeTotal) as Bv, CustomerID from Orders where OrderStatusID > 6 and OrderTypeID in (1,4,8,11) and OrderDate >= @timeAgo and CustomerID in @ids group by CustomerID
Times 3, because 3, 6 and 12 months. Ideally, I would also like to make this part of the original but I don’t really have a good idea on how to do it, especially with how convoluted the joining is with the orders.
So ideally I’d end up with a result table like this
CustomerID Name CustomerStatus CustomerType FirstOrderID FirstOrderDate FirstBVTotal FirstItemCode FirstItemDesc FirstPriceType LastOrderID LastOrderDate LastBVTotal LastItemCode LastItemDesc LastPriceType ThreeMonthCount ThreeMonthTotal SixMonthCount SixMonthTotal TwelveMonthCount TwelveMonthTotal 512312 'Jane Doe' 'Active' 'Retail' 13212 '2020-06-06' 50.00 'Item1' 'Item 1 desc' 'Retail' 14321 '2021-09-01' 200.00 'Item2' 'Item 2 desc' 'Retail' 45 4305.00 76 8545.60 183 21542.95
Any help and advice on how to optimize or reduce the query, as well as anything you believe Im doing wrong would be GREATLY appreciated.
P.S. I don’t know if the title is fitting and if I’d be able to change it later, it’s been a while since I’ve used SO to ask a question.
UPDATE
Actual Execution Plan for Query 1:
https://www.brentozar.com/pastetheplan/?id=SJd56RSmK
Actual Execution Plan for Query 2:
https://www.brentozar.com/pastetheplan/?id=BJ7QHk87Y
Advertisement
Answer
I think you need to keep in mind two main points with this type of query:
- The key to good performance with window functions is to not introduce an unnecessary sort. So while you can use
ROW_NUMBER
to get the first order in either direction, you should not use another opposingROW_NUMBER
to get the last. Rather useLEAD
to check if the next row exists, thereby telling you if this is the last row. You can then use conditional aggregation. - There are generally two ways to calculate first/last: a row-numbering solution, as above, or an
APPLY
, which picks out the exact one you need.
I think that for theOrderDetails
we should use an apply, because there are only two orders per customer that we need to find. This does need good indexing, so ifOrderDetails
is not well indexed, then you may want to switch to a row-numbering solution for this also.
select c.CustomerID, c.FirstName + ' ' + c.LastName as Name, cs.CustomerStatusDescription as Status, ct.CustomerTypeDescription as Type, pv.Volume80 as G3, o.FirstOrderID, o.FirstOrderDate, o.FirstSubTotal, o.FirstCountry, fod.ItemCode as FirstItemCode, fod.ItemDescription as FirstItemDescription, fopt.PriceTypeDescription as FirstPriceTypeDescription, o.LastOrderID, o.LastOrderDate, o.LastSubTotal, o.LastCountry, lod.ItemCode as LastItemCode, lod.ItemDescription as LastItemDescription, lopt.PriceTypeDescription as LastPriceTypeDescription from Customers c left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID left join PeriodVolumes pv on pv.CustomerID = c.CustomerID and pv.PeriodTypeID = 2 and pv.PeriodID = ( select top 1 PeriodID from Periods p where p.PeriodTypeID = 2 and p.StartDate <= @now and p.EndDate >= @now ) left join ( select o.CustomerID, min(case when rn = 1 then OrderID end) as FirstOrderId, min(case when rn = 1 then OrderDate end) as FirstOrderDate, min(case when rn = 1 then SubTotal end) as FirstSubTotal, min(case when rn = 1 then Country end) as FirstCountry, min(case when nx is null then OrderID end) as LastOrderId, min(case when nx is null then OrderDate end) as LastOrderDate, min(case when nx is null then SubTotal end) as LastSubTotal, min(case when nx is null then Country end) as LastCountry, count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount, sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal, count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount, sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal, count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount, sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal from ( select *, ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn, LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx from Orders o where o.OrderStatusID >= 7 and o.OrderTypeID in (1,4,8,11) and o.OrderDate >= @timeAgo ) o group by o.CustomerID ) o on o.CustomerID = c.CustomerID outer apply ( select top 1 od.ItemCode, od.ItemDescription from OrderDetails od order by od.BusinessVolume desc where od.OrderID = o.FirstOrderId ) fod outer apply ( select top 1 od.ItemCode, od.ItemDescription from OrderDetails od order by od.BusinessVolume desc where od.OrderID = o.LastOrderId ) lod left join PriceTypes fopt on fopt.PriceTypeID = o.FirstPriceTypeID left join PriceTypes lopt on lopt.PriceTypeID = o.LastPriceTypeID where c.CustomerStatusID in (1,2) and c.CustomerTypeID in (2,3);
I’m also going to give you a row-numbering version, as judging by your execution plan, it may actually be better. You need to try both
select c.CustomerID, c.FirstName + ' ' + c.LastName as Name, cs.CustomerStatusDescription as Status, ct.CustomerTypeDescription as Type, pv.Volume80 as G3, o.FirstOrderID, o.FirstOrderDate, o.FirstSubTotal, o.FirstCountry, o.FirstItemCode, o.FirstItemDescription, o.FirstPriceTypeDescription, o.LastOrderID, o.LastOrderDate, o.LastSubTotal, o.LastCountry, o.LastItemCode, o.LastItemDescription, o.LastPriceTypeDescription from Customers c left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID left join PeriodVolumes pv on pv.CustomerID = c.CustomerID and pv.PeriodTypeID = 2 and pv.PeriodID = ( select top 1 PeriodID from Periods p where p.PeriodTypeID = 2 and p.StartDate <= @now and p.EndDate >= @now ) left join ( select o.CustomerID, min(case when rn = 1 then o.OrderID end) as FirstOrderId, min(case when rn = 1 then o.OrderDate end) as FirstOrderDate, min(case when rn = 1 then o.SubTotal end) as FirstSubTotal, min(case when rn = 1 then o.Country end) as FirstCountry, min(case when rn = 1 then od.ItemCode end) as FirstItemCode, min(case when rn = 1 then od.ItemDescription end) as FirstItemDescription, min(case when rn = 1 then opt.PriceTypeDescription end) as FirstPriceTypeDescription, min(case when nx is null then o.OrderID end) as LastOrderId, min(case when nx is null then o.OrderDate end) as LastOrderDate, min(case when nx is null then o.SubTotal end) as LastSubTotal, min(case when nx is null then o.Country end) as LastCountry, min(case when nx is null then od.ItemCode end) as LastItemCode, min(case when nx is null then od.ItemDescription end) as LastItemDescription, min(case when nx is null then opt.PriceTypeDescription end) as LastPriceTypeDescription, count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount, sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal, count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount, sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal, count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount, sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal from ( select *, ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn, LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx from Orders o where o.OrderStatusID >= 7 and o.OrderTypeID in (1,4,8,11) and o.OrderDate >= @timeAgo ) o left join PriceTypes opt on opt.PriceTypeID = o.PriceTypeID join ( select *, ROW_NUMBER() over (partition by od.OrderID order by od.BusinessVolume desc) as rn from OrderDetails od ) od on od.OrderID = o.OrderId where rn = 1 or nx is null ) o on o.CustomerID = c.CustomerID where c.CustomerStatusID in (1,2) and c.CustomerTypeID in (2,3);
Good indexing is essential to good performance. I would expect roughly the following indexes on your tables, either clustered or non-clustered (clustered indexed INCLUDE
every other column automatically), you can obviously add other INCLUDE
columns if needed:
Customers (CustomerID) INCLUDE (FirstName, LastName) CustomerTypes (CustomerTypeID) INCLUDE (CustomerTypeDescription) CustomerStatuses (CustomerStatusID) INCLUDE (CustomerTypeDescription) PeriodVolumes (CustomerID) INCLUDE (Volume80) Periods (PeriodTypeID, StartDate, PeriodID) INCLUDE (EndDate) -- can swap Start and End Orders (CustomerID, OrderDate) INCLUDE (OrderStatusID, SubTotal, Country, BusinessVolumeTotal) OrderDetails (OrderID, BusinessVolume) INCLUDE (ItemCode ItemDescription) PriceTypes (PriceTypeID) INCLUDE (PriceTypeDescription)
You should think carefully about INNER
vs LEFT
joins, because the optimizer can more easily move around an INNER
join.
Note also, that DISTINCT
is not a function, it is calculated over an entire set of columns. Generally, one can assume that if a DISTINCT
is in the query then the joins have not been thought through properly.