Get first and last Order and the highest value Item in each order for each Customer, all of which are separate tables

Question

I need to find the first and last Order for each Customer by OrderDate, and the name and SKU of the item with the highest business volume in each of those orders. For reference, the Customer table has >150k records, and Orders and OrderDetails (these are the Items) a lot more. Note: Both Orders and their respective items should be

Accepted Answer

I think you need to keep in mind two main points with this type of query:

- The key to good performance with window functions is to not introduce an unnecessary sort. So while you can use ROW_NUMBER to get the first order in either direction, you should not use another, opposing ROW_NUMBER to get the last. Rather, use LEAD to check whether a next row exists, which tells you whether this is the last row. You can then use conditional aggregation.
- There are generally two ways to calculate first/last: a row-numbering solution, as above, or an APPLY, which picks out the exact row you need.

I think that for OrderDetails we should use an APPLY, because there are only two orders per customer that we need to find. This does need good indexing, so if OrderDetails is not well indexed, you may want to switch to a row-numbering solution for this as well.

    select
        c.CustomerID,
        c.FirstName + ' ' + c.LastName as Name,
        cs.CustomerStatusDescription as Status,
        ct.CustomerTypeDescription as Type,
        pv.Volume80 as G3,
        o.FirstOrderID,
        o.FirstOrderDate,
        o.FirstSubTotal,
        o.FirstCountry,
        fod.ItemCode as FirstItemCode,
        fod.ItemDescription as FirstItemDescription,
        fopt.PriceTypeDescription as FirstPriceTypeDescription,
        o.LastOrderID,
        o.LastOrderDate,
        o.LastSubTotal,
        o.LastCountry,
        lod.ItemCode as LastItemCode,
        lod.ItemDescription as LastItemDescription,
        lopt.PriceTypeDescription as LastPriceTypeDescription
    from Customers c
    left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
    left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
    left join PeriodVolumes pv
        on pv.CustomerID = c.CustomerID
        and pv.PeriodTypeID = 2
        and pv.PeriodID = (
            select top 1 PeriodID
            from Periods p
            where p.PeriodTypeID = 2
              and p.StartDate <= @now
              and p.EndDate >= @now
        )
    left join (
        select
            o.CustomerID,
            -- rn = 1 marks the first order; nx is null marks the last (no next row)
            min(case when rn = 1 then OrderID end) as FirstOrderId,
            min(case when rn = 1 then OrderDate end) as FirstOrderDate,
            min(case when rn = 1 then SubTotal end) as FirstSubTotal,
            min(case when rn = 1 then Country end) as FirstCountry,
            min(case when rn = 1 then PriceTypeID end) as FirstPriceTypeID,  -- needed by the PriceTypes joins below
            min(case when nx is null then OrderID end) as LastOrderId,
            min(case when nx is null then OrderDate end) as LastOrderDate,
            min(case when nx is null then SubTotal end) as LastSubTotal,
            min(case when nx is null then Country end) as LastCountry,
            min(case when nx is null then PriceTypeID end) as LastPriceTypeID,
            count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount,
            sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal,
            count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount,
            sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal,
            count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount,
            sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal
        from (
            select *,
                -- both window functions share one partition/order, so only one sort is needed
                ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn,
                LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx
            from Orders o
            where o.OrderStatusID >= 7
              and o.OrderTypeID in (1,4,8,11)
              and o.OrderDate >= @timeAgo
        ) o
        group by o.CustomerID
    ) o on o.CustomerID = c.CustomerID
    outer apply (
        select top 1 od.ItemCode, od.ItemDescription
        from OrderDetails od
        where od.OrderID = o.FirstOrderId
        order by od.BusinessVolume desc
    ) fod
    outer apply (
        select top 1 od.ItemCode, od.ItemDescription
        from OrderDetails od
        where od.OrderID = o.LastOrderId
        order by od.BusinessVolume desc
    ) lod
    left join PriceTypes fopt on fopt.PriceTypeID = o.FirstPriceTypeID
    left join PriceTypes lopt on lopt.PriceTypeID = o.LastPriceTypeID
    where c.CustomerStatusID in (1,2)
      and c.CustomerTypeID in (2,3);

I'm also going to give you a row-numbering version because, judging by your execution plan, it may actually be better.
You need to try both.

    select
        c.CustomerID,
        c.FirstName + ' ' + c.LastName as Name,
        cs.CustomerStatusDescription as Status,
        ct.CustomerTypeDescription as Type,
        pv.Volume80 as G3,
        o.FirstOrderID,
        o.FirstOrderDate,
        o.FirstSubTotal,
        o.FirstCountry,
        o.FirstItemCode,
        o.FirstItemDescription,
        o.FirstPriceTypeDescription,
        o.LastOrderID,
        o.LastOrderDate,
        o.LastSubTotal,
        o.LastCountry,
        o.LastItemCode,
        o.LastItemDescription,
        o.LastPriceTypeDescription
    from Customers c
    left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
    left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
    left join PeriodVolumes pv
        on pv.CustomerID = c.CustomerID
        and pv.PeriodTypeID = 2
        and pv.PeriodID = (
            select top 1 PeriodID
            from Periods p
            where p.PeriodTypeID = 2
              and p.StartDate <= @now
              and p.EndDate >= @now
        )
    left join (
        select
            o.CustomerID,
            min(case when o.rn = 1 then o.OrderID end) as FirstOrderId,
            min(case when o.rn = 1 then o.OrderDate end) as FirstOrderDate,
            min(case when o.rn = 1 then o.SubTotal end) as FirstSubTotal,
            min(case when o.rn = 1 then o.Country end) as FirstCountry,
            min(case when o.rn = 1 then od.ItemCode end) as FirstItemCode,
            min(case when o.rn = 1 then od.ItemDescription end) as FirstItemDescription,
            min(case when o.rn = 1 then opt.PriceTypeDescription end) as FirstPriceTypeDescription,
            min(case when o.nx is null then o.OrderID end) as LastOrderId,
            min(case when o.nx is null then o.OrderDate end) as LastOrderDate,
            min(case when o.nx is null then o.SubTotal end) as LastSubTotal,
            min(case when o.nx is null then o.Country end) as LastCountry,
            min(case when o.nx is null then od.ItemCode end) as LastItemCode,
            min(case when o.nx is null then od.ItemDescription end) as LastItemDescription,
            min(case when o.nx is null then opt.PriceTypeDescription end) as LastPriceTypeDescription,
            -- note: the where clause below keeps only first/last orders,
            -- so these aggregates see at most two orders per customer
            count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount,
            sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal,
            count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount,
            sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal,
            count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount,
            sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal
        from (
            select *,
                ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn,
                LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx
            from Orders o
            where o.OrderStatusID >= 7
              and o.OrderTypeID in (1,4,8,11)
              and o.OrderDate >= @timeAgo
        ) o
        left join PriceTypes opt on opt.PriceTypeID = o.PriceTypeID
        join (
            select *,
                ROW_NUMBER() over (partition by od.OrderID order by od.BusinessVolume desc) as rn
            from OrderDetails od
        ) od
            on od.OrderID = o.OrderId
            and od.rn = 1                    -- highest-volume item of each order
        where o.rn = 1 or o.nx is null       -- first or last order of each customer
        group by o.CustomerID
    ) o on o.CustomerID = c.CustomerID
    where c.CustomerStatusID in (1,2)
      and c.CustomerTypeID in (2,3);

Good indexing is essential to good performance.
I would expect roughly the following indexes on your tables, either clustered or non-clustered (a clustered index includes every other column automatically). You can obviously add other INCLUDE columns if needed:

    Customers (CustomerID) INCLUDE (FirstName, LastName)
    CustomerTypes (CustomerTypeID) INCLUDE (CustomerTypeDescription)
    CustomerStatuses (CustomerStatusID) INCLUDE (CustomerStatusDescription)
    PeriodVolumes (CustomerID) INCLUDE (Volume80)
    Periods (PeriodTypeID, StartDate, PeriodID) INCLUDE (EndDate) -- can swap Start and End
    Orders (CustomerID, OrderDate) INCLUDE (OrderStatusID, SubTotal, Country, BusinessVolumeTotal)
    OrderDetails (OrderID, BusinessVolume) INCLUDE (ItemCode, ItemDescription)
    PriceTypes (PriceTypeID) INCLUDE (PriceTypeDescription)

You should think carefully about INNER vs LEFT joins, because the optimizer can more easily move an INNER join around.

Note also that DISTINCT is not a function; it is calculated over the entire set of selected columns. Generally, one can assume that if a DISTINCT is in the query, then the joins have not been thought through properly.
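If you want to convince yourself how the ROW_NUMBER plus LEAD trick behaves before running it on the real tables, here is a minimal sketch on toy data. It uses Python's built-in sqlite3 module rather than SQL Server (an assumption made purely so the example is self-contained; it needs SQLite 3.25 or later for window functions), and the rows are made up. Only the table and column names are borrowed from the queries above. It shows the shape of the technique, not its performance.

```python
import sqlite3

# Hypothetical toy data: names mirror the answer, values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Orders (OrderID integer primary key, CustomerID int, OrderDate text);
create table OrderDetails (OrderID int, ItemCode text, BusinessVolume real);
insert into Orders values
    (1, 10, '2023-01-05'), (2, 10, '2023-03-01'), (3, 10, '2023-06-20'),
    (4, 20, '2023-02-14');
insert into OrderDetails values
    (1, 'A', 5.0), (1, 'B', 9.0),
    (2, 'C', 1.0),
    (3, 'D', 2.0), (3, 'E', 7.0),
    (4, 'F', 3.0);
""")

query = """
select
    o.CustomerID,
    min(case when o.rn = 1     then o.OrderID   end) as FirstOrderID,
    min(case when o.rn = 1     then od.ItemCode end) as FirstItemCode,
    min(case when o.nx is null then o.OrderID   end) as LastOrderID,
    min(case when o.nx is null then od.ItemCode end) as LastItemCode
from (
    -- rn = 1 is the first order; nx is null means there is no later order
    select
        OrderID, CustomerID,
        row_number() over (partition by CustomerID order by OrderDate) as rn,
        lead(OrderID) over (partition by CustomerID order by OrderDate) as nx
    from Orders
) o
join (
    -- rn = 1 is the item with the highest BusinessVolume in each order
    select OrderID, ItemCode,
        row_number() over (partition by OrderID order by BusinessVolume desc) as rn
    from OrderDetails
) od on od.OrderID = o.OrderID and od.rn = 1
where o.rn = 1 or o.nx is null
group by o.CustomerID
order by o.CustomerID
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Customer 20 has a single order, so that order is both first (rn = 1) and last (nx is null), and it fills both sets of columns. That is the case a second, descending ROW_NUMBER would also handle, but at the cost of an extra sort.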
