Skip to content
Advertisement

Get first and last Order and the highest value Item in each order for each Customer, all of which are separate tables

I need to find the first and last Order for each Customer by OrderDate, and the name and SKU of the item with the highest business volume in each of those orders. For reference, the Customer table has >150k records, and Orders and OrderDetails (these are the Items) a lot more.

Note: Both Orders and their respective items should be in the same row as the Customer

Orders

OrderDetails

In my first query I attempted to stick to joining instead of subqueries, which resulted in this

But this ends up executing for about 6-7 minutes. From the explain plan, it looks like most of it is taken up by a Key Lookup for Orders based on OrderStatusID >= 7.

So I tried to use window functions to achieve the same:

Alas, this ended up executing even longer. I need a way to optimize this if possible.

Secondary query

I do also need a count and sum of volume per Order in the last 3, 6 and 12 months. I currently do this programatically as secondary queries after the original returns a result, and I forward the CustomerIDs, like this:

Times 3, because 3, 6 and 12 months. Ideally, I would also like to make this part of the original but I don’t really have a good idea on how to do it, especially with how convoluted the joining is with the orders.

So ideally I’d end up with a result table like this

Any help and advice on how to optimize or reduce the query, as well as anything you believe Im doing wrong would be GREATLY appreciated.

P.S. I don’t know if the title is fitting and if I’d be able to change it later, it’s been a while since I’ve used SO to ask a question.

UPDATE

Actual Execution Plan for Query 1:
https://www.brentozar.com/pastetheplan/?id=SJd56RSmK

Actual Execution Plan for Query 2:
https://www.brentozar.com/pastetheplan/?id=BJ7QHk87Y

Advertisement

Answer

I think you need to keep in mind two main points with this type of query:

  • The key to good performance with window functions is to not introduce an unnecessary sort. So while you can use ROW_NUMBER to get the first order in either direction, you should not use another opposing ROW_NUMBER to get the last. Rather use LEAD to check if the next row exists, thereby telling you if this is the last row. You can then use conditional aggregation.
  • There are generally two ways to calculate first/last: a row-numbering solution, as above, or an APPLY, which picks out the exact one you need.
    I think that for the OrderDetails we should use an apply, because there are only two orders per customer that we need to find. This does need good indexing, so if OrderDetails is not well indexed, then you may want to switch to a row-numbering solution for this also.

I’m also going to give you a row-numbering version, as judging by your execution plan, it may actually be better. You need to try both


Good indexing is essential to good performance. I would expect roughly the following indexes on your tables, either clustered or non-clustered (clustered indexed INCLUDE every other column automatically), you can obviously add other INCLUDE columns if needed:

You should think carefully about INNER vs LEFT joins, because the optimizer can more easily move around an INNER join.

Note also, that DISTINCT is not a function, it is calculated over an entire set of columns. Generally, one can assume that if a DISTINCT is in the query then the joins have not been thought through properly.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement