Skip to content
Advertisement

SQL Server: table join based on record-dependent values

There’s a general type of query I’m trying to perform, and I’m not sure how to express it in words so that I can find a discussion of best practices and examples for executing it.

Here’s an example use case.

I have a customers table that has info about customers and an orders table. I want to fetch a subset of records from orders based on customer characteristics, limited by the “earliest” and “latest” dates contained as data in the customers table. It’s essential to the solution that I limit my query results to within this date range, which varies by customer.

CUSTOMERS

+------------+------------+----------+---------------------+-------------------+
| CustomerID |  Location  | Industry | EarliestActiveOrder | LatestActiveOrder |
+------------+------------+----------+---------------------+-------------------+
|        001 | New York   | Finance  | 2017-11-03          | 2019-07-30        |
|        002 | California | Tech     | 2018-06-18          | 2019-09-22        |
|        003 | New York   | Finance  | 2015-09-30          | 2019-02-26        |
|        004 | California | Finance  | 2019-02-02          | 2019-08-15        |
|        005 | New York   | Finance  | 2017-10-19          | 2018-12-20        |
+------------+------------+----------+---------------------+-------------------+

ORDERS

+----------+------------+------------+---------+
| OrderID | CustomerID | StartDate  | Details |
+----------+------------+------------+---------+
|     5430 |        003 | 2015-06-30 |     ... |
|     5431 |        003 | 2016-03-31 |     ... |
|     5432 |        003 | 2018-09-30 |     ... |
|     5434 |        001 | 2018-11-05 |     ... |
|     5435 |        001 | 2019-10-11 |     ... |

A sample use case expressed in words would be: “Give me all Active Orders from Finance customers in New York”.

Desired result is to return the full records from orders table for OrderID‘s 5431,5432,5434.

What is a generally good approach for structuring this kind of query, given an orders table with ~10^6 records?

Advertisement

Answer

You are looking for a join:

select o.*
from orders o
inner join customers c 
    on  c.Customer_id = o.Customer_id
    and o.StartDate between c.EarliestActiveOrder  and c.LatestActiveOrder
    and c.Industry = 'Finance'
    and c.Location = 'New York'

For performance in this query, consider the following indexes:

orders(customer_id,  StartDate)
customers(Customer_id, Industry, Location, EarliestActiveOrder, LatestActiveOrder)
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement