There’s a general type of query I’m trying to perform, and I’m not sure how to express it in words so that I can find a discussion of best practices and examples for executing it.
Here’s an example use case.
I have a customers
table that has info about customers and an orders
table. I want to fetch a subset of records from orders
based on customer characteristics, limited by the “earliest” and “latest” dates contained as data in the customers
table. It’s essential to the solution that I limit my query results to within this date range, which varies by customer.
CUSTOMERS
+------------+------------+----------+---------------------+-------------------+ | CustomerID | Location | Industry | EarliestActiveOrder | LatestActiveOrder | +------------+------------+----------+---------------------+-------------------+ | 001 | New York | Finance | 2017-11-03 | 2019-07-30 | | 002 | California | Tech | 2018-06-18 | 2019-09-22 | | 003 | New York | Finance | 2015-09-30 | 2019-02-26 | | 004 | California | Finance | 2019-02-02 | 2019-08-15 | | 005 | New York | Finance | 2017-10-19 | 2018-12-20 | +------------+------------+----------+---------------------+-------------------+
ORDERS
+----------+------------+------------+---------+ | OrderID | CustomerID | StartDate | Details | +----------+------------+------------+---------+ | 5430 | 003 | 2015-06-30 | ... | | 5431 | 003 | 2016-03-31 | ... | | 5432 | 003 | 2018-09-30 | ... | | 5434 | 001 | 2018-11-05 | ... | | 5435 | 001 | 2019-10-11 | ... |
A sample use case expressed in words would be: “Give me all Active Orders from Finance customers in New York”.
Desired result is to return the full records from orders
table for OrderID
‘s 5431,5432,5434
.
What is a generally good approach for structuring this kind of query, given an orders
table with ~10^6 records?
Advertisement
Answer
You are looking for a join:
select o.* from orders o inner join customers c on c.Customer_id = o.Customer_id and o.StartDate between c.EarliestActiveOrder and c.LatestActiveOrder and c.Industry = 'Finance' and c.Location = 'New York'
For performance in this query, consider the following indexes:
orders(customer_id, StartDate) customers(Customer_id, Industry, Location, EarliestActiveOrder, LatestActiveOrder)