Have a dataset like below and would like to know various ways to solve the question of : what % of orders were within 20 minutes of each other?
CustomerId | Order_# | Order_Date |
---|---|---|
123 | 000112 | 12/25/2011 10:30 |
123 | 000113 | 12/25/2011 10:35 |
123 | 000114 | 12/25/2011 10:45 |
123 | 000115 | 12/25/2011 10:55 |
456 | 000113 | 12/25/2011 10:35 |
456 | 000113 | 1/25/2011 10:30 |
789 | 000117 | 9/25/2011 2:00 |
Result set should look like this:
3/7 = 0.42%
My approach was to first do a Self join with the table to get a count of rows which fall within the 20% but struggling to take out the duplicate rows.
Anyways, look forward to seeing some crafty answers.
Thank you.
Advertisement
Answer
You can use lead()
and lag()
:
select avg( case when prev_order_date > order_date - interval '20 minute' or next_order_date < order_date + interval '20 minute' then 1.0 else 0 end) as ratio_within_20_minutes from (select t.*, lag(order_date) over (partition by customer_id order by order_date) as prev_order_date, lead(order_date) over (partition by customer_id order by order_date) as next_order_date from t ) t;
Note that date/time functions vary a lot among databases. This uses Standard SQL syntax for the comparisons. The exact syntax probably varies, depending on your database.
If you want this per customer then add group by customer_id
to the query and customer_id
to the select
.
EDIT:
In SQL Server, this would be:
select avg( case when prev_order_date > dateadd(minute, -20, order_date) or next_order_date < dateadd(minute, 20, order_date) then 1.0 else 0 end) as ratio_within_20_minutes from (select t.*, lag(order_date) over (partition by customer_id order by order_date) as prev_order_date, lead(order_date) over (partition by customer_id order by order_date) as next_order_date from t ) t;