For each order record I would like to calculate the count of prior orders, from that customer, over the previous full 12 month period (excluding the month of the order).
I can get the count work without the date limits (code below).
But I just can’t figure out how to limit the count to the ‘rolling’ date range.
I’d really appreciate any suggestions on what I’m missing!
With input_data AS ( SELECT '#1238' as order_id, DATE('2021-12-15') as order_date, 'c12345' as cust_id, 18 as order_value UNION ALL SELECT '#1201' as order_id, DATE('2021-10-10') as order_date, 'c12345' as cust_id, 18 as order_value UNION ALL SELECT '#1198' as order_id, DATE('2021-07-05') as order_date, 'c12345' as cust_id, 20 as order_value UNION ALL SELECT '#1134' as order_id, DATE('2020-10-15') as order_date, 'c12345' as cust_id, 10 as order_value UNION ALL SELECT '#1112' as order_id, DATE('2019-08-10') as order_date, 'c12345' as cust_id, 5 as order_value UNION ALL SELECT '#1234' as order_id, DATE('2021-07-05') as order_date, 'c11111' as cust_id, 118 as order_value UNION ALL SELECT '#1294' as order_id, DATE('2021-01-05') as order_date, 'c11111' as cust_id, 68 as order_value UNION ALL SELECT '#1290' as order_id, DATE('2021-01-01') as order_date, 'c11111' as cust_id, 82 as order_value UNION ALL SELECT '#1284' as order_id, DATE('2020-01-15') as order_date, 'c22222' as cust_id, 98 as order_value) SELECT order_id , cust_id , order_date , prev_12m_orders FROM ( SELECT order_id, cust_id, order_date, COUNT(order_id) OVER(PARTITION BY cust_id ORDER BY order_date DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS prev_12m_orders FROM input_data ) -- Limit prev_12m_orders to the range of the last complete 12 month period, something like -- order_date < DATE_SUB(DATE_TRUNC(order_date, MONTH), INTERVAL 1 DAY) AS last_day_prev_mth -- order_date < DATE_SUB(DATE_TRUNC(order_date, MONTH), INTERVAL 12 MONTH) AS first_day_full_12m_ago -- If possible it should return NULL where there are no orders more than 12 months prior to the order being evaluated
This generates the following output (with comments on expected values).
| Row | order_id | cust_id | order_date | prev_12m_orders | Comment | |-----|----------|---------|------------|-----------------|----------------------------------------| | 1 | #1234 | c11111 | 2021-07-05 | 2 | Correct | | 2 | #1294 | c11111 | 2021-01-05 | 1 | Should be 0 as order in same month | | 3 | #1290 | c11111 | 2021-01-01 | 0 | Correct | | 4 | #1238 | c12345 | 2021-12-15 | 4 | Should be 2 as last orders out of range | | 5 | #1201 | c12345 | 2021-10-10 | 3 | Should be 2 as last orders out of range | | 6 | #1198 | c12345 | 2021-07-05 | 2 | Should be 1 as last order out of range | | 7 | #1134 | c12345 | 2020-10-15 | 1 | Should be 0 as last order out of range | | 8 | #1112 | c12345 | 2019-08-10 | 0 | Should be NULL as >12m prior orders | | 9 | #1284 | c22222 | 2020-01-15 | 0 | Should be NULL as >12m prior orders |
Any suggestions much appreciated…
Advertisement
Answer
Consider below approach
select *, count(order_id) over last_12m as prev_12m_orders from input_data window last_12m as ( partition by cust_id order by cast(format_date('%Y%m', order_date ) as int64) range between 100 preceding and 1 preceding )
if applied to sample data in your question – output is
To address you second requirement with null
– use below
select * except(prev_12m_orders, prior_12m_orders), if(prev_12m_orders = 0, if(prior_12m_orders = 1, null, 0), prev_12m_orders) as prev_12m_orders from ( select *, count(order_id) over last_12m as prev_12m_orders, count(order_id) over prior_12m as prior_12m_orders from input_data window last_12m as ( partition by cust_id order by cast(format_date('%Y%m', order_date ) as int64) range between 100 preceding and 1 preceding ), prior_12m as ( partition by cust_id order by cast(format_date('%Y%m', order_date ) as int64) range between unbounded preceding and current row ) )
with output