Skip to content
Advertisement

SQL BigQuery – how can I calculate for each order the the count of that customers’ orders in the preceding full 12 month period

For each order record I would like to calculate the count of prior orders, from that customer, over the previous full 12 month period (excluding the month of the order).

I can get the count work without the date limits (code below).

But I just can’t figure out how to limit the count to the ‘rolling’ date range.

I’d really appreciate any suggestions on what I’m missing!

With input_data AS (

SELECT '#1238' as order_id, DATE('2021-12-15') as order_date, 'c12345' as cust_id, 18 as order_value
UNION ALL SELECT '#1201' as order_id, DATE('2021-10-10') as order_date, 'c12345' as cust_id, 18 as order_value
UNION ALL SELECT '#1198' as order_id, DATE('2021-07-05') as order_date, 'c12345' as cust_id, 20 as order_value
UNION ALL SELECT '#1134' as order_id, DATE('2020-10-15') as order_date, 'c12345' as cust_id, 10 as order_value
UNION ALL SELECT '#1112' as order_id, DATE('2019-08-10') as order_date, 'c12345' as cust_id, 5 as order_value
UNION ALL SELECT '#1234' as order_id, DATE('2021-07-05') as order_date, 'c11111' as cust_id, 118 as order_value
UNION ALL SELECT '#1294' as order_id, DATE('2021-01-05') as order_date, 'c11111' as cust_id, 68 as order_value
UNION ALL SELECT '#1290' as order_id, DATE('2021-01-01') as order_date, 'c11111' as cust_id, 82 as order_value
UNION ALL SELECT '#1284' as order_id, DATE('2020-01-15') as order_date, 'c22222' as cust_id, 98 as order_value)

SELECT
order_id
, cust_id
, order_date
, prev_12m_orders
FROM (
    SELECT order_id, cust_id, order_date,
COUNT(order_id) OVER(PARTITION BY cust_id ORDER BY order_date DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS prev_12m_orders
FROM input_data
) 

-- Limit prev_12m_orders to the range of the last complete 12 month period, something like 
-- order_date < DATE_SUB(DATE_TRUNC(order_date, MONTH), INTERVAL 1 DAY) AS last_day_prev_mth
-- order_date < DATE_SUB(DATE_TRUNC(order_date, MONTH), INTERVAL 12 MONTH) AS first_day_full_12m_ago
-- If possible it should return NULL where there are no orders more than 12 months prior to the order being evaluated   

This generates the following output (with comments on expected values).

| Row | order_id | cust_id | order_date | prev_12m_orders | Comment                                |
|-----|----------|---------|------------|-----------------|----------------------------------------|
| 1   | #1234    | c11111  | 2021-07-05 |               2 | Correct                                |
| 2   | #1294    | c11111  | 2021-01-05 |               1 | Should be 0 as order in same month     |
| 3   | #1290    | c11111  | 2021-01-01 |               0 | Correct                                |
| 4   | #1238    | c12345  | 2021-12-15 |               4 | Should be 2 as last orders out of range |
| 5   | #1201    | c12345  | 2021-10-10 |               3 | Should be 2 as last orders out of range |
| 6   | #1198    | c12345  | 2021-07-05 |               2 | Should be 1 as last order out of range |
| 7   | #1134    | c12345  | 2020-10-15 |               1 | Should be 0 as last order out of range |
| 8   | #1112    | c12345  | 2019-08-10 |               0 | Should be NULL as >12m prior orders    |
| 9   | #1284    | c22222  | 2020-01-15 |               0 | Should be NULL as >12m prior orders    |

Any suggestions much appreciated…

Advertisement

Answer

Consider below approach

select *, 
  count(order_id) over last_12m as prev_12m_orders
from input_data
window last_12m as (
  partition by cust_id 
  order by cast(format_date('%Y%m', order_date ) as int64)
  range between 100 preceding  and 1 preceding
)              

if applied to sample data in your question – output is

enter image description here

To address you second requirement with null – use below

select * except(prev_12m_orders, prior_12m_orders),
  if(prev_12m_orders = 0, if(prior_12m_orders = 1, null, 0), prev_12m_orders) as prev_12m_orders
from (
  select *, 
    count(order_id) over last_12m as prev_12m_orders, 
    count(order_id) over prior_12m as prior_12m_orders
  from input_data
  window last_12m as (
    partition by cust_id 
    order by cast(format_date('%Y%m', order_date ) as int64)
    range between 100 preceding  and 1 preceding
  ), prior_12m as (
    partition by cust_id 
    order by cast(format_date('%Y%m', order_date ) as int64)
    range between unbounded preceding  and current row
  )
)

with output

enter image description here

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement