Skip to content
Advertisement

Order count per ID and calculate time between orders in BigQuery

I am working with customer purchase data and am trying to write a query in Google BigQuery that sorts all purchases by date and adds a purchase/order count per customer (order_count). Also, I would like to calculate a time-delay (in days) between the orders of one customer (purchase_latency). My query currently looks like this:

    select
email,
first_name,
last_name,
order_number,
purchase_date,
order_price,
d.code,
from my_order_data 
left join
unnest(discount_codes) as d

The result including “order_count” and “purchase_latency” should look like the following:

email   | order_number | purchase_date               | order_price | order_count | purchase_latency 
a@a.com | 34874        | 2020-01-02 16:20:12 UTC     | 20,-        | 1           | 0            |
a@a.com | 43598        | 2020-01-18 12:00:00 UTC     | 30,-        | 2           | 16           |
a@a.com | 47520        | 2020-01-30 08:05:00 UTC     | 15,-        | 3           | 12           |
b@b.com | 23598        | 2019-03-25 22:10:00 UTC     | 22,-        | 1           | 0            |
b@b.com | 25459        | 2019-03-31 17:35:00 UTC     | 55,-        | 2           | 6            |

How would I add the numbering for “order_count” as well as the calculation for “purchase_latency”?

Thank you very much!

Advertisement

Answer

You can use window functions:

  • to enumerate the orders of each customer by increasing purchase date, you can use row_number()

  • lag() retrieves the date of the “previous” purchase, and you can compute the difference with the current date with date_diff():

So:

select
    email,
    first_name,
    last_name,
    order_number,
    purchase_date,
    order_price,
    row_number() over(partition by email order by purchase_date) order_count,
    date_diff(
        date(purchase_date),
        coalesce(date(lag(purchase_date) over(partition by email order by purchase_date)), date(purchase_date)),
        day
    ) purchase_latency 
from my_order_data od
left join unnest(od.discount_codes) as dc

Note: I would highly recommend prefixing all the column names in the query with the (alias of the) table they belong to; this makes the query unambiguous, and much easier to follow.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement