Skip to content
Advertisement

DATE_DIFF() in BigQuery to calculate time between rows

I would like to calculate the time delay between several customer purchases. However, each purchase is saved in an individual row. The data set looks similar to the following:

customer  | order_id | purchase_date | product          | sequencen| ... | 
customer1 | 1247857  | 2020-01-30    | ProdA, ProdB     |     1    | ... | 
customer2 | 4454874  | 2020-02-07    | ProdA            |     1    | ... | 
customer3 | 3424556  | 2020-02-28    | ProdA            |     1    | ... | 
customer4 | 5678889  | 2020-03-14    | ProdB            |     1    | ... | 
customer3 | 5853778  | 2020-03-22    | ProdA, ProdB     |     2    | ... | 
customer4 | 7578345  | 2020-03-30    | ProdA, ProdB     |     2    | ... | 
customer2 | 4892978  | 2020-05-10    | ProdA            |     2    | ... | 
customer5 | 4834789  | 2020-07-05    | ProdA, ProdB     |     1    | ... | 
customer5 | 9846726  | 2020-07-27    | ProdB            |     2    | ... | 
customer1 | 1774783  | 2020-12-12    | ProdB            |     2    | ... | 

Per customer, I would like to end up with a table that calculates the time-difference (in days) between a certain purchase and the purchase that came before. Basically, I would like to know the time delay (latency) between a customers first and second purchase, second and third purchase, and so on. The result should look like the following:

    customer  | order_id | purchase_date  | product          | sequencen| ... | purchase_latency
    customer1 | 1247857  | 2020-01-30     | ProdA, ProdB     |   1      | ... | 
    customer1 | 1774783  | 2020-12-12     | ProdB            |   2      | ... | 317
    customer2 | 4454874  | 2020-02-07     | ProdA            |   1      | ... |
    customer2 | 4892978  | 2020-05-10     | ProdA            |   2      | ... | 93
    customer3 | 3424556  | 2020-02-28     | ProdA            |   1      | ... |
    customer3 | 5853778  | 2020-03-22     | ProdA, ProdB     |   2      | ... | 23
    customer4 | 5678889  | 2020-03-14     | ProdB            |   1      | ... |
    customer4 | 7578345  | 2020-03-30     | ProdA, ProdB     |   2      | ... | 16
    customer5 | 4834789  | 2020-07-05     | ProdA, ProdB     |   1      | ... |
    customer5 | 9846726  | 2020-07-27     | ProdB            |   2      | ... | 22

I am struggling to add the purchase_latency calculation to my current query, as the calculation would require me to do a calculation over rows. Any ideas how I could add this to my current query?:

SELECT
order_id
max(customer) as customer,
max(purchase_date) as purchase_date,
STRING_AGG(product, ",") as product,
...,
FROM SELECT(
od.order_number as order_id,
od.customer_email as customer,
od.order_date as purchase_date
dd.sku as product,
ROW_NUMBER() OVER (PARTITION BY od.customer_email ORDER BY od.order_date ) as sequencen
FROM orders_data od
JOIN detail_data dd
ON od.order_number = dd.order_number
where od.price> 0 AND
od.sku in ("ProdA","ProdB"))
GROUP BY order_id

Advertisement

Answer

Did you try row navigation functions like LAG?

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
  UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
  UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
  finish_time,
  division,
  LAG(name)
    OVER (PARTITION BY division ORDER BY finish_time ASC) AS preceding_runner
FROM finishers;

+-----------------+-------------+----------+------------------+
| name            | finish_time | division | preceding_runner |
+-----------------+-------------+----------+------------------+
| Carly Forte     | 03:08:58    | F25-29   | NULL             |
| Sophia Liu      | 02:51:45    | F30-34   | NULL             |
| Nikki Leith     | 02:59:01    | F30-34   | Sophia Liu       |
| Jen Edwards     | 03:06:36    | F30-34   | Nikki Leith      |
| Meghan Lederer  | 03:07:41    | F30-34   | Jen Edwards      |
| Lauren Reasoner | 03:10:14    | F30-34   | Meghan Lederer   |
| Lisa Stelzner   | 02:54:11    | F35-39   | NULL             |
| Lauren Matthews | 03:01:17    | F35-39   | Lisa Stelzner    |
| Desiree Berry   | 03:05:42    | F35-39   | Lauren Matthews  |
| Suzy Slane      | 03:06:24    | F35-39   | Desiree Berry    |
+-----------------+-------------+----------+------------------+

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement