Determining Where Date Ranks | BigQuery

I have purchase data that I’m trying to analyze. I’m attempting to find the 2nd and 3rd purchases (and potentially 4, 5, etc.) for each user.

The data I have currently looks like this:

email, first_purchase_date, processed_date, sku, first_order
abc@email.com   6/1/2019   6/1/2019   HG1555   HG1555
abc@email.com   6/1/2019   8/1/2019   RF2655   HG1555
abc@email.com   6/1/2019   8/1/2019   FP7789   HG1555
abc@email.com   6/1/2019  11/1/2019   RF2655   HG1555
def@email.com   7/1/2019   7/1/2019   RF2655   RF2655
def@email.com   7/1/2019   8/1/2019   HG1555   RF2655
def@email.com   7/1/2019   8/1/2019   FP7789   RF2655
xyz@email.com   9/1/2019   6/1/2019   HG1555   HG1555
xyz@email.com   9/1/2019  11/1/2019   FP7789   HG1555

​x
 
email, first_purchase_date, processed_date, sku, first_orderabc@email.com   6/1/2019   6/1/2019   HG1555   HG1555abc@email.com   6/1/2019   8/1/2019   RF2655   HG1555abc@email.com   6/1/2019   8/1/2019   FP7789   HG1555abc@email.com   6/1/2019  11/1/2019   RF2655   HG1555def@email.com   7/1/2019   7/1/2019   RF2655   RF2655def@email.com   7/1/2019   8/1/2019   HG1555   RF2655def@email.com   7/1/2019   8/1/2019   FP7789   RF2655xyz@email.com   9/1/2019   6/1/2019   HG1555   HG1555xyz@email.com   9/1/2019  11/1/2019   FP7789   HG1555​

The first order column is looking at what was in the first basket that the customer purchased. Is there a way I can create another column that will tell me exactly what purchase number each line item is? Ideally, the table will look like this:

email, first_purchase_date, processed_date, sku, first_order, order number
abc@email.com   6/1/2019   6/1/2019   HG1555   HG1555   1
abc@email.com   6/1/2019   8/1/2019   RF2655   HG1555   2
abc@email.com   6/1/2019   8/1/2019   FP7789   HG1555   2
abc@email.com   6/1/2019  11/1/2019   RF2655   HG1555   3
def@email.com   7/1/2019   7/1/2019   RF2655   RF2655   1
def@email.com   7/1/2019   8/1/2019   HG1555   RF2655   2
def@email.com   7/1/2019   8/1/2019   FP7789   RF2655   2
xyz@email.com   9/1/2019   6/1/2019   HG1555   HG1555   1
xyz@email.com   9/1/2019  11/1/2019   FP7789   HG1555   2

 
email, first_purchase_date, processed_date, sku, first_order, order numberabc@email.com   6/1/2019   6/1/2019   HG1555   HG1555   1abc@email.com   6/1/2019   8/1/2019   RF2655   HG1555   2abc@email.com   6/1/2019   8/1/2019   FP7789   HG1555   2abc@email.com   6/1/2019  11/1/2019   RF2655   HG1555   3def@email.com   7/1/2019   7/1/2019   RF2655   RF2655   1def@email.com   7/1/2019   8/1/2019   HG1555   RF2655   2def@email.com   7/1/2019   8/1/2019   FP7789   RF2655   2xyz@email.com   9/1/2019   6/1/2019   HG1555   HG1555   1xyz@email.com   9/1/2019  11/1/2019   FP7789   HG1555   2​

Basically, I want to look at each email and determine where the processed_date falls in respect to all other processed_dates that are registered to that user. The problem I’m trying to work around is when there are multiple line items for a particular processed_date.

Answer

Below is for BigQuery Standard SQL and assumes the processed_date column is of date type

#standardSQL
SELECT *,
  DENSE_RANK() OVER(PARTITION BY email ORDER BY processed_date) order_number
FROM `project.dataset.table`
-- ORDER BY email, processed_date

 
#standardSQLSELECT *,  DENSE_RANK() OVER(PARTITION BY email ORDER BY processed_date) order_numberFROM `project.dataset.table`-- ORDER BY email, processed_date   ​

If processed_date is a string – you can use below then

#standardSQL
SELECT *,
  DENSE_RANK() OVER(PARTITION BY email ORDER BY PARSE_DATE('%d/%m/%Y', processed_date)) order_number
FROM `project.dataset.table`
-- ORDER BY email, PARSE_DATE('%d/%m/%Y', processed_date)

 
#standardSQLSELECT *,  DENSE_RANK() OVER(PARTITION BY email ORDER BY PARSE_DATE('%d/%m/%Y', processed_date)) order_numberFROM `project.dataset.table`-- ORDER BY email, PARSE_DATE('%d/%m/%Y', processed_date)​

Advertisement

Answer