I am relatively new to SQL, so please bear with me! I am trying to see how many customers make a purchase after being dormant for two years. Relevant fields include cust_id
and purchase_date
(there can be several observations for the same cust_id
but with different dates). I am using Redshift for my SQL scripts.
I realize I cannot put the same thing in for the DATEDIFF
parameters (it just doesn’t make any sense), but I am unsure what else to do.
SELECT * FROM tickets t LEFT JOIN d_customer c ON c.cust_id = t.cust_id WHERE DATEDIFF(year, t.purchase_date, t.purchase_date) between 0 and 2 ORDER BY t.cust_id, t.purchase_date ;
Advertisement
Answer
I think you want lag()
. To get the relevant tickets:
SELECT t.* FROM (SELECT t.*, LAG(purchase_date) OVER (PARTITION BY cust_id ORDER BY purchase_date) as prev_pd FROM tickets t ) t WHERE prev_pd < purchase_date - interval '2 year';
If you want the number of customers, use count(distinct)
:
SELECT COUNT(DISTINCT cust_id) FROM (SELECT t.*, LAG(purchase_date) OVER (PARTITION BY cust_id ORDER BY purchase_date) as prev_pd FROM tickets t ) t WHERE prev_pd < purchase_date - interval '2 year';
Note that these do not use DATEDIFF()
. This counts the number of boundaries between two date values. So, 2018-12-31 and 2019-01-01 have a difference of 1 year.