Having a problem with a code I wrote to perform a cohort analysis for repeat purchases. Can someone help me? My code was able to identify repeat purchases and assigning cohort for the most part BUT it is unable to assign a proper label for the last purchase of that customer. It’s being labeled as “new to brand” highlighted in red here. It should be labeled as “0-15 days.” Any suggestion? I realized that my code doesn’t differentiate between cases when this is the user’s first purchase or their final one. I’m not sure how to troubleshoot it. Any suggestion appreciated. Code and pic attached.
OLD RESULT: https://i.stack.imgur.com/BZvpy.png
NEW RESULT WITH NEW CODE: https://i.stack.imgur.com/NUndb.png
SELECT DISTINCT "T1"."amazon-order-id", "T1"."BRAND", "T1"."buyer-email", "T1"."purchase-date", MIN("T2"."purchase-date") AS "Next Purchase Date", MAX("T2"."purchase-date") AS "Last Purchase Date", DATEDIFF(Day,"T1"."purchase-date", MIN("T2"."purchase-date")) AS "Days Difference", CASE WHEN "Days Difference" < 15 THEN '0-15 days' WHEN "Days Difference" < 30 THEN '15-30 days' WHEN "Days Difference" < 45 THEN '30-45 days' WHEN "Days Difference" < 60 THEN '45-60 days' WHEN "Days Difference" > 60 THEN '60+ days' WHEN "T1"."purchase-date" = "Next Purchase Date" THEN 'NTB' END "Cohort", sum("T1"."shipped-quantity") AS "Units Sold", sum("T1"."item-price") AS "Sales" FROM "DATA_WAREHOUSE"."PUBLIC"."AMAZON_FULFILLED_SHIPMENT_REPORT" AS T1 LEFT JOIN "DATA_WAREHOUSE"."PUBLIC"."AMAZON_FULFILLED_SHIPMENT_REPORT" AS T2 ON "T1"."buyer-email" = "T2"."buyer-email" AND "T2"."purchase-date" > "T1"."purchase-date" GROUP BY "T1"."amazon-order-id", "T1"."buyer-email", "T1"."BRAND", "T1"."purchase-date" ORDER BY "T1"."BRAND", "T1"."buyer-email", "T1"."purchase-date", "T1"."amazon-order-id"
Advertisement
Answer
The problem is with your case statement. You are Left Joining T2 with purchases GREATER than the T1 Purchase Date, which for the last purchase would mean the T2 table would not return any rows.
To get Latest purchases to be labeled in the 0-15 day category, you would need to include an “Is Null” criteria or add a coalesce statement into your DATEDIFF if MIN(T2.Purchase_Date) is null to pull the T1 date.
Edit: Something like this may work if you only have one purchase per date. You may not need the second left join to get the Last Purchase Date if you don’t need that field in the query:
SELECT "T1"."amazon-order-id" ,"T1"."BRAND" ,"T1"."buyer-email" ,"T1"."purchase-date" ,"T2"."purchase-date" AS "Next Purchase Date" ,"T3"."purchase-date" AS "Last Purchase Date" ,DATEDIFF(Day, "T1"."purchase-date", "T2"."purchase-date") AS "Days Difference" ,CASE WHEN LAG("T1"."purchase-date") OVER ( PARTITION BY "T1"."buyer-email" ORDER BY "T1"."purchase-date" ) IS NULL THEN 'NTB' WHEN "T2"."purchase-date" IS NULL THEN 'Last_Purchase' WHEN DATEDIFF(Day, "T1"."purchase-date", "T2"."purchase-date") < 15 THEN '0-15 days' WHEN DATEDIFF(Day, "T1"."purchase-date", "T2"."purchase-date") < 30 THEN '15-30 days' WHEN DATEDIFF(Day, "T1"."purchase-date", "T2"."purchase-date") < 45 THEN '30-45 days' WHEN DATEDIFF(Day, "T1"."purchase-date", "T2"."purchase-date") < 60 THEN '45-60 days' WHEN DATEDIFF(Day, "T1"."purchase-date", "T2"."purchase-date") > 60 THEN '60+ days' END "Cohort" ,"T1"."shipped-quantity" AS "Units Sold" ,"T1"."item-price" AS "Sales" FROM "DATA_WAREHOUSE"."PUBLIC"."AMAZON_FULFILLED_SHIPMENT_REPORT" AS T1 LEFT JOIN "DATA_WAREHOUSE"."PUBLIC"."AMAZON_FULFILLED_SHIPMENT_REPORT" AS T2 ON "T1"."buyer-email" = "T2"."buyer-email" AND "T2"."purchase-date" = ( SELECT Min(T2E."purchase-date") FROM "DATA_WAREHOUSE"."PUBLIC"."AMAZON_FULFILLED_SHIPMENT_REPORT" AS T2E WHERE "T1"."buyer-email" = T2E ."buyer-email" and "T1"."purchase-DATE" < "T2E"."purchase-DATE") LEFT JOIN "DATA_WAREHOUSE"."PUBLIC"."AMAZON_FULFILLED_SHIPMENT_REPORT" AS T3 ON "T1"."buyer-email" = "T3"."buyer-email" AND "T3"."purchase-DATE" = (select max(T2E."purchase-DATE") from "DATA_WAREHOUSE"."PUBLIC"."AMAZON_FULFILLED_SHIPMENT_REPORT" AS T2E where "T1"."buyer-email"=T2E."buyer-email" AND "T1"."purchase-date" < "T2E"."purchase-date" ) ORDER BY "T1"."BRAND" ,"T1"."buyer-email" ,"T1"."purchase-date" ,"T1"."amazon-order-id"
In SQL Server, I would have normally just done this with a windows function within an Outer/Cross Apply to reduce the number of times referencing the DATEDIFF, but it’s not available in snowflake AFAIK.