I am trying to compare values for each user per each retailer transaction. This is the input example table:
user_id|retailer_id|amount_spent 1 |2 |30 1 |2 |10 1 |2 |28
Now, I want to compare for each distinct user for the same retailer if the amount spent was within 30% across all purchases. Let’s say that the amount spent of the first and second transaction is 67% apart ($30 and $10) which is greater than the 30% threshold. However, the third row with spent of 28$ is within 30% of variance compared to the 30$ from the first row. So, these two transactions will fulfill the criteria, i.e. comparison of row 1 and row 3.
Current query:
select distinct a.customer_id, a.purchase_date from ( select customer_id, retailer, purchase_date, purchase_amount, Lag(purchase_amount) over (partition by customer_id,retailer) as previous_amt from tbl )a where abs(a.purchase_amount-a.previous_amt)/a.purchase_amount <=0.3
Outout would give me null rows because it is comparing sequential transaction amounts. However, it is not accounting that row 1 and row 3 fulfill the criteria and therefore, it would return those 2 rows.
How do I adjust my query from here?
Advertisement
Answer
Consider the following…
DROP TABLE IF EXISTS my_table; CREATE TABLE my_table (id SERIAL PRIMARY KEY ,user_id INT NOT NULL ,retailer_id INT NOT NULL ,amount_spent INT NOT NULL ); INSERT INTO my_table (user_id,retailer_id,amount_spent) VALUES (1,2,30), (1,2,10), (1,2,28), (1,3,10), (1,3,40), (2,1,20);
The following query will show us all rows with no other (user_id,retailer) combination within 30% of another one (my arithmetic or logic might be slighhtly skewed but hopefuly you get the idea)…
SELECT a.* FROM my_table a LEFT JOIN ( SELECT y.* FROM my_table x JOIN my_table y ON y.id <> x.id AND y.user_id = x.user_id AND y.retailer_id = x.retailer_id AND y.amount_spent BETWEEN x.amount_spent * 0.3 AND x.amount_spent * 1.3 ) b ON b.id = a.id WHERE b.id IS NULL; +----+---------+-------------+--------------+ | id | user_id | retailer_id | amount_spent | +----+---------+-------------+--------------+ | 4 | 1 | 3 | 10 | | 5 | 1 | 3 | 40 | | 6 | 2 | 1 | 20 | +----+---------+-------------+--------------+
If necessary, we can further refine this as follows
SELECT a.user_id , a.retailer_id FROM my_table a LEFT JOIN ( SELECT y.* FROM my_table x JOIN my_table y ON y.id <> x.id AND y.user_id = x.user_id AND y.retailer_id = x.retailer_id AND y.amount_spent BETWEEN x.amount_spent * 0.3 AND x.amount_spent * 1.3 ) b ON b.id = a.id WHERE b.id IS NULL GROUP BY a.user_id , a.retailer_id HAVING COUNT(*) > 1; +---------+-------------+ | user_id | retailer_id | +---------+-------------+ | 1 | 3 | +---------+-------------+