I am trying to compare values for each user per each retailer transaction. This is the input example table:
user_id|retailer_id|amount_spent
1 |2 |30
1 |2 |10
1 |2 |28
Now, I want to compare for each distinct user for the same retailer if the amount spent was within 30% across all purchases. Let’s say that the amount spent of the first and second transaction is 67% apart ($30 and $10) which is greater than the 30% threshold. However, the third row with spent of 28$ is within 30% of variance compared to the 30$ from the first row. So, these two transactions will fulfill the criteria, i.e. comparison of row 1 and row 3.
Current query:
select distinct a.customer_id, a.purchase_date
from
(
select
customer_id,
retailer,
purchase_date,
purchase_amount,
Lag(purchase_amount) over (partition by customer_id,retailer) as previous_amt
from tbl
)a
where abs(a.purchase_amount-a.previous_amt)/a.purchase_amount <=0.3
Outout would give me null rows because it is comparing sequential transaction amounts. However, it is not accounting that row 1 and row 3 fulfill the criteria and therefore, it would return those 2 rows.
How do I adjust my query from here?
Advertisement
Answer
Consider the following…
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,user_id INT NOT NULL
,retailer_id INT NOT NULL
,amount_spent INT NOT NULL
);
INSERT INTO my_table (user_id,retailer_id,amount_spent) VALUES
(1,2,30),
(1,2,10),
(1,2,28),
(1,3,10),
(1,3,40),
(2,1,20);
The following query will show us all rows with no other (user_id,retailer) combination within 30% of another one (my arithmetic or logic might be slighhtly skewed but hopefuly you get the idea)…
SELECT a.*
FROM my_table a
LEFT
JOIN
( SELECT y.*
FROM my_table x
JOIN my_table y
ON y.id <> x.id
AND y.user_id = x.user_id
AND y.retailer_id = x.retailer_id
AND y.amount_spent BETWEEN x.amount_spent * 0.3 AND x.amount_spent * 1.3
) b
ON b.id = a.id
WHERE b.id IS NULL;
+----+---------+-------------+--------------+
| id | user_id | retailer_id | amount_spent |
+----+---------+-------------+--------------+
| 4 | 1 | 3 | 10 |
| 5 | 1 | 3 | 40 |
| 6 | 2 | 1 | 20 |
+----+---------+-------------+--------------+
If necessary, we can further refine this as follows
SELECT a.user_id
, a.retailer_id
FROM my_table a
LEFT
JOIN
( SELECT y.*
FROM my_table x
JOIN my_table y
ON y.id <> x.id
AND y.user_id = x.user_id
AND y.retailer_id = x.retailer_id
AND y.amount_spent BETWEEN x.amount_spent * 0.3 AND x.amount_spent * 1.3
) b
ON b.id = a.id
WHERE b.id IS NULL
GROUP
BY a.user_id
, a.retailer_id
HAVING COUNT(*) > 1;
+---------+-------------+
| user_id | retailer_id |
+---------+-------------+
| 1 | 3 |
+---------+-------------+