Skip to content
Advertisement

How to compare within a partition across all possible values in MySql?

I am trying to compare values for each user per each retailer transaction. This is the input example table:

user_id|retailer_id|amount_spent
1      |2          |30
1      |2          |10
1      |2          |28

Now, I want to compare for each distinct user for the same retailer if the amount spent was within 30% across all purchases. Let’s say that the amount spent of the first and second transaction is 67% apart ($30 and $10) which is greater than the 30% threshold. However, the third row with spent of 28$ is within 30% of variance compared to the 30$ from the first row. So, these two transactions will fulfill the criteria, i.e. comparison of row 1 and row 3.

Current query:

   select distinct a.customer_id, a.purchase_date 

from 
(
select 
  customer_id,
  retailer,
  purchase_date,
  purchase_amount,
  Lag(purchase_amount) over (partition by customer_id,retailer) as previous_amt

  from tbl
)a 

where abs(a.purchase_amount-a.previous_amt)/a.purchase_amount <=0.3

Outout would give me null rows because it is comparing sequential transaction amounts. However, it is not accounting that row 1 and row 3 fulfill the criteria and therefore, it would return those 2 rows.

How do I adjust my query from here?

Advertisement

Answer

Consider the following…

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,user_id INT NOT NULL 
,retailer_id INT NOT NULL
,amount_spent INT NOT NULL
);

INSERT INTO my_table (user_id,retailer_id,amount_spent) VALUES
(1,2,30),
(1,2,10),
(1,2,28),
(1,3,10),
(1,3,40),
(2,1,20);

The following query will show us all rows with no other (user_id,retailer) combination within 30% of another one (my arithmetic or logic might be slighhtly skewed but hopefuly you get the idea)…

SELECT a.*
  FROM my_table a
  LEFT 
  JOIN 
     ( SELECT y.*
         FROM my_table x
         JOIN my_table y 
           ON y.id <> x.id
          AND y.user_id = x.user_id
          AND y.retailer_id = x.retailer_id
          AND y.amount_spent BETWEEN x.amount_spent * 0.3 AND x.amount_spent * 1.3
     ) b
    ON b.id = a.id
 WHERE b.id IS NULL;

   +----+---------+-------------+--------------+
   | id | user_id | retailer_id | amount_spent |
   +----+---------+-------------+--------------+
   |  4 |       1 |           3 |           10 |
   |  5 |       1 |           3 |           40 |
   |  6 |       2 |           1 |           20 |
   +----+---------+-------------+--------------+

If necessary, we can further refine this as follows

 SELECT a.user_id
      , a.retailer_id
   FROM my_table a
   LEFT 
   JOIN 
      ( SELECT y.*
          FROM my_table x
          JOIN my_table y 
            ON y.id <> x.id
           AND y.user_id = x.user_id
           AND y.retailer_id = x.retailer_id
           AND y.amount_spent BETWEEN x.amount_spent * 0.3 AND x.amount_spent * 1.3
      ) b
     ON b.id = a.id
  WHERE b.id IS NULL
  GROUP 
     BY a.user_id
      , a.retailer_id 
 HAVING COUNT(*) > 1;

  +---------+-------------+
  | user_id | retailer_id |
  +---------+-------------+
  |       1 |           3 |
  +---------+-------------+
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement