Assume I have the following table structure and data:
+------------------+-------------------------+--------+
| transaction_date | transaction_description | amount |
+------------------+-------------------------+--------+
| 2020-08-20       | Burger King             |  10.06 |
| 2020-08-23       | Burger King             |  10.06 |
| 2020-08-29       | McDonalds               |   6.48 |
| 2020-09-04       | Wendy's                 |   7.45 |
| 2020-09-05       | Dairy Queen             |  14.36 |
| 2020-09-06       | Wendy's                 |   7.45 |
| 2020-09-13       | Burger King             |  10.06 |
+------------------+-------------------------+--------+
I’d like to find duplicate transactions where the description and amount match, but the dates may differ from each other by up to 3 days (+/- 3 days).
Because the “Burger King” transactions are within three days of each other (2020-08-20 and 2020-08-23), they would be counted as duplicates, but the entry on 2020-09-13 would not be.
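The boundary matters here: 2020-08-20 and 2020-08-23 are exactly 3 days apart, so the date comparison has to be inclusive. The sketch below spells out the pairwise rule in Python. It uses a hypothetical is_duplicate helper (not part of the question), and it uses Decimal amounts so that equality is exact.

```python
from datetime import date
from decimal import Decimal


# Hypothetical helper (not from the question): two rows count as duplicates
# when description and amount match and the dates are at most window_days
# apart, inclusive on both ends.
def is_duplicate(a, b, window_days=3):
    date_a, desc_a, amount_a = a
    date_b, desc_b, amount_b = b
    return (
        desc_a == desc_b
        and amount_a == amount_b
        and abs((date_a - date_b).days) <= window_days
    )


bk_0820 = (date(2020, 8, 20), "Burger King", Decimal("10.06"))
bk_0823 = (date(2020, 8, 23), "Burger King", Decimal("10.06"))
bk_0913 = (date(2020, 9, 13), "Burger King", Decimal("10.06"))

print(is_duplicate(bk_0820, bk_0823))  # True: exactly 3 days apart
print(is_duplicate(bk_0823, bk_0913))  # False: 21 days apart
```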
I have the following query so far, but the degree of variance piece is what’s stumping me.
SELECT t.transaction_date, t.transaction_description, t.amount
FROM transactions t
JOIN (
    -- transaction_date is not in the GROUP BY, so it is left out here
    -- (selecting it fails under ONLY_FULL_GROUP_BY).
    SELECT transaction_description, amount, COUNT(*) AS cnt
    FROM transactions
    GROUP BY transaction_description, amount
    HAVING COUNT(*) > 1
) b
  ON t.transaction_description = b.transaction_description
 AND t.amount = b.amount
ORDER BY t.amount ASC;
Ideally, I’d love for the output to be something along the lines of:
+------------------+-------------------------+--------+
| transaction_date | transaction_description | amount |
+------------------+-------------------------+--------+
| 2020-08-20       | Burger King             |  10.06 |
| 2020-08-23       | Burger King             |  10.06 |
| 2020-09-04       | Wendy's                 |   7.45 |
| 2020-09-06       | Wendy's                 |   7.45 |
+------------------+-------------------------+--------+
Am I way off? Or is this even possible? Thanks in advance.
Answer
You can use an exists subquery:
select t.*
from mytable t
where exists (
    select 1
    from mytable t1
    where t1.transaction_description = t.transaction_description
      and t1.amount = t.amount
      and t1.transaction_date <> t.transaction_date
      and t1.transaction_date >= t.transaction_date - interval 3 day
      and t1.transaction_date <= t.transaction_date + interval 3 day
)
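To check the exists approach against the sample data without a MySQL server, the sketch below runs the same query shape in an in-memory SQLite database using Python's standard sqlite3 module. This is an assumption-laden stand-in, not MySQL. SQLite has no "interval 3 day" syntax, so date(x, '-3 days') and date(x, '+3 days') replace the MySQL date arithmetic. The dates are stored as 'YYYY-MM-DD' text, which compares correctly as strings. The table name transactions and the function find_near_duplicates are illustrative.

```python
import sqlite3

ROWS = [
    ("2020-08-20", "Burger King", 10.06),
    ("2020-08-23", "Burger King", 10.06),
    ("2020-08-29", "McDonalds", 6.48),
    ("2020-09-04", "Wendy's", 7.45),
    ("2020-09-05", "Dairy Queen", 14.36),
    ("2020-09-06", "Wendy's", 7.45),
    ("2020-09-13", "Burger King", 10.06),
]

# Same shape as the MySQL exists query; only the date arithmetic differs.
QUERY = """
select t.transaction_date, t.transaction_description, t.amount
from transactions t
where exists (
    select 1
    from transactions t1
    where t1.transaction_description = t.transaction_description
      and t1.amount = t.amount
      and t1.transaction_date <> t.transaction_date
      and t1.transaction_date >= date(t.transaction_date, '-3 days')
      and t1.transaction_date <= date(t.transaction_date, '+3 days')
)
order by t.transaction_date
"""


def find_near_duplicates(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table transactions ("
            " transaction_date text,"
            " transaction_description text,"
            " amount real)"
        )
        conn.executemany("insert into transactions values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()  # the connection's context manager would not close it


for row in find_near_duplicates(ROWS):
    print(row)
```

This returns the four Burger King and Wendy's rows from the desired output. Because of the t1.transaction_date <> t.transaction_date condition, two identical transactions on the same day would not be flagged by this version.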
If you are running MySQL 8.0, counting matching rows over a +/- 3 day window frame is a reasonable alternative:
select t.*
from (
    select t.*,
        count(*) over(
            partition by transaction_description, amount
            order by transaction_date
            range between interval 3 day preceding and interval 3 day following
        ) cnt
    from mytable t
) t
where cnt > 1
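To make the window version's semantics concrete, here is a plain-Python sketch of what count(*) over a RANGE frame computes. It is an illustration, not how MySQL implements it. The rows are grouped by (description, amount) and each group's dates are sorted. Then, for each row, bisect finds how many dates fall within the inclusive [date - 3 days, date + 3 days] range. The function names window_counts and near_duplicates are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal

ROWS = [
    (date(2020, 8, 20), "Burger King", Decimal("10.06")),
    (date(2020, 8, 23), "Burger King", Decimal("10.06")),
    (date(2020, 8, 29), "McDonalds", Decimal("6.48")),
    (date(2020, 9, 4), "Wendy's", Decimal("7.45")),
    (date(2020, 9, 5), "Dairy Queen", Decimal("14.36")),
    (date(2020, 9, 6), "Wendy's", Decimal("7.45")),
    (date(2020, 9, 13), "Burger King", Decimal("10.06")),
]


def window_counts(rows, days=3):
    """Return (row, cnt) pairs, where cnt is the number of rows in the same
    (description, amount) partition dated within +/- days of the row,
    the row itself included (like count(*) over a RANGE frame)."""
    # "partition by transaction_description, amount"
    partitions = defaultdict(list)
    for d, desc, amount in rows:
        partitions[(desc, amount)].append(d)
    # "order by transaction_date"
    for dates in partitions.values():
        dates.sort()
    delta = timedelta(days=days)
    counts = []
    for row in rows:
        d, desc, amount = row
        dates = partitions[(desc, amount)]
        # "range between interval 3 day preceding and interval 3 day following"
        lo = bisect_left(dates, d - delta)
        hi = bisect_right(dates, d + delta)
        counts.append((row, hi - lo))
    return counts


def near_duplicates(rows, days=3):
    # Outer "where cnt > 1" filter.
    return [row for row, cnt in window_counts(rows, days) if cnt > 1]


for row in near_duplicates(ROWS):
    print(row)
```

A RANGE frame also counts peer rows that share the same date. This means the window query does flag two identical transactions on the same day, which the exists query with its <> condition does not.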