Assume I have the following table structure and data:
+------------------+-------------------------+--------+
| transaction_date | transaction_description | amount |
+------------------+-------------------------+--------+
| 2020-08-20       | Burger King             |  10.06 |
| 2020-08-23       | Burger King             |  10.06 |
| 2020-08-29       | McDonalds               |   6.48 |
| 2020-09-04       | Wendy's                 |   7.45 |
| 2020-09-05       | Dairy Queen             |  14.36 |
| 2020-09-06       | Wendy's                 |   7.45 |
| 2020-09-13       | Burger King             |  10.06 |
+------------------+-------------------------+--------+
I’d like to find duplicate transactions where the description and amount match exactly, but the dates are allowed to differ by up to 3 days in either direction.
Because the “Burger King” transactions are within three days of each other (2020-08-20 and 2020-08-23), they would be counted as duplicates, but the entry on 2020-09-13 would not be.
I have the following query so far, but the degree of variance piece is what’s stumping me.
SELECT t.transaction_date, t.transaction_description, t.amount
FROM transactions t
JOIN (
    SELECT transaction_date, transaction_description, amount, COUNT(*)
    FROM transactions
    GROUP BY transaction_description, amount
    HAVING count(*) > 1
) b
    ON t.transaction_description = b.transaction_description
    AND t.amount = b.amount
ORDER BY t.amount ASC;
Ideally, I’d love for the output to be something along the lines of:
+------------------+-------------------------+--------+
| transaction_date | transaction_description | amount |
+------------------+-------------------------+--------+
| 2020-08-20       | Burger King             |  10.06 |
| 2020-08-23       | Burger King             |  10.06 |
| 2020-09-04       | Wendy's                 |   7.45 |
| 2020-09-06       | Wendy's                 |   7.45 |
+------------------+-------------------------+--------+
Am I way off? Or is this even possible? Thanks in advance.
Answer
You can use exists:
select t.*
from mytable t
where exists (
    select 1
    from mytable t1
    where
        -- same description and amount
        t1.transaction_description = t.transaction_description
        and t1.amount = t.amount
        -- a different date, at most 3 days before or after
        and t1.transaction_date <> t.transaction_date
        and t1.transaction_date >= t.transaction_date - interval 3 day
        and t1.transaction_date <= t.transaction_date + interval 3 day
)
If you are running MySQL 8.0, a count within a window date range is a reasonable alternative:
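select t.*
from (
    select t.*,
        -- count rows with the same description and amount
        -- dated within 3 days either side of the current row
        count(*) over(
            partition by transaction_description, amount
            order by transaction_date
            range between interval 3 day preceding and interval 3 day following
        ) cnt
    from mytable t
) t
where cnt > 1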
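To try either query, here is a minimal setup sketch that loads the sample data from the question. The table name mytable follows the queries above (the question calls it transactions), and the column types are assumptions, since the question does not show the schema.

```sql
-- Hypothetical schema: the column types are guesses, not taken from the question.
create table mytable (
    transaction_date        date           not null,
    transaction_description varchar(100)   not null,
    amount                  decimal(10, 2) not null
);

-- Sample rows copied from the question.
insert into mytable (transaction_date, transaction_description, amount) values
    ('2020-08-20', 'Burger King', 10.06),
    ('2020-08-23', 'Burger King', 10.06),
    ('2020-08-29', 'McDonalds',    6.48),
    ('2020-09-04', 'Wendy''s',     7.45),
    ('2020-09-05', 'Dairy Queen', 14.36),
    ('2020-09-06', 'Wendy''s',     7.45),
    ('2020-09-13', 'Burger King', 10.06);
```

With this data, both queries should return the four rows from the desired output: the two Burger King entries from August and the two Wendy's entries. One difference to be aware of: the exists query requires the dates to differ, so two identical rows on the same day would not flag each other, whereas the window count would include them.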