Assume I have the following table structure and data:
+------------------+-------------------------+--------+
| transaction_date | transaction_description | amount |
+------------------+-------------------------+--------+
| 2020-08-20 | Burger King | 10.06 |
| 2020-08-23 | Burger King | 10.06 |
| 2020-08-29 | McDonalds | 6.48 |
| 2020-09-04 | Wendy's | 7.45 |
| 2020-09-05 | Dairy Queen | 14.36 |
| 2020-09-06 | Wendy's | 7.45 |
| 2020-09-13 | Burger King | 10.06 |
+------------------+-------------------------+--------+
I’d like to find duplicate transactions where the description and amount match, but the dates can differ by up to 3 days (+/- 3 days) from each other.
Because the “Burger King” transactions on 2020-08-20 and 2020-08-23 are within three days of each other, they would count as duplicates, but the entry on 2020-09-13 would not.
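To make the matching rule concrete, here is a plain-Python sketch (not SQL, and not part of the original question) that applies it to the sample rows above. The names `find_near_duplicates`, `ROWS`, and `max_days` are hypothetical. It relies on one observation: once rows with the same description and amount are sorted by date, the closest other row to any given row is an immediate neighbour, so only the previous and next rows need checking.

```python
from datetime import date
from decimal import Decimal
from itertools import groupby

# Sample rows from the question: (transaction_date, transaction_description, amount).
# Decimal keeps the amount comparison exact (hypothetical choice for this sketch).
ROWS = [
    (date(2020, 8, 20), "Burger King", Decimal("10.06")),
    (date(2020, 8, 23), "Burger King", Decimal("10.06")),
    (date(2020, 8, 29), "McDonalds", Decimal("6.48")),
    (date(2020, 9, 4), "Wendy's", Decimal("7.45")),
    (date(2020, 9, 5), "Dairy Queen", Decimal("14.36")),
    (date(2020, 9, 6), "Wendy's", Decimal("7.45")),
    (date(2020, 9, 13), "Burger King", Decimal("10.06")),
]


def find_near_duplicates(rows, max_days=3):
    """Return rows whose (description, amount) matches another row dated within max_days."""
    # Sort so rows sharing a (description, amount) key are contiguous and date-ordered.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    flagged = []
    for _key, group_iter in groupby(ordered, key=lambda r: (r[1], r[2])):
        group = list(group_iter)
        for i, row in enumerate(group):
            # In a date-sorted group the closest other row is an immediate neighbour,
            # so only the previous and next rows need to be checked.
            neighbours = group[max(i - 1, 0):i] + group[i + 1:i + 2]
            if any(abs((row[0] - other[0]).days) <= max_days for other in neighbours):
                flagged.append(row)
    # Return in date order, like the desired output in the question.
    return sorted(flagged)


for row in find_near_duplicates(ROWS):
    print(row[0], row[1], row[2])
```

With the default `max_days=3` this flags the two Burger King rows from August and the two Wendy's rows, and leaves the 2020-09-13 Burger King row out.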
I have the following query so far, but the degree of variance piece is what’s stumping me.
SELECT t.transaction_date, t.transaction_description, t.amount
FROM transactions t
JOIN (SELECT transaction_description, amount, COUNT(*) AS cnt
      FROM transactions
      GROUP BY transaction_description, amount
      HAVING COUNT(*) > 1) b
  ON t.transaction_description = b.transaction_description
 AND t.amount = b.amount
ORDER BY t.amount ASC;
Ideally, I’d love for the output to be something along the lines of:
+------------------+-------------------------+--------+
| transaction_date | transaction_description | amount |
+------------------+-------------------------+--------+
| 2020-08-20 | Burger King | 10.06 |
| 2020-08-23 | Burger King | 10.06 |
| 2020-09-04 | Wendy's | 7.45 |
| 2020-09-06 | Wendy's | 7.45 |
+------------------+-------------------------+--------+
Am I way off? Or is this even possible? Thanks in advance.
Answer
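You can use a correlated EXISTS subquery that looks for another row with the same description and amount dated within 3 days: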
select t.*
from mytable t
where exists (
    select 1
    from mytable t1
    where
        t1.transaction_description = t.transaction_description
        and t1.amount = t.amount                               -- amounts must match too
        and t1.transaction_date <> t.transaction_date          -- exclude t itself (also excludes same-day rows)
        and t1.transaction_date >= t.transaction_date - interval 3 day
        and t1.transaction_date <= t.transaction_date + interval 3 day
)
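To try the EXISTS approach against the sample data without a MySQL server, here is a sketch using Python's built-in `sqlite3` module. It is a SQLite port, not the MySQL query itself: SQLite has no `interval 3 day`, so `date(x, '-3 days')` and `date(x, '+3 days')` stand in for it, and the query also requires the amounts to match, as the question asks. The table name `mytable` comes from the answer; `SAMPLE` and `near_duplicates` are hypothetical names.

```python
import sqlite3

# Sample rows from the question; dates are ISO-8601 text, which SQLite's date() understands.
SAMPLE = [
    ("2020-08-20", "Burger King", "10.06"),
    ("2020-08-23", "Burger King", "10.06"),
    ("2020-08-29", "McDonalds", "6.48"),
    ("2020-09-04", "Wendy's", "7.45"),
    ("2020-09-05", "Dairy Queen", "14.36"),
    ("2020-09-06", "Wendy's", "7.45"),
    ("2020-09-13", "Burger King", "10.06"),
]

# SQLite port: date(x, '-3 days') / date(x, '+3 days') replace MySQL's interval arithmetic.
QUERY = """
select t.*
from mytable t
where exists (
    select 1
    from mytable t1
    where t1.transaction_description = t.transaction_description
      and t1.amount = t.amount
      and t1.transaction_date <> t.transaction_date
      and t1.transaction_date >= date(t.transaction_date, '-3 days')
      and t1.transaction_date <= date(t.transaction_date, '+3 days')
)
order by t.transaction_date
"""


def near_duplicates(rows):
    """Load rows into an in-memory table and run the EXISTS query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        # amount is stored as TEXT here so the equality test is exact.
        conn.execute(
            "create table mytable ("
            "transaction_date text, transaction_description text, amount text)"
        )
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in near_duplicates(SAMPLE):
    print(row)
```

Because ISO-8601 date strings sort the same way as the dates they represent, the plain `>=` and `<=` comparisons on text behave like date comparisons here.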
If you are running MySQL 8.0, a window count over a +/- 3 day date range is a reasonable alternative:
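select t.*
from (
    select t.*,
        count(*) over(
            partition by transaction_description, amount  -- same description and amount
            order by transaction_date
            range between interval 3 day preceding and interval 3 day following
        ) cnt
    from mytable t
) t
where cnt > 1  -- cnt includes the row itself, so > 1 means another match in range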
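The window-count approach can be checked the same way. This is again a SQLite port run through Python's `sqlite3`, not the MySQL query: SQLite's `RANGE` frames take numeric offsets rather than `interval 3 day`, so the sketch orders by `julianday(transaction_date)` (a day count) and uses `range between 3 preceding and 3 following`. That frame form assumes SQLite 3.28 or newer (`sqlite3.sqlite_version` reports the bundled version). `SAMPLE` and `windowed_duplicates` are hypothetical names.

```python
import sqlite3

# Sample rows from the question: (transaction_date, transaction_description, amount).
SAMPLE = [
    ("2020-08-20", "Burger King", "10.06"),
    ("2020-08-23", "Burger King", "10.06"),
    ("2020-08-29", "McDonalds", "6.48"),
    ("2020-09-04", "Wendy's", "7.45"),
    ("2020-09-05", "Dairy Queen", "14.36"),
    ("2020-09-06", "Wendy's", "7.45"),
    ("2020-09-13", "Burger King", "10.06"),
]

# julianday() turns the date into a number of days, so a numeric RANGE of 3
# covers +/- 3 days. The derived table is needed because cnt cannot be
# referenced in the WHERE clause of the same SELECT that computes it.
QUERY = """
select transaction_date, transaction_description, amount
from (
    select t.*,
           count(*) over (
               partition by transaction_description, amount
               order by julianday(transaction_date)
               range between 3 preceding and 3 following
           ) as cnt
    from mytable t
) d
where cnt > 1
order by transaction_date
"""


def windowed_duplicates(rows):
    """Load rows into an in-memory table and run the window-count query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable ("
            "transaction_date text, transaction_description text, amount text)"
        )
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in windowed_duplicates(SAMPLE):
    print(row)
```

One behavioural difference from the EXISTS version: the window count includes every row in the frame, so two identical transactions on the same date are also reported, whereas the `<>` date condition in the EXISTS query skips them.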