I am following a tutorial on how to find recurring transactions in a bank statement dataset. I have all the data needed, but I have issues getting the queries to work with MySQL. Any idea on how to convert this to MySQL?
WITH transactions_with_date_diff AS ( SELECT ROW_NUMBER() OVER(PARTITION BY description ORDER BY accounting_date), accounting_date - LAG(accounting_date) OVER(PARTITION BY description ORDER BY accounting_date) AS date_diff, LAST_VALUE(amount) OVER(PARTITION BY description ORDER BY accounting_date) AS latest_amount, * FROM transactions ) SELECT description, COUNT(*) AS transactions_count, MIN(accounting_date) AS subscription_started, MAX(accounting_date) AS latest_transaction, SUM(amount) AS total_amount FROM transactions_with_date_diff WHERE date_diff IS NOT NULL AND date_diff BETWEEN 25 AND 35 GROUP BY 1 HAVING COUNT(*) > 1 ORDER BY 2 DESC
The error is:
Query 1 ERROR: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '* FROM transactions ) SELECT description, COUNT(*) AS trans' at line 6
Update
I adjusted the SQL query based on feedback, and providing example data. Now I’m getting a different error message.
Query:
WITH transactions_with_date_diff AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY description ORDER BY accounting_date), accounting_date - LAG(accounting_date) OVER(PARTITION BY description ORDER BY accounting_date) AS date_diff, LAST_VALUE(amount) OVER(PARTITION BY description ORDER BY accounting_date) AS latest_amount FROM transactions ) SELECT description, COUNT(*) AS transactions_count, MIN(accounting_date) AS subscription_started, MAX(accounting_date) AS latest_transaction, SUM(amount) AS total_amount FROM transactions_with_date_diff WHERE date_diff IS NOT NULL AND date_diff BETWEEN 25 AND 35 GROUP BY 1 HAVING COUNT(*) > 1 ORDER BY 2 DESC;
Returning the following error:
Query 1 ERROR: Can't group on 'transactions_count'
Sample table data:
id | accounting_date | description | amount |
---|---|---|---|
1 | 2020-12-31 | APPLE.COM/BILL | -24.03 |
2 | 2021-01-05 | ALIEXPRESS.COM ALIEXPRESS | -33 |
3 | 2021-01-11 | MICROSOFT*XBOX | -399.60 |
Advertisement
Answer
This query should run in both MySQL and Postgres:
WITH transactions_with_date_diff AS ( SELECT t.*, ( t.accounting_date - LAG(t.accounting_date) OVER (PARTITION BY t.description ORDER BY t.accounting_date) ) AS date_diff, LAST_VALUE(t.amount) OVER (PARTITION BY t.description ORDER BY t.accounting_date) AS latest_amount FROM transactions t ) SELECT tdd.description, COUNT(*) AS transactions_count, MIN(tdd.accounting_date) AS subscription_started, MAX(tdd.accounting_date) AS latest_transaction, SUM(tdd.amount) AS total_amount FROM transactions_with_date_diff tdd WHERE tdd.date_diff BETWEEN 25 AND 35 GROUP BY tdd.description HAVING COUNT(*) > 1 ORDER BY transactions_count DESC;
This is, in fact, Standard SQL and should run in just about any database (assuming the functionality is supported. Note the changes:
- No unnamed columns in the CTE. I just removed the
ROW_NUMBER()
. - All table references have aliases
- The
GROUP BY
andORDER BY
clauses does not use positional notation. - The
NOT NULL
comparison is redundant. TheBETWEEN
does not returnTRUE
forNULL
values.