I’m searching for a more efficient method to accomplish something that I’ve already solved with a few different SQL statements.
The problem:
I have two tables
- a transactions table, and
- an accounts table
The transactions
table has columns that look like:
acct_sending | acct_receiving | amount | tx_datetime |
---|---|---|---|
100 | 101 | 10 | yyyy-mm-dd hh-mm-ss |
101 | 100 | 5 | yyyy-mm-dd hh-mm-ss |
101 | 200 | 1 | yyyy-mm-dd hh-mm-ss |
200 | 101 | 11 | yyyy-mm-dd hh-mm-ss |
200 | 234 | 22 | yyyy-mm-dd hh-mm-ss |
234 | 567 | 24 | yyyy-mm-dd hh-mm-ss |
567 | 890 | 56 | yyyy-mm-dd hh-mm-ss |
890 | 100 | 73 | yyyy-mm-dd hh-mm-ss |
The accounts
table has columns that look like:
account | balance | last_tx |
---|---|---|
100 | 10 | yyyy-mm-dd hh-mm-ss |
101 | 100 | yyyy-mm-dd hh-mm-ss |
102 | 100 | yyyy-mm-dd hh-mm-ss |
200 | 1000 | yyyy-mm-dd hh-mm-ss |
234 | 10000 | yyyy-mm-dd hh-mm-ss |
567 | 1000 | yyyy-mm-dd hh-mm-ss |
890 | 100 | yyyy-mm-dd hh-mm-ss |
I want to create a query that returns a list of transactions where both acct_sending
and acct_receiving
are in the accounts
table AND have a balance
greater than some value. Bonus points if the query result has a count
column that holds the total number of transactions between those two accounts.
Given the transactions
and accounts
tables above, if we ran this query with balance > 10
then the result would be:
acct_sending | acct_receiving | count |
---|---|---|
101 | 200 | 2 |
200 | 234 | 1 |
234 | 567 | 1 |
567 | 890 | 1 |
—
My Solution
First, create a temp table with transactions where acct_sending = account
and account > 10
CREATE TEMP TABLE temp_sending AS SELECT acct_sending, acct_receiving FROM transactions t WHERE EXISTS (SELECT account FROM accounts a WHERE t.acct_sending = a.account AND a.balance > 10)
Then, create a new temp table using the last temp table where acct_receiving = account
and account > 10
CREATE TEMP TABLE temp_sending_receiving AS SELECT acct_sending, acct_receiving FROM temp_sending t WHERE EXISTS (SELECT account FROM accounts a WHERE t.acct_sending = a.account AND a.balance > 10)
Last, I query temp_sending_receiving
to get a list of unique transactions, and generate the count
column.
SELECT acct_sending, account_receiving, count(*) FROM ( SELECT CASE WHEN sender < receiver THEN sender ELSE receiver END AS sender, CASE WHEN sender < receiver THEN receiver ELSE sender END AS receiver FROM temp_sending_receiving ) AS x GROUP BY acct_sending, account_receiving
Running each of these queries gives me the results I want, but …
Are there better / more efficient ways to do this?
I’m thinking in both query time, and memory efficiency. Thanks!!!
—
Note
I’m running these SQL queries as scripts in DBeaver and Python so I added those as tags. If that’s wrong, LMK! Thanks. 🙂
Advertisement
Answer
You must join transactions
to 2 copies of accounts
and aggregate:
SELECT MIN(t.acct_sending, t.acct_receiving) sender, MAX(t.acct_sending, t.acct_receiving) receiver, COUNT(*) count FROM transactions t INNER JOIN accounts a1 ON a1.account = t.acct_sending AND a1.balance > 10 INNER JOIN accounts a2 ON a2.account = t.acct_receiving AND a2.balance > 10 GROUP BY sender, receiver;
Or, with a CTE
:
WITH cte AS (SELECT * FROM accounts WHERE balance > 10) SELECT MIN(t.acct_sending, t.acct_receiving) sender, MAX(t.acct_sending, t.acct_receiving) receiver, COUNT(*) count FROM transactions t INNER JOIN cte c1 ON c1.account = t.acct_sending INNER JOIN cte c2 ON c2.account = t.acct_receiving GROUP BY sender, receiver;
See the demo.