Skip to content
Advertisement

SQL want efficient query to SELECT columnA, columnB FROM table1 WHERE both columnA, columnB are in columnC FROM table2 WHERE columndD > value

I’m searching for a more efficient method to accomplish something that I’ve already solved with a few different SQL statements.

The problem:

I have two tables

  1. a transactions table, and
  2. an accounts table

The transactions table has columns that look like:

acct_sending acct_receiving amount tx_datetime
100 101 10 yyyy-mm-dd hh-mm-ss
101 100 5 yyyy-mm-dd hh-mm-ss
101 200 1 yyyy-mm-dd hh-mm-ss
200 101 11 yyyy-mm-dd hh-mm-ss
200 234 22 yyyy-mm-dd hh-mm-ss
234 567 24 yyyy-mm-dd hh-mm-ss
567 890 56 yyyy-mm-dd hh-mm-ss
890 100 73 yyyy-mm-dd hh-mm-ss

The accounts table has columns that look like:

account balance last_tx
100 10 yyyy-mm-dd hh-mm-ss
101 100 yyyy-mm-dd hh-mm-ss
102 100 yyyy-mm-dd hh-mm-ss
200 1000 yyyy-mm-dd hh-mm-ss
234 10000 yyyy-mm-dd hh-mm-ss
567 1000 yyyy-mm-dd hh-mm-ss
890 100 yyyy-mm-dd hh-mm-ss

I want to create a query that returns a list of transactions where both acct_sending and acct_receiving are in the accounts table AND have a balance greater than some value. Bonus points if the query result has a count column that holds the total number of transactions between those two accounts.

Given the transactions and accounts tables above, if we ran this query with balance > 10 then the result would be:

acct_sending acct_receiving count
101 200 2
200 234 1
234 567 1
567 890 1

My Solution

First, create a temp table with transactions where acct_sending = account and account > 10

CREATE TEMP TABLE temp_sending AS
SELECT acct_sending, acct_receiving
FROM transactions t
WHERE EXISTS
    (SELECT account
    FROM accounts a
    WHERE t.acct_sending = a.account AND a.balance > 10)

Then, create a new temp table using the last temp table where acct_receiving = account and account > 10

CREATE TEMP TABLE temp_sending_receiving AS
SELECT acct_sending, acct_receiving
FROM temp_sending t
WHERE EXISTS
    (SELECT account
    FROM accounts a
    WHERE t.acct_sending = a.account AND a.balance > 10)

Last, I query temp_sending_receiving to get a list of unique transactions, and generate the count column.

SELECT acct_sending, account_receiving, count(*)
FROM (
    SELECT 
        CASE WHEN sender < receiver THEN sender ELSE receiver END AS sender, 
        CASE WHEN sender < receiver THEN receiver ELSE sender END AS receiver 
    FROM temp_sending_receiving
) AS x
GROUP BY acct_sending, account_receiving

Running each of these queries gives me the results I want, but …

Are there better / more efficient ways to do this?

I’m thinking in both query time, and memory efficiency. Thanks!!!

Note

I’m running these SQL queries as scripts in DBeaver and Python so I added those as tags. If that’s wrong, LMK! Thanks. 🙂

Advertisement

Answer

You must join transactions to 2 copies of accounts and aggregate:

SELECT MIN(t.acct_sending, t.acct_receiving) sender,
       MAX(t.acct_sending, t.acct_receiving) receiver,
       COUNT(*) count
FROM transactions t 
INNER JOIN accounts a1 ON a1.account = t.acct_sending AND a1.balance > 10
INNER JOIN accounts a2 ON a2.account = t.acct_receiving AND a2.balance > 10
GROUP BY sender, receiver;

Or, with a CTE:

WITH cte AS (SELECT * FROM accounts WHERE balance > 10)
SELECT MIN(t.acct_sending, t.acct_receiving) sender,
       MAX(t.acct_sending, t.acct_receiving) receiver,
       COUNT(*) count
FROM transactions t 
INNER JOIN cte c1 ON c1.account = t.acct_sending
INNER JOIN cte c2 ON c2.account = t.acct_receiving
GROUP BY sender, receiver;

See the demo.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement