Skip to content
Advertisement

Is this simple SQL query correct?

The query below is pretty self-explanatory, and although I’m not good at SQL, I can’t find anything wrong with it. However, the number it yields in not in accordance with my gut feeling and I would like it double-checked, if this is appropriate for StackOverflow.

I’m simply trying to get the number of users that joined my website in 2020, and also made a payment in 2020. I’m trying to figure out “new revenue”.

This is the query:

SELECT Count(DISTINCT( auth_user.id )) AS "2020" 
FROM   auth_user 
       JOIN subscription_transaction 
         ON ( subscription_transaction.event = 'one-time payment' 
              AND subscription_transaction.user_id = auth_user.id 
              AND subscription_transaction.timestamp >= '2020-01-01' 
              AND subscription_transaction.timestamp <= '2020-12-31' ) 
WHERE  auth_user.date_joined >= '2020-01-01' 
       AND auth_user.date_joined <= '2020-12-31'; 

I use PostgreSQL 10.

Thanks in advance!

Advertisement

Answer

I would write the query using EXISTS to get rid of the COUNT(DISTINCT):

SELECT count(*) AS "2020" 
FROM  auth_user au
WHERE au.date_joined >= '2020-01-01' AND
      au.date_joined < '2021-01-01' AND
      EXISTS (SELECT 1
              FROM subscription_transaction st
              WHERE st.event = 'one-time payment' AND
                    st.user_id = au.id AND
                    st.timestamp >= '2020-01-01' AND
                    st.timestamp < '2021-01-01'
             ) ;

This should be faster than your version. However, the results should be the same.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement