The query below is pretty self-explanatory, and although I’m not good at SQL, I can’t find anything wrong with it. However, the number it yields in not in accordance with my gut feeling and I would like it double-checked, if this is appropriate for StackOverflow.
I’m simply trying to get the number of users that joined my website in 2020, and also made a payment in 2020. I’m trying to figure out “new revenue”.
This is the query:
SELECT Count(DISTINCT( auth_user.id )) AS "2020" FROM auth_user JOIN subscription_transaction ON ( subscription_transaction.event = 'one-time payment' AND subscription_transaction.user_id = auth_user.id AND subscription_transaction.timestamp >= '2020-01-01' AND subscription_transaction.timestamp <= '2020-12-31' ) WHERE auth_user.date_joined >= '2020-01-01' AND auth_user.date_joined <= '2020-12-31';
I use PostgreSQL 10.
Thanks in advance!
Advertisement
Answer
I would write the query using EXISTS
to get rid of the COUNT(DISTINCT)
:
SELECT count(*) AS "2020" FROM auth_user au WHERE au.date_joined >= '2020-01-01' AND au.date_joined < '2021-01-01' AND EXISTS (SELECT 1 FROM subscription_transaction st WHERE st.event = 'one-time payment' AND st.user_id = au.id AND st.timestamp >= '2020-01-01' AND st.timestamp < '2021-01-01' ) ;
This should be faster than your version. However, the results should be the same.