Skip to content
Advertisement

Ensuring no dupe ids in query return

So for the following schema:

You can see that there are 3 activities and each contact has two. What i am searching for is the number of activities per account in the previous two months. So I have the following query

This returns:

However this is incorrect. There are only 3 activities, its just that each contact sees activity 1. so it is counting that activity twice. Is there a way for me to only count each activity id one time so there is no duplication?

Advertisement

Answer

count(DISTINCT activity_id) to fold duplicates in the count, like Edouard suggested.
But there is more:

db<>fiddle here

  • Most importantly, add an outer WHERE clause to the query to filter irrelevant rows early. This can make a big difference for a small selection from a big table.
    We have to move that predicate to the JOIN clause, lest we’d exclude accounts with no activity. (LEFT JOIN and RIGHT JOIN can both be used, mirroring each other.) See:

  • Make that filter “sargable”, so it can use an index on (start_date) (unlike your original formulation). Again, big impact for a small selection from a big table.

  • Use the same expressions for your aggregate filter clauses. Lesser effect, but take it.
    Unlike other aggregate functions, count() returns 0 (not NULL) for “no rows”, so we don’t have to do anything extra.

  • Assuming referential integrity (enforced with a FK constraint), the join to table account is just expensive noise. Drop it. CURRENT_DATE is not wrong. But since your expressions yield timestamp anyway, it’s bit more efficient to use LOCALTIMESTAMP to begin with.

Compare with your original to see that this is quite a bit faster.

And I assume you are aware that this query introduces a dependency on the TimeZone setting of the executing session. The current date depends on where in the world we ask. See:


If you are not bound to this particular output format, a pivoted form is simpler, now that we filter rows early:

Again, we can include accounts without activity. But months without activity do not show up …

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement