Why does the following join increase the query time significantly?

Question

I have a star schema here and I am querying the fact table and would like to join one very small dimension table. I can't really explain the following: EXPLAIN ANALYZE SELECT COUNT(impression_id), …

Accepted Answer

Rewritten with (recommended) explicit ANSI JOIN syntax:

    SELECT COUNT(impression_id), imp.os_id, os.os_desc
    FROM   bi.impressions imp
    JOIN   bi.os_desc os ON os.os_id = imp.os_id
    GROUP  BY imp.os_id, os.os_desc;

First of all, your second query might be wrong if more or fewer than exactly one match is found in os_desc for every row in impressions. This can be ruled out if you have a foreign key constraint on os_id in place that guarantees referential integrity, plus a NOT NULL constraint on bi.impressions.os_id. If so, in a first step, simplify to:

    SELECT COUNT(*) AS ct, imp.os_id, os.os_desc
    FROM   bi.impressions imp
    JOIN   bi.os_desc     os USING (os_id)
    GROUP  BY imp.os_id, os.os_desc;

count(*) is faster than count(column) and equivalent here if the column is NOT NULL. And add a column alias for the count.

Faster yet:

    SELECT os_id, os.os_desc, sub.ct
    FROM  (
       SELECT os_id, COUNT(*) AS ct
       FROM   bi.impressions
       GROUP  BY 1
       ) sub
    JOIN   bi.os_desc os USING (os_id);

Aggregate first, join later. More here:

Aggregate a single column in query with many columns
PostgreSQL - order by an array
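To check that "aggregate first, join later" returns the same rows as joining before grouping, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for PostgreSQL, and the toy tables and sample rows are made up for illustration (they only mimic bi.impressions and bi.os_desc). It says nothing about timing, which depends on the real data and planner.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; tables and rows are toy data.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # enforce the FK, as the answer requires
con.executescript("""
    CREATE TABLE os_desc (
        os_id   INTEGER PRIMARY KEY,
        os_desc TEXT NOT NULL
    );
    CREATE TABLE impressions (
        impression_id INTEGER PRIMARY KEY,
        os_id         INTEGER NOT NULL REFERENCES os_desc (os_id)
    );
    INSERT INTO os_desc VALUES (1, 'Linux'), (2, 'Windows'), (3, 'macOS');
    INSERT INTO impressions (os_id) VALUES (1), (1), (1), (2), (2), (3);
""")

# Variant 1: join every fact row to the dimension, then group.
join_then_aggregate = """
    SELECT imp.os_id, os.os_desc, COUNT(*) AS ct
    FROM   impressions imp
    JOIN   os_desc os ON os.os_id = imp.os_id
    GROUP  BY imp.os_id, os.os_desc
    ORDER  BY imp.os_id
"""

# Variant 2: collapse the fact table to one row per os_id, then join.
aggregate_then_join = """
    SELECT os_id, os.os_desc, sub.ct
    FROM  (
       SELECT os_id, COUNT(*) AS ct
       FROM   impressions
       GROUP  BY os_id
       ) sub
    JOIN   os_desc os USING (os_id)
    ORDER  BY os_id
"""

rows_join_first = con.execute(join_then_aggregate).fetchall()
rows_agg_first = con.execute(aggregate_then_join).fetchall()

print(rows_join_first)
print(rows_agg_first)
```

Both queries return one row per os_id with identical counts. The equivalence relies on the foreign key and NOT NULL constraints: without them, the inner join could drop or duplicate fact rows and the two forms would no longer be guaranteed to agree with a plain count over the fact table.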

Answer