I want to count IDs per month using generate_series(). This query works in PostgreSQL 9.1:
SELECT (to_char(serie,'yyyy-mm')) AS year, sum(amount)::int AS eintraege
FROM (
    SELECT COUNT(mytable.id) as amount, generate_series::date as serie
    FROM mytable
    RIGHT JOIN generate_series(
        (SELECT min(date_from) FROM mytable)::date,
        (SELECT max(date_from) FROM mytable)::date,
        interval '1 day') ON generate_series = date(date_from)
    WHERE version = 1
    GROUP BY generate_series
) AS foo
GROUP BY Year
ORDER BY Year ASC;
This is my output:
"2006-12" | 4
"2007-02" | 1
"2007-03" | 1
But what I want to get is this output (‘0’ value in January):
"2006-12" | 4
"2007-01" | 0
"2007-02" | 1
"2007-03" | 1
Months without any id should be listed nevertheless.
Any ideas how to solve this?
Sample data:
drop table if exists mytable;
create table mytable(id bigint, version smallint, date_from timestamp);
insert into mytable(id, version, date_from) values
(4084036, 1, '2006-12-22 22:46:35'),
(4084938, 1, '2006-12-23 16:19:13'),
(4084938, 2, '2006-12-23 16:20:23'),
(4084939, 1, '2006-12-23 16:29:14'),
(4084954, 1, '2006-12-23 16:28:28'),
(4250653, 1, '2007-02-12 21:58:53'),
(4250657, 1, '2007-03-12 21:58:53')
;
Answer
Untangled, simplified and fixed, it might look like this:
SELECT to_char(s.tag,'yyyy-mm') AS monat
     , count(t.id) AS eintraege
FROM  (
   SELECT generate_series(min(date_from)::date
                        , max(date_from)::date
                        , interval '1 day'
          )::date AS tag
   FROM   mytable t
   ) s
LEFT   JOIN mytable t ON t.date_from::date = s.tag AND t.version = 1
GROUP  BY 1
ORDER  BY 1;
Among all the noise, misleading identifiers and unconventional format the actual problem was hidden here:
WHERE version = 1
You made correct use of RIGHT [OUTER] JOIN. But adding a WHERE clause that requires an existing row from mytable effectively converts the RIGHT [OUTER] JOIN to an [INNER] JOIN: days without a matching row carry NULL for version, so version = 1 is not true for them and they are dropped.
Move that filter into the JOIN condition to make it work.
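To see the effect in isolation, here is a minimal sketch with two inline row sets instead of the real tables. The aliases d and t and the two sample days are made up for illustration; d stands in for the generated series, t for mytable.

```sql
-- Filter in WHERE: the unmatched day has t.version IS NULL,
-- fails "t.version = 1" and disappears from the result.
SELECT d.tag, count(t.id) AS ct
FROM  (VALUES (date '2007-01-01'), (date '2007-01-02')) d(tag)
LEFT   JOIN (VALUES (1, 1, date '2007-01-01')) t(id, version, date_from)
       ON t.date_from = d.tag
WHERE  t.version = 1
GROUP  BY d.tag
ORDER  BY d.tag;
-- one row:  2007-01-01 | 1

-- Same filter in the JOIN condition: both days survive,
-- the unmatched one gets a count of 0.
SELECT d.tag, count(t.id) AS ct
FROM  (VALUES (date '2007-01-01'), (date '2007-01-02')) d(tag)
LEFT   JOIN (VALUES (1, 1, date '2007-01-01')) t(id, version, date_from)
       ON  t.date_from = d.tag
       AND t.version = 1
GROUP  BY d.tag
ORDER  BY d.tag;
-- two rows: 2007-01-01 | 1
--           2007-01-02 | 0
```

The same reasoning applies to a RIGHT JOIN with the tables swapped.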
I simplified some other things while I was at it.
Better yet:
SELECT to_char(mon, 'yyyy-mm') AS monat
     , COALESCE(t.ct, 0) AS eintraege
FROM  (
   SELECT date_trunc('month', date_from)::date AS mon
        , count(*) AS ct
   FROM   mytable
   WHERE  version = 1
   GROUP  BY 1
   ) t
RIGHT JOIN (
   SELECT generate_series(date_trunc('month', min(date_from))
                        , max(date_from)
                        , interval '1 mon')::date
   FROM   mytable
   ) m(mon) USING (mon)
ORDER  BY mon;
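If it helps to see what the month series looks like on its own, this sketch hard-codes the bounds that the sample data would produce (the truncated first month and the latest date_from). The literal timestamps are assumptions taken from the sample rows above, not part of the original query.

```sql
-- One row per month between the first (truncated) and the last timestamp.
SELECT mon::date AS mon
FROM   generate_series(timestamp '2006-12-01'           -- date_trunc('month', min(date_from))
                     , timestamp '2007-03-12 21:58:53'  -- max(date_from)
                     , interval '1 mon') AS mon;
-- 2006-12-01
-- 2007-01-01
-- 2007-02-01
-- 2007-03-01
```

January 2007 appears here even though mytable has no row for it, which is why the outer join against this set can report a 0 for that month.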
It’s much cheaper to aggregate first and join later – joining one row per month instead of one row per day.
It’s cheaper to base GROUP BY and ORDER BY on the date value instead of the rendered text.
count(*) is a bit faster than count(id), while equivalent in this query as long as id is never NULL.
generate_series() is a bit faster and safer when based on timestamp instead of date. See: