Handling of generate_series() in queries with date or timestamp with / without time zone

Question

I have a query to generate a report based on a date series that is grouped by date and employee_id. The date should be based on a particular time zone, in this case ‘Asia/Kuala_Lumpur’. But this can change depending on the user’s time zone. Definition and sample data for table atten…

Accepted Answer

DB design

Consider some modifications to your setup:

    CREATE TABLE employee (
      id          int PRIMARY KEY  -- !
    , name        text             -- do NOT use char(n) !
    , division_id int
    );

    CREATE TABLE attendance (
      id            int PRIMARY KEY                   -- !
    , employee_id   int NOT NULL REFERENCES employee  -- FK!
    , activity_type int
    , created_at    timestamptz NOT NULL
    );

Defining a PK makes it easier to aggregate rows, because the PK covers the whole row in the GROUP BY clause. See:

Why can’t I exclude dependent columns from `GROUP BY` when I aggregate by a key?

I wouldn’t use “name” as column name. It’s not descriptive. Every other column could be named “name”. Consider:

Any downsides of using data type “text” for storing strings?
How to implement a many-to-many relationship in PostgreSQL?

Query

    SELECT *
    FROM  (  -- complete employee/date grid for division in range
       SELECT g.d::date AS the_date, id AS employee_id, name, division_id
       FROM  (
          SELECT generate_series(MIN(created_at) AT TIME ZONE 'Asia/Kuala_Lumpur'
                               , MAX(created_at) AT TIME ZONE 'Asia/Kuala_Lumpur'
                               , interval '1 day')
          FROM   attendance
          ) g(d)
       CROSS  JOIN employee e
       WHERE  e.division_id = 1
       ) de
    LEFT   JOIN (  -- checkins & checkouts per employee/date for division in range
       SELECT employee_id, ts::date AS the_date
            , array_agg(id) as rows
            , min(ts)             FILTER (WHERE activity_type = 1) AS min_check_in
            , max(ts)             FILTER (WHERE activity_type = 2) AS max_check_out
            , array_agg(ts::time) FILTER (WHERE activity_type = 1) AS check_ins
            , array_agg(ts::time) FILTER (WHERE activity_type = 2) AS check_outs
       FROM  (
          SELECT a.id, a.employee_id, a.activity_type
               , a.created_at AT TIME ZONE 'Asia/Kuala_Lumpur' AS ts  -- convert to timestamp
          FROM   employee e
          JOIN   attendance a ON a.employee_id = e.id
    --    WHERE  a.created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur'  -- "sargable" expressions
    --    AND    a.created_at <  timestamp '2020-11-21' AT TIME ZONE 'Asia/Kuala_Lumpur'  -- exclusive upper bound (includes all of 2020-11-20);
          AND    e.division_id = 1
          ORDER  BY a.employee_id, a.created_at, a.activity_type  -- optional to guarantee sorted arrays
          ) sub
       GROUP  BY 1, 2
       ) a USING (the_date, employee_id)
    ORDER  BY 1, 2;

db<>fiddle here

Note that my query outputs local date and time for Asia/Kuala_Lumpur:

    test=> SELECT timestamptz '2020-11-20 08:52:01 +0' AT TIME ZONE 'Asia/Kuala_Lumpur' AS local_ts;
          local_ts
    ---------------------
     2020-11-20 16:52:01

Where to start? You need to understand the concepts of time zones and the Postgres data types timestamp with time zone (timestamptz) vs. timestamp without time zone (timestamp). Otherwise, there will be confusion without end. Start here:

Ignoring time zones altogether in Rails and PostgreSQL

Most notably, timestamptz does not store a time zone:

Time zone storage in data type “timestamp with time zone”

When simply casting timestamptz to date or timestamp, the current time zone setting of the session is assumed. Not what you want. Provide a time zone explicitly with the AT TIME ZONE construct to avoid this pitfall. In your fiddle you have both:

    ...
    , generate_series(
        startdate::timestamp AT TIME ZONE 'Asia/Kuala_Lumpur',
        enddate::timestamp AT TIME ZONE 'Asia/Kuala_Lumpur',
        interval '1 day') g(d)
    ...

Also not doing what you want. After the (faulty!) cast to timestamp, the AT TIME ZONE construct translates the values back to timestamptz.

Also, your query generates the complete Cartesian product of all users and the maximum range of days in the table attendance, only to reduce it back to a single day with:

    where created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur'
    and   created_at <  timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' + interval '1 day'

The WHERE clause finally does what it’s supposed to do. But it makes no sense to first generate the full range of days, only to throw away most of it. (Seems you copied that from my other fiddle in the meantime?)

I commented out the WHERE clause and kept an optimized version of your generate_series() in my query as proof of concept.
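
To make the casting pitfall concrete, here is a minimal sketch. The literal timestamp and the session setting are assumptions chosen for illustration; they are not taken from the question's data. It contrasts a plain cast, which depends on the session's time zone, with an explicit AT TIME ZONE conversion:

```sql
BEGIN;
SET LOCAL timezone = 'UTC';  -- assumed session setting, for demonstration only

SELECT ts::date                                   AS session_date  -- uses the session time zone
     , (ts AT TIME ZONE 'Asia/Kuala_Lumpur')::date AS local_date    -- uses the named time zone
FROM  (SELECT timestamptz '2020-11-20 20:00:00+00' AS ts) t;        -- hypothetical sample value
-- session_date = 2020-11-20, local_date = 2020-11-21

ROLLBACK;
```

Since 20:00 UTC is already 04:00 on the next day in Kuala Lumpur, the two columns disagree. Only the second is independent of the session setting.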
Further reading:

Generating time series between two dates in PostgreSQL
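
If only a known range of local days is needed, one option is to generate the series from plain timestamp values, so that no time zone is involved until the bounds are compared to created_at. This is a rough sketch under that assumption; the range 2020-11-20 to 2020-11-22 is hypothetical:

```sql
SELECT g.d::date AS the_date                           -- plain timestamp -> date, no time zone involved
     , g.d AT TIME ZONE 'Asia/Kuala_Lumpur' AS day_start  -- timestamptz: instant of local midnight
FROM   generate_series(timestamp '2020-11-20'
                     , timestamp '2020-11-22'
                     , interval  '1 day') g(d);
-- returns three rows: 2020-11-20, 2020-11-21, 2020-11-22
```

The day_start column yields timestamptz values that can be compared directly to attendance.created_at, in the same way as the sargable bounds in the commented-out WHERE clause of the query.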
