I have a table, with the following structure:
( id SERIAL PRIMARY KEY, user_id integer NOT NULL REFERENCES user(id) ON UPDATE CASCADE, status text NOT NULL, created_at timestamp with time zone NOT NULL, updated_at timestamp with time zone NOT NULL )
Example data:
"id","user_id","status","created_at","updated_at" 416,38,"ONLINE","2018-08-07 14:40:51.813+00","2018-08-07 14:40:51.813+00" 417,39,"ONLINE","2018-08-07 14:45:00.717+00","2018-08-07 14:45:00.717+00" 418,38,"OFFLINE","2018-08-07 15:43:22.678+00","2018-08-07 15:43:22.678+00" 419,38,"ONLINE","2018-08-07 16:21:30.725+00","2018-08-07 16:21:30.725+00" 420,38,"OFFLINE","2018-08-07 16:49:10.3+00","2018-08-07 16:49:10.3+00" 421,38,"ONLINE","2018-08-08 11:37:53.639+00","2018-08-08 11:37:53.639+00" 422,38,"OFFLINE","2018-08-08 12:29:08.234+00","2018-08-08 12:29:08.234+00" 423,39,"ONLINE","2018-08-14 15:22:00.539+00","2018-08-14 15:22:00.539+00" 424,39,"OFFLINE","2018-08-14 15:22:02.092+00","2018-08-14 15:22:02.092+00"
When a user on my application goes online, a new row is inserted with status ONLINE
. When they go offline, a row with status OFFLINE
is inserted. There are other entries created to record different events, but for this query only OFFLINE
and ONLINE
are important.
I want to produce a chart, showing the total number of users online over a time period (e.g 5 minutes), within a date range. If a user is online for any part of that period they should be counted.
Example:
datetime, count 2019-05-22T12:00:00+0000, 53 2019-05-22T12:05:00+0000, 47 2019-05-22T12:10:00+0000, 49 2019-05-22T12:15:00+0000, 55 2019-05-22T12:20:00+0000, 59 2019-05-22T12:25:00+0000, 56
I’m able to produce a similar chart for an individual user by fetching all status rows within the date range then processing manually, however this approach won’t scale to all users.
I believe something like this could be accomplished with window functions, but I’m not really sure where to start
Advertisement
Answer
As your question is very vague nobody realy can help you to 100%. Well, you can achive what you want maybe with a combination of of “with” clauses and window functions. With the “with” clause you can easily break down big problems in small parts. Maybe following query (not looking at any performace) may help, you replace public.tbl_test with your table:
with temp_online as ( select * from public.tbl_test where public.tbl_test.status ilike 'online' order by created_at ), temp_offline as ( select * from public.tbl_test where public.tbl_test.status ilike 'offline' order by created_at ), temp_change as ( select * , ( select temp_offline.created_at from temp_offline where temp_offline.created_at > temp_online.created_at and temp_offline.user_id = temp_online.user_id order by created_at asc limit 1 ) as go_offline from temp_online ), temp_result as ( select *, go_offline - created_at as online_duration from temp_change ), temp_series as ( SELECT (generate_series || ' minute')::interval + '2019-05-22 00:00:00'::timestamp as temp_date FROM generate_series(0, 1440,5) ) select temp_series.temp_date, (select count(*) from temp_result where temp_result.created_at <= temp_series.temp_date and temp_result.go_offline >= temp_series.temp_date) as count_users from temp_series