Skip to content
Advertisement

Get totals from difference between rows

I have a table, with the following structure:

(
    id SERIAL PRIMARY KEY,
    user_id integer NOT NULL REFERENCES user(id) ON UPDATE CASCADE,
    status text NOT NULL,
    created_at timestamp with time zone NOT NULL,
    updated_at timestamp with time zone NOT NULL
)

Example data:

"id","user_id","status","created_at","updated_at"
416,38,"ONLINE","2018-08-07 14:40:51.813+00","2018-08-07 14:40:51.813+00"
417,39,"ONLINE","2018-08-07 14:45:00.717+00","2018-08-07 14:45:00.717+00"
418,38,"OFFLINE","2018-08-07 15:43:22.678+00","2018-08-07 15:43:22.678+00"
419,38,"ONLINE","2018-08-07 16:21:30.725+00","2018-08-07 16:21:30.725+00"
420,38,"OFFLINE","2018-08-07 16:49:10.3+00","2018-08-07 16:49:10.3+00"
421,38,"ONLINE","2018-08-08 11:37:53.639+00","2018-08-08 11:37:53.639+00"
422,38,"OFFLINE","2018-08-08 12:29:08.234+00","2018-08-08 12:29:08.234+00"
423,39,"ONLINE","2018-08-14 15:22:00.539+00","2018-08-14 15:22:00.539+00"
424,39,"OFFLINE","2018-08-14 15:22:02.092+00","2018-08-14 15:22:02.092+00"

When a user on my application goes online, a new row is inserted with status ONLINE. When they go offline, a row with status OFFLINE is inserted. There are other entries created to record different events, but for this query only OFFLINE and ONLINE are important.

I want to produce a chart, showing the total number of users online over a time period (e.g 5 minutes), within a date range. If a user is online for any part of that period they should be counted.

Example:

datetime, count
2019-05-22T12:00:00+0000, 53
2019-05-22T12:05:00+0000, 47
2019-05-22T12:10:00+0000, 49
2019-05-22T12:15:00+0000, 55
2019-05-22T12:20:00+0000, 59
2019-05-22T12:25:00+0000, 56

I’m able to produce a similar chart for an individual user by fetching all status rows within the date range then processing manually, however this approach won’t scale to all users.

I believe something like this could be accomplished with window functions, but I’m not really sure where to start

Advertisement

Answer

As your question is very vague nobody realy can help you to 100%. Well, you can achive what you want maybe with a combination of of “with” clauses and window functions. With the “with” clause you can easily break down big problems in small parts. Maybe following query (not looking at any performace) may help, you replace public.tbl_test with your table:

with temp_online as (
    select
    *
    from public.tbl_test
    where public.tbl_test.status ilike 'online'
    order by created_at
),

temp_offline as (
    select
    *
    from public.tbl_test
    where public.tbl_test.status ilike 'offline'
    order by created_at
),

temp_change as (
    select
    * ,
    (
        select temp_offline.created_at from temp_offline where temp_offline.created_at > temp_online.created_at  and temp_offline.user_id = temp_online.user_id order by created_at asc limit 1
    ) as go_offline
    from temp_online
),

temp_result as 
(
select *,
go_offline - created_at as online_duration
from temp_change
),

temp_series as 
(
SELECT (generate_series || ' minute')::interval + '2019-05-22 00:00:00'::timestamp  as temp_date
    FROM generate_series(0, 1440,5)
)

select
 temp_series.temp_date,
(select count(*) from temp_result where temp_result.created_at <=  temp_series.temp_date and temp_result.go_offline >= temp_series.temp_date) as count_users
from 
temp_series 
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement