How to split time intervals by day in PostgreSQL

I have a query that returns some device state information with device id and timestamp. I am trying to do some daily reports and for that I need to know what state devices were in during the day. So, for example, my query might get me something like this

device id     start              end                state
---------------------------------------------------------
1             2017-01-01 13:38   2017-01-03 12:47   1
2             2017-01-01 03:15   2017-01-02 11:04   1
... more records for devices including devices 1 and 2 ...

what I want to end up with is

device id     start              end                state
---------------------------------------------------------
1             2017-01-01 13:38   2017-01-01 23:59   1
1             2017-01-02 00:00   2017-01-02 23:59   1
1             2017-01-03 00:00   2017-01-03 12:47   1
2             2017-01-01 03:15   2017-01-01 23:59   1
2             2017-01-02 00:00   2017-01-02 11:04   1

What I tried, is something like this

select
    l.device_id,
    gs.ts as "day",
    case when l.start < gs.ts then gs.ts else l.start end as start,
    case when l.end > gs.ts + '1 day'::interval then gs.ts + '1 day'::interval else l.end end as end,
    l.state
from ( ... my query goes here ... ) as l
right join
    (select generate_series(
        date 'start date',
        date 'end date',
        '1 day'::interval)) as gs(ts)
    on ((gs.ts, gs.ts + '1 day'::interval) overlaps (l.start, l.end))
order by l.device_id, l.start

Essentially, I right join a sequence of days using overlap function, so every interval that overlaps with that day generates a row and then I cut intervals at the day borders.

Then I use this as a nested select for my daily calculations.

The problem with this approach is that the right join generates a lot of records and the join filter then takes forever. Here is a piece of explain analyze output

->  Nested Loop Left Join  (cost=5371.28..3149290.69 rows=11525332 width=32) (actual time=228.799..32849.000 rows=41197 loops=1)
Join Filter: ... the generate sequence stuff removed for brevity...
Rows Removed by Join Filter: 4994476

As you can see, it generated about 5 million rows, filtered them down to 41K rows and the operation took some 32 seconds.

Is there a more efficient solution to this problem?

Answer

This should be faster than your current approach:

select q.device_id,
       generate_series(start::date, end::date, interval '1 day') as day,
       end as day,
       state
from (your query here) q;

You can use a subquery to get the exact date/times that you want.

Advertisement

Answer