I have a query that returns some device state information with device id and timestamp. I am trying to do some daily reports and for that I need to know what state devices were in during the day. So, for example, my query might get me something like this
device id start end state --------------------------------------------------------- 1 2017-01-01 13:38 2017-01-03 12:47 1 2 2017-01-01 03:15 2017-01-02 11:04 1 ... more records for devices including devices 1 and 2 ...
what I want to end up with is
device id start end state --------------------------------------------------------- 1 2017-01-01 13:38 2017-01-01 23:59 1 1 2017-01-02 00:00 2017-01-02 23:59 1 1 2017-01-03 00:00 2017-01-03 12:47 1 2 2017-01-01 03:15 2017-01-01 23:59 1 2 2017-01-02 00:00 2017-01-02 11:04 1
What I tried, is something like this
select
    l.device_id,
    gs.ts as "day",
    case when l.start < gs.ts then gs.ts else l.start end as start,
    case when l.end > gs.ts + '1 day'::interval then gs.ts + '1 day'::interval else l.end end as end,
    l.state
from ( ... my query goes here ... ) as l
right join
    (select generate_series(
        date 'start date',
        date 'end date',
        '1 day'::interval)) as gs(ts)
    on ((gs.ts, gs.ts + '1 day'::interval) overlaps (l.start, l.end))
order by l.device_id, l.start
Essentially, I right join a sequence of days using overlap function, so every interval that overlaps with that day generates a row and then I cut intervals at the day borders.
Then I use this as a nested select for my daily calculations.
The problem with this approach is that the right join generates a lot of records and the join filter then takes forever. Here is a piece of explain analyze output
-> Nested Loop Left Join (cost=5371.28..3149290.69 rows=11525332 width=32) (actual time=228.799..32849.000 rows=41197 loops=1) Join Filter: ... the generate sequence stuff removed for brevity... Rows Removed by Join Filter: 4994476
As you can see, it generated about 5 million rows, filtered them down to 41K rows and the operation took some 32 seconds.
Is there a more efficient solution to this problem?
Advertisement
Answer
This should be faster than your current approach:
select q.device_id,
       generate_series(start::date, end::date, interval '1 day') as day,
       end as day,
       state
from (your query here) q;
You can use a subquery to get the exact date/times that you want.