Postgres: how to check for records in time buckets for multiple values

I have a Postgres table with roughly the following shape:

sensor_id |         recorded_at
----------+--------------------
        A | 2020-01-01 00:00:00
        A | 2020-01-01 00:03:00
        B | 2020-01-01 01:00:00
        C | 2020-01-01 01:03:00
      ... |                 ...

Given a

  • Start time
  • End time
  • Bucket width
  • List of sensor ids

I’d like to write a query that slices the range [start_time, end_time] into sub-intervals (buckets) of width bucket_width and, for each bucket, checks whether each of the sensors in the list had a record with recorded_at during that bucket.

So for example, assuming inputs of

  • Start time = '2020-01-01 00:00:00'
  • End time = '2020-01-01 02:00:00'
  • Bucket width = '1 hour'
  • List of sensor ids = ['A', 'B', 'C']

the query should return something like

                  t1 |                  t2 | A_count | B_count | C_count
---------------------+---------------------+---------+---------+--------
 2020-01-01 00:00:00 | 2020-01-01 01:00:00 |       2 |       0 |       0
 2020-01-01 01:00:00 | 2020-01-01 02:00:00 |       0 |       1 |       1

I don’t need the actual count, so I imagine LIMIT 1 will show up somewhere. I just included it in the example to help make it clear what I’m looking for.

Answer

One option uses generate_series() to generate one row per bucket, then a left join to bring in the table, and finally conditional aggregation to get the count per sensor:

select s.ts as ts1, s.ts + interval '1 hour' as ts2,
    count(*) filter (where t.sensor_id = 'A') as a_count,
    count(*) filter (where t.sensor_id = 'B') as b_count,
    count(*) filter (where t.sensor_id = 'C') as c_count
from generate_series('2020-01-01 00:00:00'::timestamp, '2020-01-01 02:00:00'::timestamp, '1 hour') s(ts)
left join mytable t on t.recorded_at >= s.ts and t.recorded_at < s.ts + interval '1 hour'
group by s.ts
order by s.ts
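
Since the question says the actual count is not needed, the same query can report a boolean per sensor instead. The following is a minimal sketch, not a definitive implementation: the readings CTE is a hypothetical stand-in for the real table, filled with the four sample rows from the question, and the a_seen / b_seen / c_seen column names are made up for illustration.

```sql
-- "readings" is a hypothetical stand-in for the real table (sample rows from the question).
with readings(sensor_id, recorded_at) as (
    values
        ('A', timestamp '2020-01-01 00:00:00'),
        ('A', timestamp '2020-01-01 00:03:00'),
        ('B', timestamp '2020-01-01 01:00:00'),
        ('C', timestamp '2020-01-01 01:03:00')
)
select s.ts as ts1,
       s.ts + interval '1 hour' as ts2,
       -- The filtered count is 0 (not null) for empty buckets,
       -- so comparing it with 0 always yields true or false.
       count(*) filter (where r.sensor_id = 'A') > 0 as a_seen,
       count(*) filter (where r.sensor_id = 'B') > 0 as b_seen,
       count(*) filter (where r.sensor_id = 'C') > 0 as c_seen
from generate_series(
         timestamp '2020-01-01 00:00:00',
         timestamp '2020-01-01 02:00:00',
         interval '1 hour'
     ) as s(ts)
left join readings r
       on r.recorded_at >= s.ts
      and r.recorded_at <  s.ts + interval '1 hour'
group by s.ts
order by s.ts;
```

With the sample rows, the 00:00:00 bucket should show only a_seen as true, and the 01:00:00 bucket should show b_seen and c_seen as true. The series bounds are kept the same as in the query above.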

Note that this would generate an extra record (from 02:00:00 to 03:00:00) compared to your desired results, because generate_series() includes the end value when the series lands on it. If you want to avoid that, you can slightly modify the generate_series() arguments, like:

...
    from generate_series(
        '2020-01-01 00:00:00'::timestamp, 
        '2020-01-01 02:00:00'::timestamp - interval '1 second', 
        '1 hour'
     ) s(ts)
...
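
If the sensor list is not fixed, hardcoding one column per sensor does not scale. One possible alternative, sketched here under assumptions, returns one row per bucket and sensor and uses exists, which stops at the first matching record (the "LIMIT 1" idea from the question). The readings and params CTEs and all of their column names are hypothetical; params simply holds the four inputs from the question. The sketch also assumes the range is a whole multiple of the bucket width, so it subtracts bucket_width from the end time rather than one second.

```sql
-- "readings" stands in for the real table; "params" holds the four inputs.
-- Both CTEs and their column names are hypothetical.
with readings(sensor_id, recorded_at) as (
    values
        ('A', timestamp '2020-01-01 00:00:00'),
        ('A', timestamp '2020-01-01 00:03:00'),
        ('B', timestamp '2020-01-01 01:00:00'),
        ('C', timestamp '2020-01-01 01:03:00')
),
params as (
    select timestamp '2020-01-01 00:00:00' as start_time,
           timestamp '2020-01-01 02:00:00' as end_time,
           interval '1 hour'               as bucket_width,
           array['A', 'B', 'C']            as sensor_ids
)
select s.ts                  as t1,
       s.ts + p.bucket_width as t2,
       ids.sensor_id,
       -- exists() only needs one matching row, so no count is computed.
       exists (
           select 1
           from readings r
           where r.sensor_id = ids.sensor_id
             and r.recorded_at >= s.ts
             and r.recorded_at <  s.ts + p.bucket_width
       ) as seen
from params p
-- One row per bucket start; the last start is end_time - bucket_width.
cross join lateral generate_series(
    p.start_time, p.end_time - p.bucket_width, p.bucket_width
) as s(ts)
-- One row per requested sensor id.
cross join lateral unnest(p.sensor_ids) as ids(sensor_id)
order by s.ts, ids.sensor_id;
```

For the sample inputs this should yield six rows (two buckets times three sensors). Pivoting those rows into one column per sensor would still need a fixed column list or client-side reshaping.
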
User contributions licensed under: CC BY-SA