Skip to content
Advertisement

Getting row samples from PostgreSQL

I have a data table with timestamp and some data columns. The rows are inserted in arbitrary time intervals, i.e. the timestamp difference of consecutive rows is not stable but ranges from several seconds to several hours.

I need to select one row per time interval of constant length, if there is any.

Example

To get the time intervals for selection, I have a function that generates table of time intervals, for example:

start_time              end_time
'2021-05-31 10:00:00'   '2021-05-31 10:10:00'
'2021-05-31 10:10:00'   '2021-05-31 10:20:00'
'2021-05-31 10:20:00'   '2021-05-31 10:30:00'

Then for this source data:

timestamp                data
'2021-05-31 10:01:00'    1
'2021-05-31 10:02:00'    2
'2021-05-31 10:05:00'    3
'2021-05-31 10:21:00'    4

I require result:

timestamp                data
'2021-05-31 10:01:00'    1
'2021-05-31 10:21:00'    4
  • the first row out of three is selected for the first interval
  • no row is selected for the second interval
  • the first and only possible row is selected for the third interval

Thanks

Advertisement

Answer

You can use a lateral join to pick one row per interval:

select *
from get_intervals() i
cross join lateral
(
  select *
  from my_table t
  where t.timestamp >= i.start_time and t.timestamp < i.end_time
  order by t.timestamp -- or by something random?
  fetch first row only
) d;

(A lateral join is the first thing that comes to mind. To join all rows first and then use DISTINCT ON instead, could lead to a large intermediate result, so I would prefer the lateral join over that approach.)

Advertisement