Given the following tables
buckets
metric_id|start_date |bucket ------------------------------------ a |2019-12-05 00:00:00|1 a |2019-12-06 00:00:00|2 b |2021-10-31 00:00:00|1 b |2021-11-01 00:00:00|2
points
point_id|metric_id|timestamp ---------------------------- 1 |a |2019-12-05 00:00:00 2 |a |2019-12-06 00:00:00 3 |b |2021-10-31 00:00:00 4 |b |2021-11-01 00:00:00
And the following query
select p.metric_id, bucket from points p left join width_bucket(p.timestamp, (select array(select start_date from buckets b where b.metric_id = p.metric_id -- correlated sub-query ))) as bucket on true
Output
metric_id|bucket ----------------- a |1 a |2 b |1 b |2
How can I remove the correlated sub-query to improve the performance?
Currently ~280,000 points * ~650 buckets = ~180,000,000 loops = very slow!
Basically I want to remove the correlated sub-query and apply the width_bucket function only once per unique metric_id in buckets, so that the performance is improved and the function is still given the correct time series data.
How can this be done in Postgres 13?
Advertisement
Answer
You can use a cte to aggregate buckets first
with buckets_arr as ( select metric_id, array_agg(start_date order by start_date) arrb from buckets group by metric_id ) select p.metric_id, width_bucket(p.timestamp, ba.arrb) bucket from points p join buckets_arr ba on p.metric_id = ba.metric_id