I have a table that stores predictions from a machine learning model. Every hour (“predicted_at”), the model predicts a value for each of the next 24 hours (“predicted_for”). This means the table has many different values for each “id” and “predicted_for”.
Here is an example of what the table looks like for one ID and one predicted_for timestamp:
value | id | predicted_at | predicted_for
---|---|---|---
0.1825819489860161 | 6970631400382957 | 2021-08-21 16:00:00 UTC | 2021-08-21 23:00:00 UTC
0.14882256844401498 | 6970631400382957 | 2021-08-21 17:00:00 UTC | 2021-08-20 23:00:00 UTC
0.17357749613149909 | 6970631400382957 | 2021-08-21 17:00:00 UTC | 2021-08-21 23:00:00 UTC
0.20283864055714163 | 6970631400382957 | 2021-08-21 18:00:00 UTC | 2021-08-21 23:00:00 UTC
0.18582932551434195 | 6970631400382957 | 2021-08-21 19:00:00 UTC | 2021-08-21 23:00:00 UTC
0.18859835855398877 | 6970631400382957 | 2021-08-21 20:00:00 UTC | 2021-08-21 23:00:00 UTC
0.15969341546283378 | 6970631400382957 | 2021-08-21 21:00:00 UTC | 2021-08-21 23:00:00 UTC
0.21578765348925422 | 6970631400382957 | 2021-08-21 22:00:00 UTC | 2021-08-21 23:00:00 UTC
What I want is to query this data so that, for each ID, I get only one predicted value for each “predicted_for” timestamp: the one whose “predicted_at” equals “predicted_for” minus t hours (here t = 2).
For the example, the result would give me the following table:
value | id | predicted_at | predicted_for
---|---|---|---
0.15969341546283378 | 6970631400382957 | 2021-08-21 21:00:00 UTC | 2021-08-21 23:00:00 UTC
I assume I would have some kind of group by and a having clause, but I have not figured out how to solve it.
Anyone have an idea?
Answer
Consider the approach below:
    select
      any_value(value) value,
      id,
      any_value(predicted_at) predicted_at,
      predicted_for
    from data t
    where predicted_at = predicted_for - interval 2 hour
    group by id, predicted_for
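Applied to the sample data in your question, the output is the single row with predicted_at 2021-08-21 21:00:00 UTC and value 0.15969341546283378, which matches the expected result.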
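
To try this without access to the real table, here is a minimal self-contained sketch in BigQuery standard SQL. It assumes that predicted_at and predicted_for are TIMESTAMP columns (the question does not state the types), inlines three of the sample rows as a CTE named data, and introduces a hypothetical script variable lead_hours for the question's t so the offset is not hard-coded.

```sql
-- Hypothetical variable for the lead time t from the question (2 hours there).
declare lead_hours int64 default 2;

with data as (
  -- Three rows copied from the sample table in the question.
  -- Assumption: both time columns are of type TIMESTAMP.
  select 0.17357749613149909 as value, 6970631400382957 as id,
         timestamp '2021-08-21 17:00:00 UTC' as predicted_at,
         timestamp '2021-08-21 23:00:00 UTC' as predicted_for
  union all
  select 0.15969341546283378, 6970631400382957,
         timestamp '2021-08-21 21:00:00 UTC',
         timestamp '2021-08-21 23:00:00 UTC'
  union all
  select 0.21578765348925422, 6970631400382957,
         timestamp '2021-08-21 22:00:00 UTC',
         timestamp '2021-08-21 23:00:00 UTC'
)
select
  any_value(value) as value,
  id,
  any_value(predicted_at) as predicted_at,
  predicted_for
from data
-- Keep only the prediction made exactly lead_hours before the target hour.
where predicted_at = timestamp_sub(predicted_for, interval lead_hours hour)
-- Collapse any duplicate rows to one per (id, predicted_for).
group by id, predicted_for;
```

Only the row predicted at 21:00 survives the filter. If each combination of id, predicted_at and predicted_for occurs at most once in the table, the where clause alone already yields one row per id and predicted_for; the group by with any_value then acts only as a safeguard against duplicates.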