Query table with multiple “duplicates”, getting the most recent

I have a table which stores predictions from a machine learning model. This is a model that each hour (“predicted_at”) predicts a value for the next 24 hours(“predicted_for”). This means that the table have many different values for each “id” and “predicted_for”.

Example of how the the table looks like for one ID and one predicted_for timestamp:

value	id	predicted_at	predicted_for
0.1825819489860161	6970631400382957	2021-08-21 16:00:00 UTC	2021-08-21 23:00:00 UTC
0.14882256844401498	6970631400382957	2021-08-21 17:00:00 UTC	2021-08-20 23:00:00 UTC
0.17357749613149909	6970631400382957	2021-08-21 17:00:00 UTC	2021-08-21 23:00:00 UTC
0.20283864055714163	6970631400382957	2021-08-21 18:00:00 UTC	2021-08-21 23:00:00 UTC
0.18582932551434195	6970631400382957	2021-08-21 19:00:00 UTC	2021-08-21 23:00:00 UTC
0.18859835855398877	6970631400382957	2021-08-21 20:00:00 UTC	2021-08-21 23:00:00 UTC
0.15969341546283378	6970631400382957	2021-08-21 21:00:00 UTC	2021-08-21 23:00:00 UTC
0.21578765348925422	6970631400382957	2021-08-21 22:00:00 UTC	2021-08-21 23:00:00 UTC

What I want to do is to query this data so that I get, for each ID, only one predicted value for each “predicted_for” timestemp, and I want this value to be “predicted_at” at “predicted_for – t(2) hours”.

For the example, the result would give me the following table:

value	id	predicted_at	predicted_for
0.15969341546283378	6970631400382957	2021-08-21 21:00:00 UTC	2021-08-21 23:00:00 UTC

I assume I would have some kind of group by and a having clause, but I have not figured out how to solve it.

Anyone have an idea?

Answer

Consider below approach

select any_value(value) value, id, 
  any_value(predicted_at) predicted_at, predicted_for,   
from data t
where predicted_at = predicted_for - interval 2 hour
group by id, predicted_for

​x
 
select any_value(value) value, id,   any_value(predicted_at) predicted_at, predicted_for,   from data twhere predicted_at = predicted_for - interval 2 hourgroup by id, predicted_for          ​

if applied to sample data in your question – output is

Advertisement

Answer