
Query table with multiple “duplicates”, getting the most recent

I have a table that stores predictions from a machine learning model. Each hour (“predicted_at”), the model predicts a value for each of the next 24 hours (“predicted_for”). This means that the table has many different values for each combination of “id” and “predicted_for”.

Example of what the table looks like for one ID and one predicted_for timestamp:

value                id                predicted_at             predicted_for
0.1825819489860161   6970631400382957  2021-08-21 16:00:00 UTC  2021-08-21 23:00:00 UTC
0.14882256844401498  6970631400382957  2021-08-21 17:00:00 UTC  2021-08-20 23:00:00 UTC
0.17357749613149909  6970631400382957  2021-08-21 17:00:00 UTC  2021-08-21 23:00:00 UTC
0.20283864055714163  6970631400382957  2021-08-21 18:00:00 UTC  2021-08-21 23:00:00 UTC
0.18582932551434195  6970631400382957  2021-08-21 19:00:00 UTC  2021-08-21 23:00:00 UTC
0.18859835855398877  6970631400382957  2021-08-21 20:00:00 UTC  2021-08-21 23:00:00 UTC
0.15969341546283378  6970631400382957  2021-08-21 21:00:00 UTC  2021-08-21 23:00:00 UTC
0.21578765348925422  6970631400382957  2021-08-21 22:00:00 UTC  2021-08-21 23:00:00 UTC

What I want is to query this data so that, for each ID, I get only one predicted value per “predicted_for” timestamp: the one whose “predicted_at” equals “predicted_for” minus t hours (here t = 2).

For the example, the result would give me the following table:

value                id                predicted_at             predicted_for
0.15969341546283378  6970631400382957  2021-08-21 21:00:00 UTC  2021-08-21 23:00:00 UTC

I assume I would need some kind of group by and a having clause, but I have not figured out how to solve it.

Does anyone have an idea?


Answer

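Consider the approach below:

select
  any_value(value) as value,  -- any_value returns one value from each group
  id,
  any_value(predicted_at) as predicted_at,
  predicted_for
from data t
-- keep only predictions made exactly 2 hours before the target time
where predicted_at = predicted_for - interval 2 hour
group by id, predicted_for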

Applied to the sample data in your question, this returns the single row predicted at 2021-08-21 21:00:00 UTC (value 0.15969341546283378), which matches the expected result above.
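To try this without access to the original table, the sample rows from the question can be inlined as a CTE. This is only a minimal sketch in BigQuery Standard SQL: the question does not state the schema, so it assumes that value is FLOAT64, id is INT64, and both time columns are TIMESTAMP. The CTE name data simply mirrors the table name used in the answer.

```sql
-- Sample rows copied from the question. Column types are an assumption,
-- since the post does not give the table schema.
with data as (
  select * from unnest([
    struct(0.1825819489860161 as value,
           6970631400382957 as id,
           timestamp '2021-08-21 16:00:00+00' as predicted_at,
           timestamp '2021-08-21 23:00:00+00' as predicted_for),
    (0.14882256844401498, 6970631400382957, timestamp '2021-08-21 17:00:00+00', timestamp '2021-08-20 23:00:00+00'),
    (0.17357749613149909, 6970631400382957, timestamp '2021-08-21 17:00:00+00', timestamp '2021-08-21 23:00:00+00'),
    (0.20283864055714163, 6970631400382957, timestamp '2021-08-21 18:00:00+00', timestamp '2021-08-21 23:00:00+00'),
    (0.18582932551434195, 6970631400382957, timestamp '2021-08-21 19:00:00+00', timestamp '2021-08-21 23:00:00+00'),
    (0.18859835855398877, 6970631400382957, timestamp '2021-08-21 20:00:00+00', timestamp '2021-08-21 23:00:00+00'),
    (0.15969341546283378, 6970631400382957, timestamp '2021-08-21 21:00:00+00', timestamp '2021-08-21 23:00:00+00'),
    (0.21578765348925422, 6970631400382957, timestamp '2021-08-21 22:00:00+00', timestamp '2021-08-21 23:00:00+00')
  ])
)
select
  any_value(value) as value,
  id,
  any_value(predicted_at) as predicted_at,
  predicted_for
from data
-- only the prediction made exactly 2 hours before predicted_for survives
where predicted_at = predicted_for - interval 2 hour
group by id, predicted_for
```

Because the where clause already pins predicted_at to a single timestamp per (id, predicted_for) group, any_value only matters if the table holds more than one row for the same id, predicted_at, and predicted_for. In that case it returns one of them without a guaranteed order.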

User contributions licensed under: CC BY-SA