Skip to content
Advertisement

Attach the most recent timestamp when value was changed to each row

I have a dataset of updated properties:

user_id flag updated_at
123 1 101
123 0 101
123 1 102
123 1 103
123 1 104
124 0 101
124 0 102
124 0 110

I need to assign to each row a value when was the most recent change of flag value, so it should look like this:

user_id flag updated_at recent_updated_at
123 1 101 101
123 0 102 102
123 1 103 103
123 1 104 103
123 1 105 103
124 0 101 101
124 0 102 101
124 0 110 101

I have this query so far, but feel stuck at the moment:

with raw_dataset as (
            SELECT 123 as user_id, 1 as flag, 101 as updated_at
  UNION ALL SELECT 123 as user_id, 0 as flag, 102 as updated_at
  UNION ALL SELECT 123 as user_id, 1 as flag, 103 as updated_at
  UNION ALL SELECT 123 as user_id, 0 as flag, 104 as updated_at
  UNION ALL SELECT 123 as user_id, 0 as flag, 105 as updated_at
  UNION ALL SELECT 124 as user_id, 1 as flag, 101 as updated_at
  UNION ALL SELECT 124 as user_id, 1 as flag, 103 as updated_at
  UNION ALL SELECT 124 as user_id, 1 as flag, 110 as updated_at

)
, dataset as (
select * from (
  select 
    *, 
--    lag(flag, 1, -1) over (partition by user_id order by updated_at ASC) as lag_,
    row_number() over (partition by user_id, flag order by updated_at ASC) as rn
  from raw_dataset
  )
)
SELECT *, first_value(updated_at) over (partition by user_id,flag order by rn ASC) FROM dataset 
ORDER BY user_id, updated_at ASC

Any ideas, tips? Thanks in advance!

Advertisement

Answer

You appear to want the most recent time when the flag value changed. Your question is rather confusing because the data in the question is different from the data in the code.

However, the follow returns the data in the question and what you want:

with raw_dataset as (
            SELECT 123 as user_id, 1 as flag, 101 as updated_at
  UNION ALL SELECT 123 as user_id, 0 as flag, 102 as updated_at
  UNION ALL SELECT 123 as user_id, 1 as flag, 103 as updated_at
  UNION ALL SELECT 123 as user_id, 1 as flag, 104 as updated_at
  UNION ALL SELECT 123 as user_id, 1 as flag, 105 as updated_at
  UNION ALL SELECT 124 as user_id, 0 as flag, 101 as updated_at
  UNION ALL SELECT 124 as user_id, 0 as flag, 103 as updated_at
  UNION ALL SELECT 124 as user_id, 0 as flag, 110 as updated_at
)
select rd.*,
       max(case when prev_flag is null or prev_flag <> flag then updated_at end) over (
            partition by user_id
            order by updated_at
           ) as most_recent_updated_at
from (select rd.*,
             lag(flag) over (partition by user_id order by updated_at) as prev_flag
      from raw_dataset rd
     ) rd;

The logic is pretty simple. Just look at the previous flag. Then do a cumulative max() when the value changes.

Here is a db<>fiddle.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement