Skip to content
Advertisement

How do I use lag to get the previous row before a specific time window of data?

Every day I create a table that looks like this:

user_id received_at age_pref ethnicity_pref
1 10:01 18-28 open_to_all
2 10:05 18-23 open_to_all
1 10:08 18-30 open_to_all
2 10:07 18-25 Hispanic/Iatino
3 10:09 56-33 White

It’s a table that lists the actions a user takes from 10am-11am. As you can see, there are 3 distinct user IDs.

Using this, I’m trying to create another table using lag to see if the previous value changed or not. However, the problem is that the first row is inaccurate because there’s no way for me to measure if an attribute changed without the previous row before this set of data (maybe it occurred at 930am). How do I get the previous received_at row for each user ID in this table, but only 1 for each user_id? I want the new table to look like this, where the new records are prepended at the beginning

user_id received_at age_pref ethnicity_pref
1 9:48 20-30 asian
2 9:52 30-32 white
3 9:58 28-30 open_to_all
1 10:01 18-28 open_to_all
2 10:05 18-23 open_to_all
1 10:08 18-30 open_to_all
2 10:07 18-25 Hispanic/Iatino
3 10:09 56-33 White
  • note there are several rows that exist before this time interval. I want the most recent one prepended to the table for the user_ids that exist in the table.

  • basically I want to include 1 more row for EACH user_id before the time window so that my table that tracks changes is accurate since lag will always have the first row be null.

Advertisement

Answer

I guess you can union all the following query:

select distinct on (user_id) user_id, received_at, age_pref, ethnicity_pref
from t
order by user_id, received_at desc
where received_at < '10:00'
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement