Redshift SQL: identify records based on timestamp which came before a certain record

I have a Redshift table with the following structure:

titleId | country | updateTime | value 
ID1 | US | 2020-01-01 00:00:00.000 | someValueA
ID1 | US | 2020-01-01 00:00:01.000 | someValueB
ID1 | IN | 2020-01-04 00:00:05.000 | someValue
ID2 ....
ID3....
ID1 | US | 2021-02-02 00:00:00.000 | someValue5
ID1 | GB | 2021-02-02 00:00:00.000 | someValue5

I’m trying to find three sets of titleIds (preferably the entire rows, not just the list of titleIds): those which have country IN after US; the other way around, those which have US after IN; and those which have only an IN entry and nothing else.

Note that one titleId may have the order IN, US, IN, US, in which case we have 2 instances of US after IN.

I initially thought of doing an inner join of the table with itself, which would help me find titles that have both US and IN territories. But I wasn’t able to figure out how to use those results to filter based on which came before what. Is this possible in Redshift SQL, or do I need to write custom code after doing some sort of filtering?

Answer

You can select each row that meets the conditions using:

select t.*
from (select t.*,
             -- US rows from this row to the last row of the same title
             sum( (country = 'US')::int) over (partition by titleid order by updatetime rows between current row and unbounded following) as num_us_following,
             -- IN rows from this row to the last row of the same title
             sum( (country = 'IN')::int) over (partition by titleid order by updatetime rows between current row and unbounded following) as num_in_following,
             -- rows of the title whose country is not IN
             sum( (country <> 'IN')::int) over (partition by titleid) as non_nonin
      from t
     ) t;

Then your three conditions are:

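  • where country = 'IN' and num_us_following > 0 (IN rows with a US row after them)
  • where country = 'US' and num_in_following > 0 (US rows with an IN row after them)
  • where country = 'IN' and non_nonin = 0 (rows of titles that have only IN entries)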
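
As a rough sketch of how one of these conditions might be applied end to end, the query below returns the US rows that have a later IN row for the same title. It assumes the table is named t, as in the answer, and the CTE name flagged is made up for illustration. Rows of a title that share the same updatetime have no defined order relative to each other, so add a tie-breaking column to the order by if that matters.

```sql
-- Sketch only: assumes the table is called t (as in the answer);
-- "flagged" is a hypothetical CTE name.
with flagged as (
    select t.*,
           -- IN rows from the current row to the last row of the same title
           sum((country = 'IN')::int) over (
               partition by titleid
               order by updatetime
               rows between current row and unbounded following
           ) as num_in_following
    from t
)
select *
from flagged
where country = 'US'          -- keep only US rows ...
  and num_in_following > 0    -- ... that have at least one IN row after them
order by titleid, updatetime;
```

For the ID1 rows shown in the question, this should keep the two US rows from 2020-01-01, since only those precede the IN row. The other two sets follow the same pattern, with the where clause swapped for the matching condition.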
User contributions licensed under: CC BY-SA