ID value timestamp1 timestamp2 1 1. 1. 1 1. 2. 2 1 1. 0. 3 1 1. 2. 4. 1 2. 2. 3 4 2. 2. 4. 4
I need to figure out how to query this table such that I find out the continuous count for each ID in which the value is equal or above 1 and the timestamp1 is bigger than timestamp 2. So for example:
ID 1 will have a count of 2 because the 3rd row is 0 and even though the 4th is of value 2 I need for it to be continuous.
ID 2 will have a count of 1 because row 4 timestamp 1 is below timstamp 2.
Please help, have tried a bunch of things
Advertisement
Answer
If I understand correctly, this is a gaps-and-islands problem. You can get each range as:
select id, count(*) from (select t.*, row_number() over (partition by id order by timestamp1) as seqnum, row_number() over (partition by id, (value >= 1), (timestamp1 >= timestamp2) order by timestamp1 ) as seqnum_2 from t ) t where value >= 1 and (timestamp1 >= timestamp2) group by id, (seqnum - seqnum_2);
In Postgres, you can get the maximum using distinct on
. I’m guessing that you are using Redshift and under the mistaken impression that it is equivalent to Postgres (it might have been decades ago). You can get what you want using an additional level of aggregation:
select id, max(cnt) from (select id, count(*) as cnt from (select t.*, row_number() over (partition by id order by timestamp1) as seqnum, row_number() over (partition by id, (value >= 1), (timestamp1 >= timestamp2) order by timestamp1 ) as seqnum_2 from t ) t where value >= 1 and (timestamp1 >= timestamp2) group by id, (seqnum - seqnum_2) ) ig group by id;
Here is a db<>fiddle.