I have the below input:
id hid startdate enddate c1 c2 c3 1 1 1 2 xxx xxx 28d 1 2 3 4 xxx xxx 45r 1 3 1 2 xxx yyy 6jg 1 4 5 6 xxx xxx gj6 1 5 7 8 xxx xxx to5
I need output like this:
id hid startdate enddate c1 c2 c3 1 2 1 4 xxx xxx 45r 1 3 1 2 xxx yyy 6jg 1 5 5 8 xxx xxx to5
Query developed so far:
select id,max(hid),min(startdate),max(enddate), c1,c2,max(c3) from table group by id,c1,c2
Not getting the startdate & c3 columns as expected. Any idea on this? Thanks in advance.
Advertisement
Answer
This is a gap-and-islands problem. You can use the difference of row numbers:
select id, c1, c2, max(hid), min(startdate), max(enddate), max(c3) from (select t.*, row_number() over (partition by id order by startdate) as seqnum, row_number() over (partition by id, c1, c2 order by startdate) as seqnum_2 from t ) t group by id, c1, c2, max(hid), (seqnum - seqnum_2);
I’m not sure what the logic is for hid
and c3
, but max()
seems to work for your sample data.
EDIT (based on comment):
To get c3
from the last row:
select id, c1, c2, max(case when seqnum_within_group = 1 then hid end) as hid, min(startdate), max(enddate), max(case when seqnum_within_group = 1 then c3 end) as c3 from (select t.*, row_number() over (partition by id, c1, c2, (seqnum - seqnum_2) order by startdate desc) as seqnum_within_group from (select t.*, row_number() over (partition by id order by startdate) as seqnum, row_number() over (partition by id, c1, c2 order by startdate) as seqnum_2 from t ) t ) t group by id, c1, c2, max(hid), (seqnum - seqnum_2);