Skip to content
Advertisement

Populating scd type 2 kind of dates for active inactive input data

input and output

I saw this question in one of the interviewing practice platforms and initially assumed its a gaps and islands problem, but I couldn’t go ahead with that logic. In ETL tool called Abinitio, there is a function called key_change using which we can solve these kind of problems. But, I am unable to do it in sql. Could you please help me with this

Advertisement

Answer

Wow, I haven’t heard “Ab Initio” in a long time. In any case, this does appear to be a gaps-and-islands problem, one that can be solved using the difference of row numbers. However, it can also be solved in a similar method which uses two lags and no aggregate:

select customer_id, status,
       effective_date,
       lead(effective_date, 1, '2099-12-31') over (partition by customer_id order by effective_date) as end_date
from (select t.*,
             lag(status) over (partition by customer_id order by effective_date) as prev_status
      from t
     ) t
where prev_status is null or prev_status <> status;

Basically, this picks up every row where there is a change of status and then uses the end date from the next row (where there is a change).

Your confusion over the nature of the problem may be due to how the data is presented. The dates are not ordered which can be confusing.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement