i have problem with partitioning the process data i have from our workflow system. data are stored in database step after step (one step is one row) for couple of processes (id) and a date of the event. in simplified data is looks as follows:
id step date 1 a 2019-12-01 1 a 2019-12-02 1 b 2019-12-02 1 x 2019-12-03 1 a 2019-12-04 1 b 2019-12-05 2 a 2019-12-05 2 b 2019-12-06
what i need to have is such short table
id iteration first step-a first step-b 1 1 2019-12-01 2019-12-02 1 2 2019-12-04 2019-12-05 2 1 2019-12-05 2019-12-06
the “x” step is return step. it triggers whole process to start all over from the beginning, so transferring the data into small table#2 i need to split process with id=1 in 2 iterations. there could be multiply a and b steps but you need to choose the first one.
i tried with group by and partition by but with no success – i don’t know how to split data given x in the middle. do you have any idea?
Advertisement
Answer
I would approach this with a window sum that defines the iteration. Everytime a x is met for a given id, a new iteration starts.
select
id,
iteration,
min(case when step = 'a' then mdate end) first_step_a,
min(case when step = 'b' then mdate end) first_step_b
from (
select
t.*,
1 + sum(case when step = 'x' then 1 else 0 end)
over(partition by id order by mdate) iteration
from mytable t
) t
group by id, iteration
order by id, iteration
ID | ITERATION | FIRST_STEP_A | FIRST_STEP_B -: | --------: | :----------- | :----------- 1 | 1 | 2019-12-01 | 2019-12-02 1 | 2 | 2019-12-04 | 2019-12-05 2 | 1 | 2019-12-05 | 2019-12-06