i have problem with partitioning the process data i have from our workflow system. data are stored in database step after step (one step is one row) for couple of processes (id) and a date of the event. in simplified data is looks as follows:
id step date 1 a 2019-12-01 1 a 2019-12-02 1 b 2019-12-02 1 x 2019-12-03 1 a 2019-12-04 1 b 2019-12-05 2 a 2019-12-05 2 b 2019-12-06
what i need to have is such short table
id iteration first step-a first step-b 1 1 2019-12-01 2019-12-02 1 2 2019-12-04 2019-12-05 2 1 2019-12-05 2019-12-06
the “x” step is return step. it triggers whole process to start all over from the beginning, so transferring the data into small table#2 i need to split process with id=1 in 2 iterations. there could be multiply a and b steps but you need to choose the first one.
i tried with group by
and partition by
but with no success – i don’t know how to split data given x in the middle. do you have any idea?
Advertisement
Answer
I would approach this with a window sum that defines the iteration. Everytime a x
is met for a given id
, a new iteration starts.
select id, iteration, min(case when step = 'a' then mdate end) first_step_a, min(case when step = 'b' then mdate end) first_step_b from ( select t.*, 1 + sum(case when step = 'x' then 1 else 0 end) over(partition by id order by mdate) iteration from mytable t ) t group by id, iteration order by id, iteration
ID | ITERATION | FIRST_STEP_A | FIRST_STEP_B -: | --------: | :----------- | :----------- 1 | 1 | 2019-12-01 | 2019-12-02 1 | 2 | 2019-12-04 | 2019-12-05 2 | 1 | 2019-12-05 | 2019-12-06