My table of interview candidates has three columns and looks like this (attempt is what I want to calculate):
candidate_id | interview_stage | stage_reached_at | attempt <- want to calculate |
---|---|---|---|
1 | 1 | 2019-01-01 | 1 |
1 | 2 | 2019-01-02 | 1 |
1 | 3 | 2019-01-03 | 1 |
1 | 1 | 2019-11-01 | 2 |
1 | 2 | 2019-11-02 | 2 |
1 | 1 | 2021-01-01 | 3 |
1 | 2 | 2021-01-02 | 3 |
1 | 3 | 2021-01-03 | 3 |
1 | 4 | 2021-01-04 | 3 |
The table represents candidate_id 1 who has had 3 separate interview attempts at a company.
- Made it to interview_stage 3 on the 1st attempt
- Made it to interview_stage 2 on the 2nd attempt
- Made it to interview_stage 4 on the 3d attempt
Question: Can I somehow use the number series if I order by stage_reached_at? As soon as the next step for a particular candidate_id is lower than the row before, I know it’s a new process.
I want to be able to group on candidate_id and process_grouping at the end of the day.
Thx in advance.
Advertisement
Answer
You can use lag()
and then a cumulative sum:
select t.*, sum(case when prev_interview_stage >= interview_stage then 1 else 0 end) over (partition by candidate_id order by stage_reached_at) as attempt from (select t.*, lag(interview_stage) over (partition by candidate_id order by stage_reached_at) as prev_interview_stage from t ) t;
Note: Your question specifically says “lower”. I wonder, though, if you really mean “lower or equal to”. If the latter, change the >=
to >
.