I have a problem on which I need some advice. I am required to calculate, in BigQuery, the leave calendar days taken back-to-back. (For example, 2 leave records taken from 07-01-2020 to 10-01-2020 and from 13-01-2020 to 15-01-2020 should return 07-01-2020 to 15-01-2020.)
However, in certain weeks leave is taken with a 3- or 4-day gap because there is a public holiday in that week. Can anyone suggest a possible workaround? I created a table for public holidays, but I am stuck on how to treat weeks with a public holiday as back-to-back. I considered a window function, but I am not sure what the correct logic is.
Original data set
personnel_number | start_date | end_date | next_start_date | next_end_date | days_between_next_row | remarks |
---|---|---|---|---|---|---|
100100 | 16/1/2020 | 17/1/2020 | 20/1/2020 | 24/1/2020 | 3 | |
100100 | 20/1/2020 | 24/1/2020 | 28/1/2020 | 31/1/2020 | 4 | “public holiday on 27-Jan” |
100100 | 28/1/2020 | 31/1/2020 | 10/2/2020 | 13/2/2020 | 10 | |
100100 | 10/2/2020 | 13/2/2020 | NULL | NULL | NULL | |
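The `days_between_next_row` column appears to be `next_start_date - end_date`. The minimal Python sketch below (Python is used only to illustrate the logic; `describe_gaps` is a hypothetical helper) lists which days inside each gap are weekdays. The 3-day gap holds only a weekend, the 4-day gap holds one weekday (27 Jan, the holiday), and the 10-day gap holds ordinary working days.

```python
from datetime import date, timedelta

# Leave records for personnel 100100 from the table above: (start_date, end_date).
records = [
    (date(2020, 1, 16), date(2020, 1, 17)),
    (date(2020, 1, 20), date(2020, 1, 24)),
    (date(2020, 1, 28), date(2020, 1, 31)),
    (date(2020, 2, 10), date(2020, 2, 13)),
]

def describe_gaps(records):
    """For each pair of consecutive records, return (days_between, weekdays inside the gap)."""
    rows = []
    for (_, end), (next_start, _) in zip(records, records[1:]):
        days_between = (next_start - end).days  # same value as days_between_next_row
        gap = [end + timedelta(days=i) for i in range(1, days_between)]
        weekdays = [d for d in gap if d.weekday() < 5]  # Monday=0 ... Friday=4
        rows.append((days_between, weekdays))
    return rows

for days_between, weekdays in describe_gaps(records):
    print(days_between, weekdays)
```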
Public Holiday Table
pub_start_date | pub_end_date | remarks |
---|---|---|
25/1/2020 | 27/1/2020 | “CNY Holiday” |
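Storing holidays as ranges works, but comparing against them is easier once each range is expanded into individual days (BigQuery's `GENERATE_DATE_ARRAY` does this in SQL). A minimal Python sketch; `holiday_days` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Public holiday ranges from the table above: (pub_start_date, pub_end_date), inclusive.
holidays = [(date(2020, 1, 25), date(2020, 1, 27))]

def holiday_days(ranges):
    """Expand inclusive date ranges into a set of individual days."""
    days = set()
    for start, end in ranges:
        for i in range((end - start).days + 1):
            days.add(start + timedelta(days=i))
    return days

print(sorted(holiday_days(holidays)))
```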
Desired outcome
personnel_number | start_date | back_to_back_end_date |
---|---|---|
100100 | 16/1/2020 | 31/1/2020 |
100100 | 10/2/2020 | 13/2/2020 |
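One way to express the window-function idea at the record level: compare each record with the previous one for the same person (what `LAG()` gives in SQL), start a new group when the gap is not bridged, and merge each group (a running count of breaks). The rule used here is an assumption: two records are back-to-back when every day between them is a Saturday, Sunday, or holiday. The Python sketch below uses hypothetical helpers `is_bridged` and `back_to_back`.

```python
from datetime import date, timedelta

records = [
    (100100, date(2020, 1, 16), date(2020, 1, 17)),
    (100100, date(2020, 1, 20), date(2020, 1, 24)),
    (100100, date(2020, 1, 28), date(2020, 1, 31)),
    (100100, date(2020, 2, 10), date(2020, 2, 13)),
]
holidays = {date(2020, 1, 25), date(2020, 1, 26), date(2020, 1, 27)}

def is_bridged(end, next_start, holidays):
    """True when every day strictly between end and next_start is a weekend day or a holiday."""
    d = end + timedelta(days=1)
    while d < next_start:
        if d.weekday() < 5 and d not in holidays:  # a normal working day breaks the streak
            return False
        d += timedelta(days=1)
    return True

def back_to_back(records, holidays):
    """Merge each person's records whose gaps are bridged (assumed rule, see is_bridged)."""
    result = []
    for person, start, end in sorted(records):
        # same person and bridged gap -> extend the current group, else start a new one
        if result and result[-1][0] == person and is_bridged(result[-1][2], start, holidays):
            result[-1][2] = max(result[-1][2], end)
        else:
            result.append([person, start, end])
    return [tuple(row) for row in result]

for row in back_to_back(records, holidays):
    print(row)
```

Overlapping records are merged too, because an empty or negative gap contains no working day.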
Answer
Below is for BigQuery Standard SQL
```sql
#standardSQL
with all_days as (
  -- all PTO days from the original table
  select personnel_number, day, '1' type
  from `project.dataset.table`, unnest(generate_date_array(start_date, end_date)) day
  union distinct
  -- add weekend days if the last PTO day is a Friday
  select personnel_number, day, '0' type
  from `project.dataset.table`,
    unnest([] || if(extract(dayofweek from end_date) = 6, [end_date + 1, end_date + 2], [])) day
  union distinct
  -- all holiday days from the holidays table
  select personnel_number, day, '0'
  from (select distinct personnel_number from `project.dataset.table`),
    (select day from holidays, unnest(generate_date_array(pub_start_date, pub_end_date)) day)
  union distinct
  -- add weekend days to holidays if the last day of a holiday is a Friday
  select personnel_number, day, '0'
  from (select distinct personnel_number from `project.dataset.table`),
    (select day from holidays,
      unnest([] || if(extract(dayofweek from pub_end_date) = 6, [pub_end_date + 1, pub_end_date + 2], [])) day)
), temp as (
  -- one row per person and day; a PTO day ('1') wins over a holiday/weekend day ('0'),
  -- so a leave record that overlaps a holiday does not produce duplicate days
  select personnel_number, day, max(type) type
  from all_days
  group by personnel_number, day
)
select
  personnel_number,
  start_date + start_tail as start_date,                     -- remove leading non-PTO days
  back_to_back_end_date - end_tail as back_to_back_end_date  -- remove trailing non-PTO days
from (
  select
    personnel_number,
    min(day) start_date,
    max(day) back_to_back_end_date,
    -- number of leading non-PTO days (holidays or weekend days)
    length(regexp_extract(string_agg(type, '' order by day), r'^0*')) start_tail,
    -- number of trailing non-PTO days (holidays or weekend days)
    length(regexp_extract(string_agg(type, '' order by day), r'0*$')) end_tail,
    -- keep only groups that contain at least one PTO day
    regexp_contains(string_agg(type, '' order by day), r'1') valid
  from (
    select personnel_number, day, type,
      countif(flag) over(partition by personnel_number order by day) grp
    from (
      -- flag is true when this day does not directly follow the previous day
      select *, day != 1 + ifnull(lag(day) over(partition by personnel_number order by day), day) flag
      from temp
    )
  )
  group by personnel_number, grp
)
where valid
```
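The query works on individual days rather than records. The Python sketch below mirrors its steps so the logic can be checked without BigQuery. PTO days get type `'1'`; holiday days and the Saturday/Sunday after a Friday get `'0'`; consecutive days are grouped; leading and trailing `'0'` days are trimmed; groups without a `'1'` are dropped. The helper names (`days`, `weekend_after_friday`, `trim`, `back_to_back_days`) are hypothetical. Python's `weekday()` numbers Friday as 4, whereas BigQuery's `DAYOFWEEK` uses 6.

```python
from datetime import date, timedelta

def days(start, end):
    """Inclusive list of days, like GENERATE_DATE_ARRAY(start, end)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def weekend_after_friday(d):
    """The following Saturday and Sunday if d is a Friday (BigQuery DAYOFWEEK = 6), else []."""
    return [d + timedelta(days=1), d + timedelta(days=2)] if d.weekday() == 4 else []

def trim(person, run, types):
    """Drop leading/trailing '0' days of a run; return [] if it holds no PTO day."""
    s = ''.join(types[(person, d)] for d in run)
    if '1' not in s:
        return []
    lead = len(s) - len(s.lstrip('0'))
    tail = len(s) - len(s.rstrip('0'))
    return [(person, run[lead], run[len(run) - 1 - tail])]

def back_to_back_days(records, holidays):
    """records: (person, start, end); holidays: (pub_start, pub_end), both inclusive."""
    people = sorted({p for p, _, _ in records})
    types = {}  # (person, day) -> '1' for PTO, '0' for holiday/weekend filler
    for p, s, e in records:
        for d in days(s, e):
            types[(p, d)] = '1'
    filler = [(p, d) for p, _, e in records for d in weekend_after_friday(e)]
    for hs, he in holidays:
        filler += [(p, d) for p in people for d in days(hs, he) + weekend_after_friday(he)]
    for key in filler:
        types.setdefault(key, '0')  # an existing PTO day keeps '1', like max(type)
    result = []
    for p in people:
        run = []
        for d in sorted(day for (q, day) in types if q == p):
            if run and d != run[-1] + timedelta(days=1):  # gap: close the current run
                result += trim(p, run, types)
                run = []
            run.append(d)
        if run:
            result += trim(p, run, types)
    return result

sample_records = [
    (100100, date(2020, 1, 16), date(2020, 1, 17)),
    (100100, date(2020, 1, 20), date(2020, 1, 24)),
    (100100, date(2020, 1, 28), date(2020, 1, 31)),
    (100101, date(2020, 2, 10), date(2020, 2, 13)),
]
sample_holidays = [(date(2020, 1, 25), date(2020, 1, 27))]
for row in back_to_back_days(sample_records, sample_holidays):
    print(row)
```

With the answer's sample data this yields one row for 100100 (16 Jan to 31 Jan 2020) and one for 100101 (10 Feb to 13 Feb 2020).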
If you apply it to the sample data from your question (the CTEs below stand in for the real table and the holidays table):
```sql
with `project.dataset.table` as (
  select 100100 personnel_number, date '2020-01-16' start_date, date '2020-01-17' end_date union all
  select 100100, '2020-01-20', '2020-01-24' union all
  select 100100, '2020-01-28', '2020-01-31' union all
  select 100101, '2020-02-10', '2020-02-13'
), holidays as (
  select date '2020-01-25' pub_start_date, date '2020-01-27' pub_end_date, 'CNY Holiday' remarks
)
```
The output is: