I have a problem, which I need some advise, I am required to calculate the number of leave calendar days taken back-to-back on big query. (For eg. 2 leave records taken on 07-01-2020
to 10-01-2020
and 13-01-2020
to 15-01-2020
, should return 07-01-2020
to 15-01-2020
)
However, there are certain weeks, where leave is taken at 3/4 days gap because there is public holiday on that week. Can anyone suggest a possible work around to this? I created a table for public holidays but I am stuck with how I can possible considers weeks with public holiday as back-to-back. I considered window function but I am not sure what is the correct logic.
Original data set
personnel_number | start_date | end_date | next_start_date | next_end_date | days_between_next_row | remarks |
---|---|---|---|---|---|---|
100100 | 16/1/2020 | 17/1/2020 | 20/1/2020 | 24/1/2020 | 3 | |
100100 | 20/1/2020 | 24/1/2020 | 28/1/2020 | 31/1/2020 | 4 | “public holiday on 27-Jan” |
100100 | 28/1/2020 | 31/1/2020 | 10/2/2020 | 13/2/2020 | 10 | |
100100 | 10/2/2020 | 13/2/2020 | NULL | NULL |
Public Holiday Table
pub_start_date | pub_end_date | remarks |
---|---|---|
25/1/2020 | 27/1/2020 | “CNY Holiday” |
Desired outcome
personnel_number | start_date | back_to_back_end_date |
---|---|---|
100100 | 16/1/2020 | 31/1/2020 |
100100 | 10/2/2020 | 13/2/2020 |
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL
with temp as (
-- all pto days from original table
select personnel_number, day, '1' type from `project.dataset.table`,
unnest(generate_date_array(start_date, end_date)) day
union distinct -- add weekend days if last pto day is friday
select personnel_number, day, '0' type from `project.dataset.table`,
unnest([] || if(extract(dayofweek from end_date) = 6, [end_date + 1, end_date + 2], [])) day
union distinct -- all holiday days from holidays table
select personnel_number, day, '0' from (select distinct personnel_number from `project.dataset.table`),
(select day from holidays, unnest(generate_date_array(pub_start_date, pub_end_date)) day)
union distinct -- add weekend days to holidays if last day of hliday is friday
select personnel_number, day, '0' from (select distinct personnel_number from `project.dataset.table`),
(select day from holidays, unnest([] || if(extract(dayofweek from pub_end_date) = 6, [pub_end_date + 1, pub_end_date + 2], [])) day)
)
select personnel_number,
start_date + start_tail as start_date, -- removing leading non pto days
back_to_back_end_date - end_tail as back_to_back_end_date -- removing trailing non pto days
from (
select personnel_number,
min(day) start_date,
max(day) back_to_back_end_date,
length(regexp_extract(string_agg(type, '' order by day), r'^0*')) start_tail, -- detect number of leading non pto days (holidays or weekend days)
length(regexp_extract(string_agg(type, '' order by day), r'0*$')) end_tail, -- detect number of leading non pto days (holidays or weekend days)
regexp_contains(string_agg(type, '' order by day), r'1') valid
from (
select personnel_number, day, type, countif(flag) over(partition by personnel_number order by day) grp
from (
select *, day != 1 + ifnull(lag(day) over(partition by personnel_number order by day), day) flag
from temp
)
)
group by personnel_number, grp
)
where valid
if to apply to sample data from your question
with `project.dataset.table` as (
select 100100 personnel_number, date '2020-01-16' start_date, date '2020-01-17' end_date union all
select 100100, '2020-01-20', '2020-01-24' union all
select 100100, '2020-01-28', '2020-01-31' union all
select 100101, '2020-02-10', '2020-02-13'
), holidays as (
select date '2020-01-25' pub_start_date, date '2020-01-27' pub_end_date, 'CNY Holiday' remarks
)
output is