Skip to content
Advertisement

Big Query – Calculate start and end date back to back

I have a problem, which I need some advise, I am required to calculate the number of leave calendar days taken back-to-back on big query. (For eg. 2 leave records taken on 07-01-2020 to 10-01-2020 and 13-01-2020 to 15-01-2020, should return 07-01-2020 to 15-01-2020)

However, there are certain weeks, where leave is taken at 3/4 days gap because there is public holiday on that week. Can anyone suggest a possible work around to this? I created a table for public holidays but I am stuck with how I can possible considers weeks with public holiday as back-to-back. I considered window function but I am not sure what is the correct logic.

Original data set

personnel_number start_date end_date next_start_date next_end_date days_between_next_row remarks
100100 16/1/2020 17/1/2020 20/1/2020 24/1/2020 3
100100 20/1/2020 24/1/2020 28/1/2020 31/1/2020 4 “public holiday on 27-Jan”
100100 28/1/2020 31/1/2020 10/2/2020 13/2/2020 10
100100 10/2/2020 13/2/2020 NULL NULL

Public Holiday Table

pub_start_date pub_end_date remarks
25/1/2020 27/1/2020 “CNY Holiday”

Desired outcome

personnel_number start_date back_to_back_end_date
100100 16/1/2020 31/1/2020
100100 10/2/2020 13/2/2020

Advertisement

Answer

Below is for BigQuery Standard SQL

#standardSQL
with temp as (
  -- all pto days from original table
  select personnel_number, day, '1' type from `project.dataset.table`, 
  unnest(generate_date_array(start_date, end_date)) day
  
  union distinct -- add weekend days if last pto day is friday
  select personnel_number, day, '0' type from `project.dataset.table`, 
  unnest([] || if(extract(dayofweek from end_date) = 6, [end_date + 1, end_date + 2], [])) day
  
  union distinct -- all holiday days from holidays table 
  select personnel_number, day, '0' from (select distinct personnel_number from `project.dataset.table`), 
  (select day from holidays, unnest(generate_date_array(pub_start_date, pub_end_date)) day)
  
  union distinct -- add weekend days to holidays if last day of hliday is friday 
  select personnel_number, day, '0' from (select distinct personnel_number from `project.dataset.table`), 
  (select day from holidays, unnest([] || if(extract(dayofweek from pub_end_date) = 6, [pub_end_date + 1, pub_end_date + 2], [])) day) 
)
select personnel_number,
  start_date + start_tail as start_date,                     -- removing leading non pto days
  back_to_back_end_date - end_tail as back_to_back_end_date  -- removing trailing non pto days
from (
  select personnel_number, 
    min(day) start_date, 
    max(day) back_to_back_end_date, 
    length(regexp_extract(string_agg(type, '' order by day), r'^0*')) start_tail, -- detect number of leading non pto days (holidays or weekend days)
    length(regexp_extract(string_agg(type, '' order by day), r'0*$')) end_tail,   -- detect number of leading non pto days (holidays or weekend days)
    regexp_contains(string_agg(type, '' order by day), r'1') valid
  from (
    select personnel_number, day, type, countif(flag) over(partition by personnel_number order by day) grp
    from (
      select *, day != 1 + ifnull(lag(day) over(partition by personnel_number order by day), day) flag 
      from temp
    )
  )
  group by personnel_number, grp
)
where valid

if to apply to sample data from your question

with `project.dataset.table` as (
  select 100100 personnel_number, date '2020-01-16' start_date, date '2020-01-17' end_date union all
  select 100100, '2020-01-20', '2020-01-24' union all
  select 100100, '2020-01-28', '2020-01-31' union all
  select 100101, '2020-02-10', '2020-02-13'
), holidays as (
  select date '2020-01-25' pub_start_date, date '2020-01-27' pub_end_date, 'CNY Holiday' remarks 
)    

output is

enter image description here

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement