I am trying to pair the date of a national holiday with time bucket boundaries.
I would like to left join against buckets
.
So I can identify if a holiday falls within a certain time frame has_holiday
.
With the following query, I receive a Query error:
Query error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join. at [6:1]
What is the best way to approach this problem?
DECLARE now_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP(); DECLARE end_date TIMESTAMP DEFAULT TIMESTAMP_SUB(now_date, INTERVAL 365 DAY); DECLARE block ARRAY<TIMESTAMP> DEFAULT GENERATE_TIMESTAMP_ARRAY(end_date, now_date, INTERVAL 7 DAY); --------------------------------------------------------------------- WITH buckets AS( SELECT TIMESTAMP_SUB(bucket_end, INTERVAL 7 DAY) AS bucket_start, bucket_end FROM UNNEST(block) AS bucket_end ) SELECT bucket_start, bucket_end, IF(holiday_name IS NOT NULL, 1, 0) AS has_holiday FROM buckets AS b LEFT JOIN `external_landing.tbl_public_holidays_2020` AS ex ON ex.date BETWEEN bucket_start AND bucket_end
Table structure for external_landing.tbl_public_holidays_2020
| | day | date | holiday_name | type | |---:|:----------|:--------------------|:--------------------------------|:---------------------| | 0 | Tuesday | 2019-01-01 00:00:00 | New Year's Day | National Holiday | | 1 | Wednesday | 2019-01-02 00:00:00 | Day after New Year's Day | National Holiday | | 2 | Monday | 2019-01-21 00:00:00 | Wellington Anniversary Day | Regional Holiday | | 3 | Monday | 2019-01-28 00:00:00 | Auckland Anniversary Day | Regional Holiday | | 4 | Monday | 2019-02-04 00:00:00 | Nelson Anniversary Day | Regional Holiday | | 5 | Wednesday | 2019-02-06 00:00:00 | Waitangi Day | National Holiday | | 6 | Monday | 2019-03-11 00:00:00 | Taranaki Anniversary Day | Regional Holiday | | 7 | Monday | 2019-03-25 00:00:00 | Otago Anniversary Day | Regional Holiday | | 8 | Friday | 2019-04-19 00:00:00 | Good Friday | National Holiday | | 9 | Monday | 2019-04-22 00:00:00 | Easter Monday | National Holiday | | 10 | Tuesday | 2019-04-23 00:00:00 | Southland Anniversary Day | Regional Holiday | | 11 | Thursday | 2019-04-25 00:00:00 | Anzac Day | National Holiday | | 12 | Sunday | 2019-05-12 00:00:00 | Mother's Day | Not A Public Holiday | | 13 | Monday | 2019-06-03 00:00:00 | Queen's Birthday | National Holiday | | 14 | Sunday | 2019-09-01 00:00:00 | Father's Day | Not A Public Holiday | | 15 | Monday | 2019-09-23 00:00:00 | South Canterbury Day | Regional Holiday | | 16 | Friday | 2019-10-25 00:00:00 | Hawke's Bay Anniversary Day | Regional Holiday | | 17 | Monday | 2019-10-28 00:00:00 | Labour Day | National Holiday | | 18 | Monday | 2019-11-04 00:00:00 | Marlborough Anniversary Day | Regional Holiday | | 19 | Friday | 2019-11-15 00:00:00 | Christchurch Show Day | Regional Holiday | | 20 | Monday | 2019-12-02 00:00:00 | Chatham Islands Anniversary Day | Regional Holiday | | 21 | Monday | 2019-12-02 00:00:00 | Westland Anniversary Day | Regional Holiday | | 22 | Wednesday | 2019-12-25 00:00:00 | Christmas Day | National Holiday | | 23 | Thursday | 2019-12-26 00:00:00 | Boxing Day | National Holiday |
Advertisement
Answer
Try below (BigQuery Standard SQL)
DECLARE now_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP(); DECLARE end_date TIMESTAMP DEFAULT TIMESTAMP_SUB(now_date, INTERVAL 365 DAY); DECLARE block ARRAY<TIMESTAMP> DEFAULT GENERATE_TIMESTAMP_ARRAY(end_date, now_date, INTERVAL 7 DAY); --------------------------------------------------------------------- WITH buckets AS( SELECT TIMESTAMP_SUB(bucket_end, INTERVAL 7 DAY) AS bucket_start, bucket_end FROM UNNEST(block) AS bucket_end ) SELECT bucket_start, bucket_end, IF(holiday_name IS NOT NULL, 1, 0) AS has_holiday FROM buckets AS b CROSS JOIN `external_landing.tbl_public_holidays_2020` AS ex WHERE ex.date BETWEEN bucket_start AND bucket_end
Meantime, I would suggest below version
DECLARE now_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP(); DECLARE end_date TIMESTAMP DEFAULT TIMESTAMP_SUB(now_date, INTERVAL 365 DAY); DECLARE block ARRAY<TIMESTAMP> DEFAULT GENERATE_TIMESTAMP_ARRAY(end_date, now_date, INTERVAL 7 DAY); --------------------------------------------------------------------- WITH buckets AS( SELECT TIMESTAMP_SUB(bucket_end, INTERVAL 7 DAY) AS bucket_start, bucket_end FROM UNNEST(block) AS bucket_end ) SELECT bucket_start, bucket_end, COUNTIF(holiday_name IS NOT NULL AND ex.date BETWEEN bucket_start AND bucket_end) AS holidays FROM buckets AS b CROSS JOIN `external_landing.tbl_public_holidays_2020` AS ex GROUP BY bucket_start, bucket_end
This one will output all buckets with number of the holidays in them
If you want to exclude bucket with no holidays – just add below to the end
HAVING holidays > 0