
Left join matching between dates

I am trying to match the dates of national holidays against time-bucket boundaries. I would like to left join the holidays against the buckets so that I can flag, in a has_holiday column, whether a holiday falls within each bucket's time frame.

With the query shown further down, I receive this error:

Query error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join. at [6:1]

What is the best way to approach this problem?

DECLARE now_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP();
DECLARE end_date TIMESTAMP DEFAULT TIMESTAMP_SUB(now_date, INTERVAL 365 DAY);
DECLARE block ARRAY<TIMESTAMP> DEFAULT GENERATE_TIMESTAMP_ARRAY(end_date, now_date, INTERVAL 7 DAY);
---------------------------------------------------------------------

WITH buckets AS(
SELECT
  TIMESTAMP_SUB(bucket_end, INTERVAL 7 DAY) AS bucket_start,
  bucket_end
FROM
  UNNEST(block) AS bucket_end
)

SELECT
  bucket_start,
  bucket_end,
  IF(holiday_name IS NOT NULL, 1, 0) AS has_holiday
FROM
  buckets AS b 
LEFT JOIN
  `external_landing.tbl_public_holidays_2020` AS ex
ON
  ex.date BETWEEN bucket_start AND bucket_end

Table structure for external_landing.tbl_public_holidays_2020

|    | day       | date                | holiday_name                    | type                 |
|---:|:----------|:--------------------|:--------------------------------|:---------------------|
|  0 | Tuesday   | 2019-01-01 00:00:00 | New Year's Day                  | National Holiday     |
|  1 | Wednesday | 2019-01-02 00:00:00 | Day after New Year's Day        | National Holiday     |
|  2 | Monday    | 2019-01-21 00:00:00 | Wellington Anniversary Day      | Regional Holiday     |
|  3 | Monday    | 2019-01-28 00:00:00 | Auckland Anniversary Day        | Regional Holiday     |
|  4 | Monday    | 2019-02-04 00:00:00 | Nelson Anniversary Day          | Regional Holiday     |
|  5 | Wednesday | 2019-02-06 00:00:00 | Waitangi Day                    | National Holiday     |
|  6 | Monday    | 2019-03-11 00:00:00 | Taranaki Anniversary Day        | Regional Holiday     |
|  7 | Monday    | 2019-03-25 00:00:00 | Otago Anniversary Day           | Regional Holiday     |
|  8 | Friday    | 2019-04-19 00:00:00 | Good Friday                     | National Holiday     |
|  9 | Monday    | 2019-04-22 00:00:00 | Easter Monday                   | National Holiday     |
| 10 | Tuesday   | 2019-04-23 00:00:00 | Southland Anniversary Day       | Regional Holiday     |
| 11 | Thursday  | 2019-04-25 00:00:00 | Anzac Day                       | National Holiday     |
| 12 | Sunday    | 2019-05-12 00:00:00 | Mother's Day                    | Not A Public Holiday |
| 13 | Monday    | 2019-06-03 00:00:00 | Queen's Birthday                | National Holiday     |
| 14 | Sunday    | 2019-09-01 00:00:00 | Father's Day                    | Not A Public Holiday |
| 15 | Monday    | 2019-09-23 00:00:00 | South Canterbury Day            | Regional Holiday     |
| 16 | Friday    | 2019-10-25 00:00:00 | Hawke's Bay Anniversary Day     | Regional Holiday     |
| 17 | Monday    | 2019-10-28 00:00:00 | Labour Day                      | National Holiday     |
| 18 | Monday    | 2019-11-04 00:00:00 | Marlborough Anniversary Day     | Regional Holiday     |
| 19 | Friday    | 2019-11-15 00:00:00 | Christchurch Show Day           | Regional Holiday     |
| 20 | Monday    | 2019-12-02 00:00:00 | Chatham Islands Anniversary Day | Regional Holiday     |
| 21 | Monday    | 2019-12-02 00:00:00 | Westland Anniversary Day        | Regional Holiday     |
| 22 | Wednesday | 2019-12-25 00:00:00 | Christmas Day                   | National Holiday     |
| 23 | Thursday  | 2019-12-26 00:00:00 | Boxing Day                      | National Holiday     |


Answer

Try the query below (BigQuery Standard SQL):

DECLARE now_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP();
DECLARE end_date TIMESTAMP DEFAULT TIMESTAMP_SUB(now_date, INTERVAL 365 DAY);
DECLARE block ARRAY<TIMESTAMP> DEFAULT GENERATE_TIMESTAMP_ARRAY(end_date, now_date, INTERVAL 7 DAY);
---------------------------------------------------------------------

WITH buckets AS(
  SELECT TIMESTAMP_SUB(bucket_end, INTERVAL 7 DAY) AS bucket_start, bucket_end
  FROM UNNEST(block) AS bucket_end
)
SELECT
  bucket_start,
  bucket_end,
  IF(holiday_name IS NOT NULL, 1, 0) AS has_holiday
FROM buckets AS b 
CROSS JOIN `external_landing.tbl_public_holidays_2020` AS ex
WHERE ex.date BETWEEN bucket_start AND bucket_end     

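Note that the WHERE clause makes this behave like an inner join, so it returns only the buckets that contain a holiday. For that reason I would suggest the version below instead: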

DECLARE now_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP();
DECLARE end_date TIMESTAMP DEFAULT TIMESTAMP_SUB(now_date, INTERVAL 365 DAY);
DECLARE block ARRAY<TIMESTAMP> DEFAULT GENERATE_TIMESTAMP_ARRAY(end_date, now_date, INTERVAL 7 DAY);
---------------------------------------------------------------------

WITH buckets AS(
  SELECT TIMESTAMP_SUB(bucket_end, INTERVAL 7 DAY) AS bucket_start, bucket_end
  FROM UNNEST(block) AS bucket_end
)
SELECT
  bucket_start,
  bucket_end,
  COUNTIF(holiday_name IS NOT NULL AND ex.date BETWEEN bucket_start AND bucket_end) AS holidays
FROM buckets AS b 
CROSS JOIN `external_landing.tbl_public_holidays_2020` AS ex
GROUP BY bucket_start, bucket_end   

This version outputs every bucket together with the number of holidays in it.
If you want to exclude buckets with no holidays, just add the line below to the end:

HAVING holidays > 0   
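
If you need the 0/1 has_holiday flag from the question rather than a count, the same CROSS JOIN plus aggregation idea can be adapted. The following is only a minimal sketch under stated assumptions: the `holidays` CTE, its `holiday_date` column, and the fixed timestamps are hypothetical stand-ins for `external_landing.tbl_public_holidays_2020` and the DECLAREd variables, chosen so the query runs on its own.

```sql
-- Hypothetical inline data standing in for the real holidays table.
WITH holidays AS (
  SELECT TIMESTAMP '2019-12-25 00:00:00' AS holiday_date, 'Christmas Day' AS holiday_name
  UNION ALL
  SELECT TIMESTAMP '2019-12-26 00:00:00', 'Boxing Day'
),
-- Four weekly buckets built from fixed (assumed) boundaries instead of CURRENT_TIMESTAMP().
buckets AS (
  SELECT TIMESTAMP_SUB(bucket_end, INTERVAL 7 DAY) AS bucket_start, bucket_end
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY(
    TIMESTAMP '2019-12-08 00:00:00',
    TIMESTAMP '2019-12-29 00:00:00',
    INTERVAL 7 DAY)) AS bucket_end
)
SELECT
  b.bucket_start,
  b.bucket_end,
  -- 1 if at least one holiday falls inside the bucket, otherwise 0.
  IF(COUNTIF(h.holiday_date BETWEEN b.bucket_start AND b.bucket_end) > 0, 1, 0) AS has_holiday
FROM buckets AS b
CROSS JOIN holidays AS h
GROUP BY b.bucket_start, b.bucket_end
ORDER BY b.bucket_start
```

With this sample data, only the last bucket (2019-12-22 to 2019-12-29) should come back with has_holiday = 1. Two caveats: BETWEEN is inclusive at both ends, so a holiday landing exactly on a shared boundary is matched by two adjacent buckets (a condition such as `>= bucket_start AND < bucket_end` may be preferable), and a CROSS JOIN against an empty holidays table returns no rows at all.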
User contributions licensed under: CC BY-SA