Skip to content
Advertisement

bigquery left join by closest previous value

I have two table as below:

TABLE_1:

timestamp                id
2020-11-24 01:05:00 UTC  AA
2020-11-24 01:07:00 UTC  AA
2020-11-24 01:07:00 UTC  BB

TABLE_2:

timestamp                id   covered
2020-11-24 01:04:00 UTC  AA   true
2020-11-24 01:06:00 UTC  AA   false

I want to add a column to table_1 to show covered value from table_2. Although timestamps are not matching, I want to get the closest previous timestamp captured in table_2. For example AA was covered at 1:04 and uncovered at 1:06 so result table look like this:

timestamp                id   covered
2020-11-24 01:05:00 UTC  AA   true
2020-11-24 01:07:00 UTC  AA   false
2020-11-24 01:07:00 UTC  BB   null

Would you please help me to finish this query:

SELECT TABLE_1.timestamp, TABLE_1.id, TABLE_2.covered 
FROM TABLE_1 
LEFT JOIN 
TABLE_2
ON TABLE_1.id = TABLE_2.id
AND ?

Advertisement

Answer

BigQuery is not thrilled with non-equijoins. So an alternative is to use union all and some trickery:

select t12.*
from (select t12.*,
             last_value(covered ignore nulls) over (partition by id order by timestamp) as imputed_covered
      from ((select timestamp, id, null as covered
             from table_1
            ) union all
            (select timestamp, id, covered
             from table_2
            )
           ) t12
     ) t12
where covered is null; 

This is combining the two tables. The last_value() is then retrieving the most recent covered value — which for table_1 rows will come from table_2.

The final where just filters the result set to only have table_1 rows.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement