I have two table as below:
TABLE_1:
timestamp id 2020-11-24 01:05:00 UTC AA 2020-11-24 01:07:00 UTC AA 2020-11-24 01:07:00 UTC BB
TABLE_2:
timestamp id covered 2020-11-24 01:04:00 UTC AA true 2020-11-24 01:06:00 UTC AA false
I want to add a column to table_1 to show covered value from table_2. Although timestamps are not matching, I want to get the closest previous timestamp captured in table_2. For example AA was covered at 1:04 and uncovered at 1:06 so result table look like this:
timestamp id covered 2020-11-24 01:05:00 UTC AA true 2020-11-24 01:07:00 UTC AA false 2020-11-24 01:07:00 UTC BB null
Would you please help me to finish this query:
SELECT TABLE_1.timestamp, TABLE_1.id, TABLE_2.covered FROM TABLE_1 LEFT JOIN TABLE_2 ON TABLE_1.id = TABLE_2.id AND ?
Advertisement
Answer
BigQuery is not thrilled with non-equijoins.  So an alternative is to use union all and some trickery:
select t12.*
from (select t12.*,
             last_value(covered ignore nulls) over (partition by id order by timestamp) as imputed_covered
      from ((select timestamp, id, null as covered
             from table_1
            ) union all
            (select timestamp, id, covered
             from table_2
            )
           ) t12
     ) t12
where covered is null; 
This is combining the two tables.  The last_value() is then retrieving the most recent covered value — which for table_1 rows will come from table_2.
The final where just filters the result set to only have table_1 rows.