Skip to content
Advertisement

BIGQUERY: Replace a ‘null’ result from one table with a user id from another table

trusty BQ experts.

Background: I have users reading articles on a website (User Table A) and users clicking through to articles from an email (User Table B) and have BQ view tables for each user set. User Table A is missing some user_id of the users who have clicked through from the emails. See User Table A below.

*User Table A* - Website

id  | user_id    | article id  | viewed_at
------------------------------------------------------------------
1   | 1          | 1000        | 2019-01-25 01:04:00 UTC
2   | 2          | 1001        | 2019-01-25 01:03:00 UTC
3   | 3          | 1002        | 2019-01-25 01:03:00 UTC
4   | null       | 1001        | 2019-01-25 01:04:00 UTC
5   | null       | 1000        | 2019-01-24 20:49:00 UTC
6   | null       | 1003        | 2019-01-24 20:47:00 UTC


*User Table B* - Email

id  | user_id    | article id  | clicked_at
------------------------------------------------------------------
1   | 1          | 1000        | 2019-01-25 01:04:00 UTC
2   | 1          | 1000        | 2019-01-24 20:49:00 UTC
3   | 6          | 1003        | 2019-01-24 20:47:00 UTC

  • I’m want to create a single view/table which replaces a null user_id result in User Table A with the user_id from User Table B IF the viewed_at and clicked_at are the same AND the article_id from User Table A and User Table B are also the same.

  • I also want to retain the user_id null result if the corresponding viewed_at/clicked_at and article_id are not found in User Table B.

*Desired Result Table*

id  | user_id    | article id  | viewed_at
------------------------------------------------------------------
1   | 1          | 1000        | 2019-01-25 01:04:00 UTC
2   | 2          | 1001        | 2019-01-25 01:03:00 UTC
3   | 3          | 1002        | 2019-01-25 01:03:00 UTC
4   | null       | 1001        | 2019-01-25 01:04:00 UTC
5   | 1          | 1000        | 2019-01-24 20:49:00 UTC
6   | 6          | 1003        | 2019-01-24 20:47:00 UTC

I hope this makes sense.

Please help. This has been doing my head in for months.

Advertisement

Answer

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  a.id,
  IFNULL(a.user_id, b.user_id) user_id,
  a.article_id,
  viewed_at
FROM `project.dataset.website` a
LEFT JOIN `project.dataset.email` b
ON a.user_id IS NULL
AND a.article_id = b.article_id
AND viewed_at = clicked_at
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement