trusty BQ experts.
Background: I have users reading articles on a website (User Table A) and users clicking through to articles from an email (User Table B) and have BQ view tables for each user set. User Table A is missing some user_id of the users who have clicked through from the emails. See User Table A below.
*User Table A* - Website
id | user_id | article id | viewed_at
------------------------------------------------------------------
1 | 1 | 1000 | 2019-01-25 01:04:00 UTC
2 | 2 | 1001 | 2019-01-25 01:03:00 UTC
3 | 3 | 1002 | 2019-01-25 01:03:00 UTC
4 | null | 1001 | 2019-01-25 01:04:00 UTC
5 | null | 1000 | 2019-01-24 20:49:00 UTC
6 | null | 1003 | 2019-01-24 20:47:00 UTC
*User Table B* - Email
id | user_id | article id | clicked_at
------------------------------------------------------------------
1 | 1 | 1000 | 2019-01-25 01:04:00 UTC
2 | 1 | 1000 | 2019-01-24 20:49:00 UTC
3 | 6 | 1003 | 2019-01-24 20:47:00 UTC
I’m want to create a single view/table which replaces a null user_id result in User Table A with the user_id from User Table B IF the viewed_at and clicked_at are the same AND the article_id from User Table A and User Table B are also the same.
I also want to retain the user_id null result if the corresponding viewed_at/clicked_at and article_id are not found in User Table B.
*Desired Result Table*
id | user_id | article id | viewed_at
------------------------------------------------------------------
1 | 1 | 1000 | 2019-01-25 01:04:00 UTC
2 | 2 | 1001 | 2019-01-25 01:03:00 UTC
3 | 3 | 1002 | 2019-01-25 01:03:00 UTC
4 | null | 1001 | 2019-01-25 01:04:00 UTC
5 | 1 | 1000 | 2019-01-24 20:49:00 UTC
6 | 6 | 1003 | 2019-01-24 20:47:00 UTC
I hope this makes sense.
Please help. This has been doing my head in for months.
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL
SELECT
a.id,
IFNULL(a.user_id, b.user_id) user_id,
a.article_id,
viewed_at
FROM `project.dataset.website` a
LEFT JOIN `project.dataset.email` b
ON a.user_id IS NULL
AND a.article_id = b.article_id
AND viewed_at = clicked_at