I’m facing this issue where I don’t really know how to handle duplicate rows when joining two tables.
I have two tables I’d like to join
Value_x table :
ID Campaign Value_x foo fb 1 foo twitter 3 bar fb 2 foobar ads 2 foo organic 3
Value_y table:
ID Campaign Value_y foo organic 42 foobar organic 17 foo ads 12 foo twitter 23 bar twitter 35
Because of the Campaign column, ID are not unique in each table and for example, left join table output looks like this:
ID t1.Campaign t2.Campaign Value_x Value_y foo fb organic 1 42 foo fb ads 1 12
Ideally I’d like to have the following output
ID t1.Campaign t2.Campaign Value_x Value_y foo fb null 1 null foo twitter twitter 3 23 foo organic organic 3 42 foo null ads null 12
How would you handle a situation like this? I read something about foreign keys
… I’m not familiar with this concept how can it help in this context? Or is there a way to perform a join at the campaign level?
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL SELECT ID, x.Campaign AS Campaign_x, y.Campaign AS Campaign_y, Value_x, Value_y FROM `project.dataset.tableX` x FULL OUTER JOIN `project.dataset.tableY` y USING(ID, Campaign)
You can test, play with above using sample data from your questions as in below example
#standardSQL WITH `project.dataset.tableX` AS ( SELECT 'foo' ID, 'fb' Campaign, 1 Value_x UNION ALL SELECT 'foo', 'twitter', 3 UNION ALL SELECT 'bar', 'fb', 2 UNION ALL SELECT 'foobar', 'ads', 2 UNION ALL SELECT 'foo', 'organic', 3 ), `project.dataset.tableY` AS ( SELECT 'foo' ID, 'organic' Campaign, 42 Value_y UNION ALL SELECT 'foobar', 'organic', 17 UNION ALL SELECT 'foo', 'ads', 12 UNION ALL SELECT 'foo', 'twitter', 23 UNION ALL SELECT 'bar', 'twitter', 35 ) SELECT ID, x.Campaign AS Campaign_x, y.Campaign AS Campaign_y, Value_x, Value_y FROM `project.dataset.tableX` x FULL OUTER JOIN `project.dataset.tableY` y USING(ID, Campaign) -- ORDER BY ID
with result
Row ID Campaign_x Campaign_y Value_x Value_y 1 bar fb null 2 null 2 bar null twitter null 35 3 foo fb null 1 null 4 foo twitter twitter 3 23 5 foo organic organic 3 42 6 foo null ads null 12 7 foobar ads null 2 null 8 foobar null organic null 17