Skip to content
Advertisement

Big Query data totals are wrong, suspect duplicate rows after LEFT JOIN

I’m trying to join two tables in Google Big Query and I think it’s duplicating a few rows (I’m guessing this because the totals are wrong when I then use the data in Data Studio).

To give some context, I’ve done something that probably isn’t the ideal setup but which seemed easiest for my level of knowledge. I first combined data from a bunch of different sources in Google Sheets before uploading to Big query, all of it is segmented by date but some sources are also segmented by device (they’ll be getting filtered into different reports so this isn’t a problem for me, although it might be relevant to the issue). All sources have the same columns though, just that some columns will be empty for certain sources (i.e. some not having device).

I then separately upload a set of data that I need to join with the data from a Google sheet based on the campaign name. While this works perfectly for the source that is segmented by device, the totals are all wrong (always too high) for the two sources that aren’t segmented by device.

I’m pretty sure the mistake has something to do with the last two lines, but nothing I try seems to fix it.

Edit: sample tables for clarity:

Master

Additional

In this scenario I’d ideally want to add Metric 3 to rows 2, 5, 6, and 7 but leave the other rows alone.

Advertisement

Answer

Below is for BigQuery Standard SQL

if to apply to sample data in your question – result is

I’d still like the other rows to be kept in the data (rows 3 and 4), i jsut don’t want them to get any of the additional data added to them

Use below

this produces

getting an error because ‘Campaign in USING clause not found on right side of table’ – pretty sure this is because the right table has ‘campaign_c’ instead of ‘campaign’

If join columns names are different – use ON instead of USING as in example below

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement