SQL Inner Join returns duplicates

I have the following 2 tables:

tab1 with 37146 rows

week_ref with 730 rows

All I want to do is join those tables on year and week so that the first week day and last week day will display next to the columns of the first table.

Below is my query:

SELECT tab1.year
      ,tab1.week
      ,tab1.col3
      ,tab1.col4
      ,tab1.col5
      ,tab1.col6
      ,tab1.total
      ,tab1.col7
      ,week_ref.first_week_day
      ,week_ref.last_week_day
      
FROM dtsetname.tab1

JOIN spyros.week_ref ON (week_ref.year = tab1.year AND week_ref.week = tab1.week)

​x
 
SELECT tab1.year      ,tab1.week      ,tab1.col3      ,tab1.col4      ,tab1.col5      ,tab1.col6      ,tab1.total      ,tab1.col7      ,week_ref.first_week_day      ,week_ref.last_week_day      FROM dtsetname.tab1​JOIN spyros.week_ref ON (week_ref.year = tab1.year AND week_ref.week = tab1.week)​

The return of the query returns the 2 extra columns but the rows are 255535. So it is full of duplicates. I used to get how join works, but I guess not anymore xd… Any help on this? The correct output table should only give me 37146 rows since I only just want to add 2 extra columns.

Thanks

Answer

Below is for BigQuery Standard SQL

Before JOIN’ing you just need to dedup data in week_ref table as in below example

#standardSQL
SELECT tab1.year
      ,tab1.week
      ,tab1.col3
      ,tab1.col4
      ,tab1.col5
      ,tab1.col6
      ,tab1.total
      ,tab1.col7
      ,week_ref.first_week_day
      ,week_ref.last_week_day
FROM dtsetname.tab1 tab1
JOIN (SELECT DISTINCT year, week, first_week_day, last_week_day FROM spyros.week_ref) week_ref
ON (week_ref.year = tab1.year AND week_ref.week = tab1.week)

 
#standardSQLSELECT tab1.year      ,tab1.week      ,tab1.col3      ,tab1.col4      ,tab1.col5      ,tab1.col6      ,tab1.total      ,tab1.col7      ,week_ref.first_week_day      ,week_ref.last_week_dayFROM dtsetname.tab1 tab1JOIN (SELECT DISTINCT year, week, first_week_day, last_week_day FROM spyros.week_ref) week_refON (week_ref.year = tab1.year AND week_ref.week = tab1.week) ​

Advertisement

Answer