I know the namespace isn’t the correct term for this but it conveys what I’m trying to understand. Take this query:
WITH Movies AS ( SELECT 'Titanic' AS title, 1997 AS year, ['Drama',' Romance'] AS Genres ) SELECT title, year, Movies.Genres FROM Movies, Movies.Genres where Genres='Drama'
But then using Movies.Genres
instead of Genres
gives a different result:
WITH Movies AS ( SELECT 'Titanic' AS title, 1997 AS year, ['Drama',' Romance'] AS Genres ) SELECT title, year,Genres FROM Movies, Movies.Genres where Genres='Drama'
Why does that occur? Additionally, is this standard (ISO) SQL, or is this just how BigQuery performs array-unnesting and joins?
Advertisement
Answer
When you need to use array in cross (or other types of) join – you have below options
from tableA cross join unnest(array) as element
from tableA, unnest(array) as element
— here comma is actually shortcut for cross join- if array is a column of tableA, you can use yet another shortcut (now for unnest)
from tableA, tableA.array as element
the benefit of using UNNEST is that you can define OFFSET – like in below example
from tableA, unnest(array) as element with offset
Having offset is extremely important in many use cases
In my practice – i am using all of above options depends on specific case
Now, as of difference in two queries:
So, the first query
SELECT title, year, Movies.Genres FROM Movies, Movies.Genres where Genres='Drama'
is equivalent to below (notice aliase)
SELECT title, year, Movies.Genres FROM Movies, Movies.Genres as Genres where Genres='Drama'
Genres
in where Genres='Drama'
refers to element from implicitly unnested Movies.Genres – so one of element is Drama
which returns 1 element out of total two
And, now trick is that in select statement you explicitly calling out Movies.Genres
which is the original array in that row – so that explains the output
You can try below to kind of confirm above explanation
SELECT title, year, Movies.Genres FROM Movies, Movies.Genres as Genres where Genres in ('Drama', ' Romance')
for above output will be
While above explained the output for first query – I hope it is now clear why second query returns actual genre (Drama) instead of array
Hope this helped in understanding differences :o)