Skip to content
Advertisement

Understanding the tables referred to in a BigQuery array cross join

I know the namespace isn’t the correct term for this but it conveys what I’m trying to understand. Take this query:

enter image description here

But then using Movies.Genres instead of Genres gives a different result:

enter image description here

Why does that occur? Additionally, is this standard (ISO) SQL, or is this just how BigQuery performs array-unnesting and joins?

Advertisement

Answer

When you need to use array in cross (or other types of) join – you have below options

  1. from tableA cross join unnest(array) as element
  2. from tableA, unnest(array) as element — here comma is actually shortcut for cross join
  3. if array is a column of tableA, you can use yet another shortcut (now for unnest)
    from tableA, tableA.array as element

the benefit of using UNNEST is that you can define OFFSET – like in below example

Having offset is extremely important in many use cases
In my practice – i am using all of above options depends on specific case

Now, as of difference in two queries:

So, the first query

is equivalent to below (notice aliase)

Genres in where Genres='Drama' refers to element from implicitly unnested Movies.Genres – so one of element is Drama which returns 1 element out of total two
And, now trick is that in select statement you explicitly calling out Movies.Genres which is the original array in that row – so that explains the output

You can try below to kind of confirm above explanation

for above output will be

enter image description here

While above explained the output for first query – I hope it is now clear why second query returns actual genre (Drama) instead of array

Hope this helped in understanding differences :o)

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement