Skip to content
Advertisement

BigQuery SQL: do SELECT aliases have priority over FROM clause column names when used in GROUP BY clause?

I have a simple query that has the ambiguity regarding col:

SELECT SUBSTR(col, 1, 4) AS col, 
       COUNT(*) AS nc
  FROM (SELECT 'test1' AS col
         UNION ALL
        SELECT 'test2' AS col) 
 GROUP BY col

According to the documentation, the query validator should raise an error:

Ambiguity between a FROM clause column name and a SELECT list alias in GROUP BY:

SELECT UPPER(LastName) AS LastName
FROM Singers
GROUP BY LastName;

The query above is ambiguous and will produce an error because LastName in the GROUP BY clause could refer to the original column LastName in Singers, or it could refer to the alias AS LastName, whose value is UPPER(LastName).

But the query executes successfully and provides the following result:

╔═══╦═══════╦══════╗
║   ║ col   ║ nc   ║
╠═══╬═══════╬══════╣
║ 1 ║ test  ║ 2    ║
╚═══╩═══════╩══════╝

It’s clear from the results that GROUP BY clause recognizes col as an alias from the outer select (whole SUBSTR expression).

So here is my question, is it reliable enough behavior? Can I write my queries taking this into account and adapting my queries correspondingly?

Advertisement

Answer

That query is not really ambiguous. I understand the intent well, and BigQuery too.

Compare:

WITH Singers AS (SELECT "Sinatra" LastName UNION ALL SELECT "sinatra")

SELECT UPPER(LastName) AS LastName
FROM Singers
GROUP BY Singers.LastName;

with

WITH Singers AS (SELECT "Sinatra" LastName UNION ALL SELECT "sinatra")

SELECT UPPER(LastName) AS LastName
FROM Singers
GROUP BY LastName;

Both behave as I expected. If anything, the documentation is wrong here, and I’ve filled a ticket so it gets fixed.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement