I have a simple query that has the ambiguity regarding col
:
SELECT SUBSTR(col, 1, 4) AS col, COUNT(*) AS nc FROM (SELECT 'test1' AS col UNION ALL SELECT 'test2' AS col) GROUP BY col
According to the documentation, the query validator should raise an error:
Ambiguity between a FROM clause column name and a SELECT list alias in GROUP BY:
SELECT UPPER(LastName) AS LastName FROM Singers GROUP BY LastName;The query above is ambiguous and will produce an error because LastName in the GROUP BY clause could refer to the original column LastName in Singers, or it could refer to the alias AS LastName, whose value is UPPER(LastName).
But the query executes successfully and provides the following result:
╔═══╦═══════╦══════╗ ║ ║ col ║ nc ║ ╠═══╬═══════╬══════╣ ║ 1 ║ test ║ 2 ║ ╚═══╩═══════╩══════╝
It’s clear from the results that GROUP BY clause recognizes col
as an alias from the outer select (whole SUBSTR
expression).
So here is my question, is it reliable enough behavior? Can I write my queries taking this into account and adapting my queries correspondingly?
Advertisement
Answer
That query is not really ambiguous. I understand the intent well, and BigQuery too.
Compare:
WITH Singers AS (SELECT "Sinatra" LastName UNION ALL SELECT "sinatra") SELECT UPPER(LastName) AS LastName FROM Singers GROUP BY Singers.LastName;
with
WITH Singers AS (SELECT "Sinatra" LastName UNION ALL SELECT "sinatra") SELECT UPPER(LastName) AS LastName FROM Singers GROUP BY LastName;
Both behave as I expected. If anything, the documentation is wrong here, and I’ve filled a ticket so it gets fixed.