Skip to content
Advertisement

Why select invalid field in subquery could run in BigQuery?

For the following sql

CREATE or replace TABLE
  temp.t1 ( a STRING)
;
insert into temp.t1 values ('val_a');

CREATE or replace TABLE
  temp.t2 (b STRING)
;
insert into temp.t2 values ('val_b');

create or replace table `temp.a1` as
select distinct b
from temp.t2
;

select  distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`)
;

Since there is no a in temp.a1 and there should be an error here, However, the output of Bigquery is

Row   a 
1     val_a

Why the result happened?

On the other side, when run select distinct a from temp.a1; there is one error Unrecognized name: a comes up.

Advertisement

Answer

Your query is:

select distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`);

You think this should be:

select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct a1.a from `temp.a1` a1);

And hence generate an error. However, the rules of SQL interpret this as:

select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct t1.a from `temp.a1` a1);

Because the scoping rules say that if a is not found in the subquery then look for it in the outer query.

That is the definition of SQL.

The solution? Always qualify column references. Qualify means to include the table alias in the reference.

Also note that select distinct is meaningless in the subquery for an in, because in does not create duplicates. You should get rid of the distinct in the subquery.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement