I run two SQL queries. The first one has an outer reference to table1 inside the subquery; in the second one I put the same table inside the subquery instead. The results are different: the second one fails because the subquery returns multiple rows.
The first one runs on Oracle but fails on Spark SQL, so I am looking for a Spark SQL solution that behaves like the Oracle version in the first query.
Query 1:
select *,
       (select N_CODE
        from table2 f
        where f.ID1 = (select min(f.ID1)
                       from table1 a
                       left join table2 f on a.ID2 = f.ID2
                       where a.ID2 = table1.ID2)) AS CODE
from table1
Query 2:
select *,
       (select N_CODE
        from table1 t, table2 f
        where f.ID1 = (select min(f.ID1)
                       from table1 a
                       left join table2 f on a.ID2 = f.ID2
                       where a.ID2 = t.ID2)) AS CODE
from table1
The second one is my attempt at rewriting the first for Spark SQL, but it fails on both Oracle and Spark. How can I run the first query on Spark SQL so that it behaves the same as on Oracle?
Please do not modify the structure of the query.
Answer
Oracle supports this kind of nested correlated subquery, but Spark SQL does not. The best way to work around it is to split the query into pieces and join them.
For instance, run this part and save the result as table3:
select a.ID2, min(f.ID1) as MIN_ID1   -- smallest matching table2.ID1 per ID2
from table1 a
left join table2 f on a.ID2 = f.ID2
group by a.ID2
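In Spark SQL one way to keep this intermediate result around is a temporary view. This is a minimal sketch, assuming the MIN_ID1 alias chosen above:

create or replace temporary view table3 as
select a.ID2, min(f.ID1) as MIN_ID1
from table1 a
left join table2 f on a.ID2 = f.ID2
group by a.ID2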
Then use it for your main query:
.... where f.ID1 = table3.MIN_ID1
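Putting the pieces together, the main query can join table3 instead of nesting the correlated subqueries. The sketch below assumes the table3 view and the MIN_ID1 alias introduced above; it reproduces the intent of Query 1 but is not a tested drop-in replacement:

select t1.*, f.N_CODE as CODE
from table1 t1
left join table3 t3 on t1.ID2 = t3.ID2     -- per-ID2 minimum ID1 computed above
left join table2 f on f.ID1 = t3.MIN_ID1   -- look up N_CODE for that ID1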