I run two SQL queries. The first one has an outer reference to the table inside the subquery. In the second one I add the same table inside the subquery. The results are different: the second one fails because the subquery returns multiple rows.
The first one runs on Oracle but fails on Spark SQL. Therefore I am looking for a Spark SQL solution that behaves like the first query does in Oracle.
Query 1:
select *,
(select N_CODE
from table2 f
where f.ID1 = (select min(f.ID1)
from table1 a left join table2 f on a.ID2 = f.ID2
where a.ID2 = table1.ID2
)
) AS CODE
from table1
Query 2:
select *,
(select N_CODE
from table1 t, table2 f
where f.ID1 = (select min(f.ID1)
from table1 a left join table2 f on a.ID2 = f.ID2
where a.ID2 = t.ID2
)
) AS CODE
from table1
The second one is my attempted rewrite of the first one for Spark SQL, but it fails on both Oracle and Spark. How can I run the first query on Spark SQL so that it behaves as it does on Oracle?
Please do not modify the structure of the query.
Answer
Oracle supports this kind of nested correlated subquery, but Spark SQL does not. The best way to work around it is to split the query into pieces and join them back together.
For instance, run this part first and save the result as table3:
select a.ID2, min(f.ID1) as MIN_ID1
from table1 a left join table2 f on a.ID2 = f.ID2
group by a.ID2
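If everything stays in Spark SQL, one way to "save it as table3" is a temporary view. A minimal sketch, assuming a temporary view is acceptable and using table3 and MIN_ID1 purely as illustrative names:
-- Materialize the per-ID2 minimum of table2.ID1 under the name table3
CREATE OR REPLACE TEMPORARY VIEW table3 AS
SELECT a.ID2, MIN(f.ID1) AS MIN_ID1
FROM table1 a LEFT JOIN table2 f ON a.ID2 = f.ID2
GROUP BY a.ID2;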
Then use it in your main query, comparing against the precomputed minimum instead of the correlated subquery:
...
where f.ID1 = table3.MIN_ID1
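Putting the pieces together, a minimal sketch of the rewritten main query, assuming the table3 view and the MIN_ID1 column from above (the alias m is also just illustrative):
-- One row per table1 row; fetch the N_CODE of the table2 row whose ID1
-- equals the minimum ID1 for that row's ID2
SELECT t.*, f.N_CODE AS CODE
FROM table1 t
LEFT JOIN table3 m ON m.ID2 = t.ID2
LEFT JOIN table2 f ON f.ID1 = m.MIN_ID1
Note that if table2.ID1 is not unique, this join produces one row per match, in the same situation where the original scalar subquery would have raised a multiple-rows error.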