I have some code:
CREATE OR REPLACE TEMPORARY VIEW cctv_join_1 AS SELECT * FROM cctv1_details AS A LFET JOIN cctv3_details AS B ON (A.CDE_WR = B.CDE_WR) AND A.CDE_dist = B.CDE_dist
I then try
CREATE OR REPLACE TEMPORARY VIEW cctv_join_2 AS (SELECT * FROM cctv_join_1 AS A LEFT JOIN cctv5_details AS B ON (A.CDE_WR = B.CDE_WR) AND (A.CDE_dist = B.CDE_dist))
but I get an error
Error in SQL statement: AnalysisException: Reference ‘A.CDE_WR’ is ambiguous, could be: A.CDE_WR, A.CDE_WR.; line 6 pos 4
in databricks. How can I deal with this?
Advertisement
Answer
This query:
SELECT * FROM cctv1_details c1d LEFT JOIN cctv3_details c3d ON c1d.CDE_WR = c3d.CDE_WR AND c1d.CDE_dist = c3d.CDE_dist
is using SELECT *
. The *
is shorthand for all columns from both tables.
Obviously the combined columns from the two tables have duplicate column names; at least, CDE_WR
and CDE_dist
— and there may be others. The general solution is to list all the columns out:
SELECT c1d.col1, c1d.col2, . . . c3d.colx, c3d.coly FROM cctv1_details c1d LEFT JOIN cctv3_details c3d ON c1d.CDE_WR = c3d.CDE_WR AND c1d.CDE_dist = c3d.CDE_dist;
However, this is often shorted to:
SELECT c1d.*, c3d.colx, c3d.coly FROM cctv1_details c1d LEFT JOIN cctv3_details c3d ON c1d.CDE_WR = c3d.CDE_WR AND c1d.CDE_dist = c3d.CDE_dist;
Note that I changed the table aliases to be reasonable abbreviations for the table names, making the query much clearer and easier to maintain.