I want the latest records from HIVE table using the following query-
WITH lot as (select * from to_burn_in as a where a.rel_lot='${Rel_Lot}') select a.* from lot AS a where not exists (select 1 from lot as b where a.Rel_Lot=b.Rel_Lot and a.SerialNum=b.SerialNum and a.Test_Stage=b.Test_Stage and cast(a.test_datetime as TIMESTAMP) < cast(b.Test_Datetime as TIMESTAMP)) order by a.SerialNum
this query is throwing a error as
Error while compiling statement: FAILED: SemanticException line 0:undefined:-1 Unsupported SubQuery Expression 'Test_Datetime': SubQuery expression refers to both Parent and SubQuery expressions and is not a valid join condition.
I have tried running with equal operator in place of the less than operator in subquery and it is running fine. I read the HIVE documentation as given in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries and couldn’t figure out why it is throwing a error as ‘where’ subquery is supported. What might be the problem here?
Advertisement
Answer
EXISTS works the same as a join actually. Not equality join conditions are not supported in Hive prior Hive 2.2.0 (see HIVE-15211, HIVE-15251)
It seems you are trying to get records having latest timestamp per Rel_Lot,SerialNum,Test_Stage
. Your query can be rewritten using dense_rank() or rank() function:
WITH lot as (select * from to_burn_in as a where a.rel_lot='${Rel_Lot}' ) select * from ( select a.*, dense_rank() over(partition by Rel_Lot,SerialNum,Test_Stage order by cast(a.test_datetime as TIMESTAMP) desc) as rnk from lot AS a )s where rnk=1 order by s.SerialNum