Skip to content
Advertisement

Hive: less than operator error in subquery

I want the latest records from HIVE table using the following query-

WITH lot as (select *
from  to_burn_in as a where a.rel_lot='${Rel_Lot}')
select a.* from lot AS a
where not exists (select 1 from lot as b 
where a.Rel_Lot=b.Rel_Lot and a.SerialNum=b.SerialNum and a.Test_Stage=b.Test_Stage 
and cast(a.test_datetime as TIMESTAMP) < cast(b.Test_Datetime as TIMESTAMP))
order by a.SerialNum

this query is throwing a error as

Error while compiling statement: FAILED: SemanticException line 0:undefined:-1 Unsupported SubQuery Expression 'Test_Datetime': SubQuery expression refers to both Parent and SubQuery expressions and is not a valid join condition.

I have tried running with equal operator in place of the less than operator in subquery and it is running fine. I read the HIVE documentation as given in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries and couldn’t figure out why it is throwing a error as ‘where’ subquery is supported. What might be the problem here?

Advertisement

Answer

EXISTS works the same as a join actually. Not equality join conditions are not supported in Hive prior Hive 2.2.0 (see HIVE-15211, HIVE-15251)

It seems you are trying to get records having latest timestamp per Rel_Lot,SerialNum,Test_Stage. Your query can be rewritten using dense_rank() or rank() function:

WITH lot as (select *
from  to_burn_in as a where a.rel_lot='${Rel_Lot}'
)

select * from 
(
select a.*,
       dense_rank() over(partition by Rel_Lot,SerialNum,Test_Stage order by cast(a.test_datetime as TIMESTAMP) desc) as rnk
  from lot AS a
)s 
where rnk=1
order by s.SerialNum
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement