
Hive query: calculating the max indicator value based on priority and date

I tried to frame the query but could not get the required result, hence this post. I am new to Hive, so apologies if this is very simple.

Source data:

Problem statement

Based on priority and date, we need to populate the indicator values (ind1 and ind2) for each ik.

Output table format

ik, ind1, ind2

Logic:

The rows are grouped by the ik field, so for the dataset above the output contains only a single record.

If, for a given ik, a row with priority A has an indicator flag (ind1 or ind2) with the value y, the output for that indicator should be “y”.

If, for the same ik, the priority is A but the indicator does not have the value “y” (possible values are null, n, or an empty string),

then the latest indicator value is selected from the rows with priority B or C, based on the date field (order by date and take the latest record per ik).

Output for the above dataset:

Here ind1 is simply max(ind1), which I am able to derive, but I am unable to derive ind2.

Could you help me write the query?


Answer

Testing on your data: http://demo.gethue.com/hue/editor?editor=293916
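
The rules in the question can be expressed with one window function plus conditional aggregation. What follows is only a minimal sketch, not necessarily the query from the linked editor. The table name `source_data` and the date column name `dt` are assumptions, since the source table is not shown; `is_bc` and `rn` are helper columns introduced here for illustration. It also assumes the flag is stored as lowercase 'y'.

```sql
-- Sketch only: source_data and dt are assumed names.
SELECT ik,
       CASE
         -- Rule 1: any priority-A row with ind1 = 'y' wins.
         WHEN MAX(CASE WHEN priority = 'A' AND ind1 = 'y' THEN 1 ELSE 0 END) = 1
           THEN 'y'
         -- Rule 2: otherwise take ind1 from the latest B/C row.
         ELSE MAX(CASE WHEN is_bc = 1 AND rn = 1 THEN ind1 END)
       END AS ind1,
       CASE
         WHEN MAX(CASE WHEN priority = 'A' AND ind2 = 'y' THEN 1 ELSE 0 END) = 1
           THEN 'y'
         ELSE MAX(CASE WHEN is_bc = 1 AND rn = 1 THEN ind2 END)
       END AS ind2
FROM (
  SELECT ik, priority, ind1, ind2, is_bc,
         -- rn = 1 marks the latest row inside each (ik, is_bc) group,
         -- so the latest B/C row is the one with is_bc = 1 AND rn = 1.
         ROW_NUMBER() OVER (PARTITION BY ik, is_bc ORDER BY dt DESC) AS rn
  FROM (
    SELECT ik, priority, dt, ind1, ind2,
           CASE WHEN priority IN ('B', 'C') THEN 1 ELSE 0 END AS is_bc
    FROM source_data
  ) flagged
) ranked
GROUP BY ik;
```

Because only one row per ik satisfies `is_bc = 1 AND rn = 1`, the inner `MAX` just picks that row's value. If two B/C rows share the latest date, `ROW_NUMBER` picks one of them arbitrarily, so a tie-breaking column in the `ORDER BY` may be needed. If the latest B/C value is null or an empty string, that value is returned unchanged.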

Result:

User contributions licensed under: CC BY-SA