Skip to content
Advertisement

Hive Query : To calculate max indicator value based on priority and date

I tried to frame the query but somehow not getting the required result hence posting. I am new to hive. Apologies if it is very simple.

Source Data :

Ik - priority - ind1 - ind2 - date
1 -   A -           y -       n   -    2009/01/01
1 -   B -           n -       y  -     2019/02/09
1 -   C -          null -     (empty)- 2018/05/07
2 -   A -          null -     y -      2005/02/02
2 -   B -          null -     y -      2006/05/05
2 -   C -           n -       null -   2018/01/01

Problem statement

Based on priority and date , we will need to populate the indicator values (ind1 and ind2) for each ik’s.

Output table format

Ik, ind1,ind2

Logic is

Here Group by would be done on ik field. So for above data set , in output will have only single record gets populated.

If for same ik value , priority is A and indicator flag (ind1 , ind2 ) is y value then output should populate as “y”.

But if same ik , priority is A but indicator is not having value “y”. (possible values are null,n,empty string)

Then will select latest indicator based on date field (order by date – latest record group by ik ) from B C priority.

Output of above dataset is

Ik - ind1 - ind2
1 -    y     -   y
2 -    n    -   y

Here ind1 is max (ind1) . I am able to derive. But unable to derive ind2.

Could you help me to create the query ?

Advertisement

Answer

Testing on your data: http://demo.gethue.com/hue/editor?editor=293916

with your_table as -------use your table instead of this subquery
(
select stack(6,

1 ,'A',  'y','n',    '2009/01/01',
1 ,'B',  'n','y',    '2019/02/09',
1 ,'C', null,'' ,    '2018/05/07',
2 ,'A', null,'y',    '2005/02/02',
2 ,'B', null,'y',    '2006/05/05',
2 ,'C',  'n', null,  '2018/01/01'

) as (Ik, priority, ind1, ind2, date)
) -------use your table instead of this subquery

select ik, 
       max(case when priority ='A' and ind1='y' then 'y' else last_ind1 end) ind1,
       max(case when priority ='A' and ind2='y' then 'y' else last_ind2 end) ind2
from
(
select Ik, priority, ind1, ind2, date,
       last_value(ind1) over (partition by Ik order by date) last_ind1,
       last_value(ind2) over (partition by Ik order by date) last_ind2
  from your_table -------use your table instead 
)s
group by ik;

Result:

ik  ind1    ind2
1   y   y
2   n   y
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement