Skip to content
Advertisement

“Hive” max column value from multiple columns

Hi: I have a situation where I need to find the max value on 3 calculated fields and store it in another field, is it possible to do it in one SQL query? Below is the example

SELECT Income1 ,
       Income1 * 2%  as Personal_Income ,
       Income2 ,
       Income2 * 10% as Share_Income ,
       Income3 ,
       Income3 * 1%  as Job_Income , 
       Max(Personal_Income, Share_Income, Job_Income ) 
  From Table

One way I tried is to calculate Personal_Income, Share_Income, Job_Income in the first pass and in the second pass I used

Select 
      Case when Personal_income > Share_Income and Personal_Income > Job_Income 
                then Personal_income 
           when Share_income > Job_Income 
                then Share_income 
           Else Job_income as the greatest_income

but this require me to do 2 scans on a billion rows table, How can I avoid this and do it in a single pass? Any help much appreciated.

Advertisement

Answer

As of Hive 1.1.0 you can use greatest() function. This query will do in a single table scan:

select Income1 ,
       Personal_Income ,
       Income2 ,
       Share_Income ,
       Income3 ,
       Job_Income ,
       greatest(Personal_Income, Share_Income, Job_Income ) as greatest_income
from
(
SELECT Income1 ,
       Income1 * 2%  as Personal_Income ,
       Income2 ,
       Income2 * 10% as Share_Income ,
       Income3 ,
       Income3 * 1%  as Job_Income , 
  From Table
)s
;
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement