Skip to content
Advertisement

Hive Most Popular in each group

I have three table

I have to find most popular Author among each of the following age groups: less than 10 years 10 to 18 years 19 to 35 years 36 to 45 years 46 years and above where most popular author is one who got highest number of ratings >= 6

I have tried below code.

but it is only giving me highest rating from all.

Can anyone tell me how to approach this problem as I have used three sub queries i know it is very ugly

I want highest from each age group.

Advertisement

Answer

The main issue with your query is the missing partition by in the row_number() and the limit in the subquery. In addition, you should be counting the books, not summing the ratings:

I also introduced table aliases so the query is easier to write and read.

I don’t remember if Hive allows column aliases in the GROUP BY clause. If it does, then one level of subquery can easily be removed.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement