Skip to content
Advertisement

How to grouby data in one column and distribute it in another column in HiveSQL?

I have the following data:

CompanyID Department No of People Country
45390 HR 100 UK
45390 Service 250 UK
98712 Service 300 US
39284 Admin 142 Norway
85932 Admin 260 Germany

I wish to know how many people belong to the same department from different countries?

Required Output

Department No of People Country
HR 100 UK
Service 250 UK
300 US
Admin 142 Norway
260 Germany

I was able to get the data but the Department was repeated by this query.

""" select Department, Country,count(Department) from dataset
    group by Country,Department
    order by Department """

How can I get the desired output?

Advertisement

Answer

The result set that you are producing is not really a relational result set. Why? Because rows depend on what is in the “previous” row. And in a relational database, there is no such thing as a “previous” row. This type of processing is often handled in the application layer.

Of course, SQL can do what you want. You just need to be careful:

select (case when 1 = row_number() over (partition by Department order by Country) 
             then Department
        end) as Department,
       Country, count(*) as num_people,         
from dataset
group by Country,Department
order by Department, Country;

Note that the order by needs to match the window function clause to be sure that what row_number() considered to be the first row is really the first row in the result set.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement