Skip to content
Advertisement

Count distinct with HUE/IMPALA

I have a table in HUE like :

ID,ProductID 
1,1
1,2
1,1
1,3 
1,1
1,2
1,1
1,3
2,1   
2,2
2,2
2,2
2,1   
2,2
2,2
2,2

I need to count the distinct number of ProductID’s for each ID.

Something like this :

ID,ProductID, CountofProductID 
1,1,3
1,2,3
1,1,3
1,3,3
2,1,2   
2,2,2
2,2,2
2,2,2

I’ve tried :

SELECT ID,ProductID, count(ProductID) over (partition by ID Sort by ProductID)
GROUP BY ID, ProductID

What I really need to do is a count(distinct) inside the analytical function. HUE doesn’t let me do this.

Is there another way I can count distinct for window of rows?

Advertisement

Answer

Your sample data is confusing. However, you can work around the limitation on analytical distinct counts with a subquery and a join:

select t.*, x.cnt
from mytable t
inner join (
    select id, count(distinct product_id) cnt
    from mytable t
    group by id 
) x on x.id = t.id
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement