I have a hive table that looks like this (total 460 columns)
colA colB ....... ce_id filename ......... dt v j 4 gg 40 v j 5 gg 55 f r 4 gg 40 t y 7 yy 32 a e 5 ee 43
now i need to write a query that selects all the data using distinct of 2 colunms with ce_id and filename
my expected output
colA colB ....... ce_id filename ......... dt v j 4 gg 40 v j 5 gg 55 t y 7 yy 32 a e 5 ee 43
anyone can guide me how to select all the data with 2 distinct value
Advertisement
Answer
I think row_number() does what you want:
select t.*
from (select t.*,
row_number() over (partition by ce_id, filename order by dt) as seqnum
from t
) t
where seqnum = 1;
You don’t specify which row you want. The above formulation returns the one with the smallest value of dt. The order by controls the “which”.