I have a hive table that looks like this (total 460 columns)
colA colB ....... ce_id filename ......... dt v j 4 gg 40 v j 5 gg 55 f r 4 gg 40 t y 7 yy 32 a e 5 ee 43
now i need to write a query that selects all the data using distinct of 2 colunms with ce_id and filename
my expected output
colA colB ....... ce_id filename ......... dt v j 4 gg 40 v j 5 gg 55 t y 7 yy 32 a e 5 ee 43
anyone can guide me how to select all the data with 2 distinct value
Advertisement
Answer
I think row_number()
does what you want:
select t.* from (select t.*, row_number() over (partition by ce_id, filename order by dt) as seqnum from t ) t where seqnum = 1;
You don’t specify which row you want. The above formulation returns the one with the smallest value of dt
. The order by
controls the “which”.