I have a hive table that looks like this (total 460 columns)
x
colA colB . ce_id filename dt
v j 4 gg 40
v j 5 gg 55
f r 4 gg 40
t y 7 yy 32
a e 5 ee 43
now i need to write a query that selects all the data using distinct of 2 colunms with ce_id and filename
my expected output
colA colB . ce_id filename dt
v j 4 gg 40
v j 5 gg 55
t y 7 yy 32
a e 5 ee 43
anyone can guide me how to select all the data with 2 distinct value
Advertisement
Answer
I think row_number()
does what you want:
select t.*
from (select t.*,
row_number() over (partition by ce_id, filename order by dt) as seqnum
from t
) t
where seqnum = 1;
You don’t specify which row you want. The above formulation returns the one with the smallest value of dt
. The order by
controls the “which”.