How to select all columns in Hive, keeping one row per distinct combination of 2 columns

Question

I have a Hive table that looks like this (460 columns in total): colA colB … ce_id filename … dt v j 4 gg 40 v j 5 gg …

Accepted Answer

I think row_number() does what you want:

    select t.*
    from (select t.*,
                 row_number() over (partition by ce_id, filename order by dt) as seqnum
          from t
         ) t
    where seqnum = 1;

You don't specify which row you want. The above formulation returns the one with the smallest value of dt. The order by controls the "which".
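To illustrate how the order by picks the row, here is a minimal sketch of the same query that keeps the row with the largest dt per (ce_id, filename) instead of the smallest. It assumes, as the answer does, that t stands in for the real table name and that dt sorts in the order you care about.

```sql
-- Sketch: keep one row per (ce_id, filename), choosing the largest dt.
-- `t` is a placeholder for the actual table name.
select t.*
from (select t.*,
             -- desc puts the largest dt first, so it gets seqnum = 1
             row_number() over (partition by ce_id, filename order by dt desc) as seqnum
      from t
     ) t
where seqnum = 1;
```

Two things to keep in mind: the result carries the helper column seqnum alongside the original columns, and if several rows in a group share the same dt, row_number() picks one of them arbitrarily unless you add more columns to the order by as a tie-breaker.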


Answer