Skip to content
Advertisement

how to select all the values in hive with distinct of 2 columns in hive

I have a hive table that looks like this (total 460 columns)

 colA   colB ....... ce_id filename ......... dt
  v      j            4     gg                40
  v      j            5     gg                55
  f      r            4     gg                40
  t      y            7     yy                32
  a      e            5     ee                43

now i need to write a query that selects all the data using distinct of 2 colunms with ce_id and filename

my expected output

 colA   colB ....... ce_id filename ......... dt
  v      j            4     gg                40
  v      j            5     gg                55
  t      y            7     yy                32
  a      e            5     ee                43

anyone can guide me how to select all the data with 2 distinct value

Advertisement

Answer

I think row_number() does what you want:

select t.*
from (select t.*,
             row_number() over (partition by ce_id, filename order by dt) as seqnum
      from t
     ) t
where seqnum = 1;

You don’t specify which row you want. The above formulation returns the one with the smallest value of dt. The order by controls the “which”.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement