how to select all the values in hive with distinct…

I have a hive table that looks like this (total 460 columns)

 colA   colB ....... ce_id filename ......... dt
  v      j            4     gg                40
  v      j            5     gg                55
  f      r            4     gg                40
  t      y            7     yy                32
  a      e            5     ee                43

​x
 
 colA   colB ....... ce_id filename ......... dt  v      j            4     gg                40  v      j            5     gg                55  f      r            4     gg                40  t      y            7     yy                32  a      e            5     ee                43​

now i need to write a query that selects all the data using distinct of 2 colunms with ce_id and filename

my expected output

 colA   colB ....... ce_id filename ......... dt
  v      j            4     gg                40
  v      j            5     gg                55
  t      y            7     yy                32
  a      e            5     ee                43

 
 colA   colB ....... ce_id filename ......... dt  v      j            4     gg                40  v      j            5     gg                55  t      y            7     yy                32  a      e            5     ee                43​

anyone can guide me how to select all the data with 2 distinct value

Answer

I think row_number() does what you want:

select t.*
from (select t.*,
             row_number() over (partition by ce_id, filename order by dt) as seqnum
      from t
     ) t
where seqnum = 1;

 
select t.*from (select t.*,             row_number() over (partition by ce_id, filename order by dt) as seqnum      from t     ) twhere seqnum = 1;​

You don’t specify which row you want. The above formulation returns the one with the smallest value of dt. The order by controls the “which”.

how to select all the values in hive with distinct of 2 columns in hive

Advertisement

Answer