Skip to content
Advertisement

To compare count between two hive table

I am trying to do count comparision between two table . As minus operator does not work in hive , it is not happening. Could you please give some simple way to do count comparision between two tables.

select  'Call Detail - Hive T1 to HDFS Staging - Data Compare',
case when cnt>0 then 'Fail' Else 'Pass' end
from
(select count(*) cnt from (
(select 
count(*) from students1 s1)-
(select count(*) from students2 s2)
) as tbl1
) as tbl2;

It’s throwing error:

FAILED: ParseException line 81:0 cannot recognize input near ‘(‘ ‘(‘ ‘select’ in from source

Advertisement

Answer

Use cross join if you have no group by columns. In this case it will produce one row with both counts:

select s.cnt-s1.cnt diff, case when abs(s.cnt-s1.cnt) > 0 then 'Fail' Else 'Pass' end result
from
(select count(*) cnt  from students1 s1) s
cross join
(select count(*) cnt from students2 s2) s1

If you will add some group by columns to compare on more detailed grain, then use FULL JOIN on group by columns:

select s.col1 s_col1, s1.col1 s1_col1, s.cnt-s1.cnt diff, case when abs(s.cnt-s1.cnt) > 0 then 'Fail' Else 'Pass' end result
from
(select count(*) cnt, col1  from students1 s1 group by col1) s
full join
(select count(*) cnt, col1 from students2 s2 group by col1) s1 
on s.col1 = s1.col1

This query will return joined rows with difference calculated and also not joined rows from both tables.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement