I am trying to do count comparision between two table . As minus operator does not work in hive , it is not happening. Could you please give some simple way to do count comparision between two tables.
select 'Call Detail - Hive T1 to HDFS Staging - Data Compare', case when cnt>0 then 'Fail' Else 'Pass' end from (select count(*) cnt from ( (select count(*) from students1 s1)- (select count(*) from students2 s2) ) as tbl1 ) as tbl2;
It’s throwing error:
FAILED: ParseException line 81:0 cannot recognize input near ‘(‘ ‘(‘ ‘select’ in from source
Advertisement
Answer
Use cross join
if you have no group by columns. In this case it will produce one row with both counts:
select s.cnt-s1.cnt diff, case when abs(s.cnt-s1.cnt) > 0 then 'Fail' Else 'Pass' end result from (select count(*) cnt from students1 s1) s cross join (select count(*) cnt from students2 s2) s1
If you will add some group by columns to compare on more detailed grain, then use FULL JOIN
on group by columns:
select s.col1 s_col1, s1.col1 s1_col1, s.cnt-s1.cnt diff, case when abs(s.cnt-s1.cnt) > 0 then 'Fail' Else 'Pass' end result from (select count(*) cnt, col1 from students1 s1 group by col1) s full join (select count(*) cnt, col1 from students2 s2 group by col1) s1 on s.col1 = s1.col1
This query will return joined rows with difference calculated and also not joined rows from both tables.