I would like to count the number of unique records not matching on 2 key variables from Data B compared to Data A and present the result as a percentage of the total unique records in Data B.
Data A: Data B: key1 key2 key1 key2 1 a 1 a 2 a 2 a 2 b 2 b 3 c 2 d 2 b 2 d
The code I have is as follows:
SELECT 100 * (SELECT count(DISTINCT b.key2)) / (SELECT COUNT(DISTINCT a.key2)) as output FROM dataA a LEFT JOIN dataB b ON a.key1 = b.key1 AND a.key2 = b.key2
From the above data I expect to get the result of 1 / 4 where the numerator represents the (2, d) value from data B that doesn’t exist in data A and the denominator is the unique records by key in Data B. The code currently doesn’t work now I’ve put DISTINCT in, so I’m a bit lost.
Advertisement
Answer
Assuming that DataA
does not contain duplicate rows, you must do a LEFT
join of DataB
to DataA
and aggregate:
select avg(case when a.key1 is null then 1.0 else 0 end) output from (select distinct * from DataB) b left join DataA a on a.key1 = b.key1 and a.key2 = b.key2
If DataA
may contain duplicate rows, then:
select avg(case when a.key1 is null then 1.0 else 0 end) output from (select distinct * from DataB) b left join (select distinct * from DataA) a on a.key1 = b.key1 and a.key2 = b.key2
See the demo (for MySql but it is standard SQL).
Result:
> | output | > | -----: | > | 0.2500 |