I would like to count the number of unique records not matching on 2 key variables from Data B compared to Data A and present the result as a percentage of the total unique records in Data B.
x
Data A: Data B:
key1 key2 key1 key2
1 a 1 a
2 a 2 a
2 b 2 b
3 c 2 d
2 b
2 d
The code I have is as follows:
SELECT 100 * (SELECT count(DISTINCT b.key2)) / (SELECT COUNT(DISTINCT a.key2)) as output
FROM dataA a LEFT JOIN dataB b
ON a.key1 = b.key1 AND a.key2 = b.key2
From the above data I expect to get the result of 1 / 4 where the numerator represents the (2, d) value from data B that doesn’t exist in data A and the denominator is the unique records by key in Data B. The code currently doesn’t work now I’ve put DISTINCT in, so I’m a bit lost.
Advertisement
Answer
Assuming that DataA
does not contain duplicate rows, you must do a LEFT
join of DataB
to DataA
and aggregate:
select avg(case when a.key1 is null then 1.0 else 0 end) output
from (select distinct * from DataB) b left join DataA a
on a.key1 = b.key1 and a.key2 = b.key2
If DataA
may contain duplicate rows, then:
select avg(case when a.key1 is null then 1.0 else 0 end) output
from (select distinct * from DataB) b
left join (select distinct * from DataA) a
on a.key1 = b.key1 and a.key2 = b.key2
See the demo (for MySql but it is standard SQL).
Result:
> | output |
> | -----: |
> | 0.2500 |