Skip to content
Advertisement

SQL count number of records in one set of data compared to another and present as a percentage

I would like to count the number of unique records not matching on 2 key variables from Data B compared to Data A and present the result as a percentage of the total unique records in Data B.

Data A:              Data B:
    key1 key2          key1 key2
    1    a             1    a
    2    a             2    a
    2    b             2    b
    3    c             2    d
                       2    b
                       2    d

The code I have is as follows:

SELECT 100 * (SELECT count(DISTINCT b.key2)) / (SELECT COUNT(DISTINCT a.key2)) as output
FROM dataA a LEFT JOIN dataB b
ON a.key1 = b.key1 AND a.key2 = b.key2

From the above data I expect to get the result of 1 / 4 where the numerator represents the (2, d) value from data B that doesn’t exist in data A and the denominator is the unique records by key in Data B. The code currently doesn’t work now I’ve put DISTINCT in, so I’m a bit lost.

Advertisement

Answer

Assuming that DataA does not contain duplicate rows, you must do a LEFT join of DataB to DataA and aggregate:

select avg(case when a.key1 is null then 1.0 else 0 end) output
from (select distinct * from DataB) b left join DataA a
on a.key1 = b.key1 and a.key2 = b.key2

If DataA may contain duplicate rows, then:

select avg(case when a.key1 is null then 1.0 else 0 end) output 
from (select distinct * from DataB) b 
left join (select distinct * from DataA) a
on a.key1 = b.key1 and a.key2 = b.key2

See the demo (for MySql but it is standard SQL).
Result:

> | output |
> | -----: |
> | 0.2500 |
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement