Aggregated row count differences between tables

I have two MySQL tables A and B both with this schema

ID	entity_id	asset	asset_type
0	12345	x	1
..	………	…..	……….

I would like to get an aggregated top 10/50/whatever entity_ids with the largest row count difference between the two tables. I think I could do this manually by just getting the highest row count by entity_id like so

select count(*), entity_id
-> from A
-> group by entity_id
-> order by count(*) desc;

​x
 
select count(*), entity_id-> from A-> group by entity_id-> order by count(*) desc;​

and just manually comparing to the same query for table B but I’m wondering if there’s a way to do this in just one query, that compares row counts for each distinct entity_id and aggregates the differences between row counts. A few notes

There is an index on entity_id for both tables
Table B will always have an equivalent or greater number of rows for each entity_id

Sample output

entity_id	difference
12345	100
3232	75
5992	40
and so on	for top 10/50

Answer

Aggregate in each table and join the results to get the difference:

SELECT a.entity_id, b.counter - a.counter diff
FROM (SELECT entity_id, COUNT(*) counter FROM A GROUP BY entity_id) a
INNER JOIN (SELECT entity_id, COUNT(*) counter FROM B GROUP BY entity_id) b
ON a.entity_id = b.entity_id
ORDER BY diff DESC LIMIT 10

 
SELECT a.entity_id, b.counter - a.counter diffFROM (SELECT entity_id, COUNT(*) counter FROM A GROUP BY entity_id) aINNER JOIN (SELECT entity_id, COUNT(*) counter FROM B GROUP BY entity_id) bON a.entity_id = b.entity_idORDER BY diff DESC LIMIT 10​

Advertisement

Answer