I have two MySQL tables A and B both with this schema
ID | entity_id | asset | asset_type |
---|---|---|---|
0 | 12345 | x | 1 |
.. | ……… | ….. | ………. |
I would like to get an aggregated top 10/50/whatever entity_ids
with the largest row count difference between the two tables. I think I could do this manually by just getting the highest row count by entity_id
like so
select count(*), entity_id -> from A -> group by entity_id -> order by count(*) desc;
and just manually comparing to the same query for table B but I’m wondering if there’s a way to do this in just one query, that compares row counts for each distinct entity_id
and aggregates the differences between row counts. A few notes
- There is an index on
entity_id
for both tables - Table B will always have an equivalent or greater number of rows for each
entity_id
Sample output
entity_id | difference |
---|---|
12345 | 100 |
3232 | 75 |
5992 | 40 |
and so on | for top 10/50 |
Advertisement
Answer
Aggregate in each table and join the results to get the difference:
SELECT a.entity_id, b.counter - a.counter diff FROM (SELECT entity_id, COUNT(*) counter FROM A GROUP BY entity_id) a INNER JOIN (SELECT entity_id, COUNT(*) counter FROM B GROUP BY entity_id) b ON a.entity_id = b.entity_id ORDER BY diff DESC LIMIT 10