I have two MySQL tables A and B both with this schema
| ID | entity_id | asset | asset_type |
|---|---|---|---|
| 0 | 12345 | x | 1 |
| .. | ……… | ….. | ………. |
I would like to get an aggregated top 10/50/whatever entity_ids with the largest row count difference between the two tables. I think I could do this manually by just getting the highest row count by entity_id like so
select count(*), entity_id -> from A -> group by entity_id -> order by count(*) desc;
and just manually comparing to the same query for table B but I’m wondering if there’s a way to do this in just one query, that compares row counts for each distinct entity_id and aggregates the differences between row counts. A few notes
- There is an index on
entity_idfor both tables - Table B will always have an equivalent or greater number of rows for each
entity_id
Sample output
| entity_id | difference |
|---|---|
| 12345 | 100 |
| 3232 | 75 |
| 5992 | 40 |
| and so on | for top 10/50 |
Advertisement
Answer
Aggregate in each table and join the results to get the difference:
SELECT a.entity_id, b.counter - a.counter diff FROM (SELECT entity_id, COUNT(*) counter FROM A GROUP BY entity_id) a INNER JOIN (SELECT entity_id, COUNT(*) counter FROM B GROUP BY entity_id) b ON a.entity_id = b.entity_id ORDER BY diff DESC LIMIT 10