SQL query to count rows grouped by an ID, but limit count on each group

Question

So I have a bit of an unusual request. I'm working with a table with billions of rows. The table has a column 'id' which is not unique, and has a column 'data' What I want to do is run a count on ...

Accepted Answer

You can&#8217;t really optimize performance for this &#8212; I don&#8217;t think.select id, (count(*) >= 150)from tgroup by id;If you happen to have a separate table with one row per id and an index on t(id), then this might be faster:select ids.id,       ((select count(*)         from t         where t.id = ids.id        ) >= 150       )from ids;Unfortunately, MySQL does not support double nesting for correlated subqueries, so this is not possible:select ids.id,       ((select count(*)         from (select 1               from t               where t.id = ids.id               limit 150              ) t        ) >= 150       )from ids;If so, this might be faster.EDIT:If you have an index on id and only want ids that have 150 or more, then variables might be faster:select id,       (@rn := if(@id = id, @rn + 1,                  if(@id := id, 1, 1)                 )       ) as rnfrom (select id      from t      order by id     ) t cross join     (select @id := 0, @rn := 0) paramshaving rn = 150;The thinking here is that using the index to order the table, materializing, and scanning again is probably faster than group by.  I don&#8217;t think row_number() would have the same performance characteristics.EDIT II:A slight variation on the above can be used to get all ids with a flag:select id, (max(id) = 150)from (select id,             (@rn := if(@id = id, @rn + 1,                        if(@id := id, 1, 1)                       )             ) as rn      from (select id            from t            order by id           ) t cross join           (select @id := 0, @rn := 0) params      having rn in (1, 150)     ) tgroup by id;EDIT III:Ahh!  If you have a separate table of ids, then this might be the best approach:select ids.id,       (select id        from t        where t.id = ids.id        limit 1 offset 149       ) is not nullfrom ids;This will fetch the 150th row from the index.  If it not there, then no row is returned.

Advertisement

Answer