Skip to content
Advertisement

COUNT with GROUP BY based on most recent rows only

I have a table named user_teams which has the following columns:

I want to have a query that given a set of team ids, it gives me the count of users that currently belong to that team. The currently part is important and that’s why the effective_date exists. For example, if user with id 1 first belonged to team with id 1, and later belonged to team with id 2, there would be two rows:

And this user should just be counted for the team with id 2. I have the following query that accomplishes what I’m looking for, but I don’t like it because the subselect will basically go through the whole table:

I can’t move the WHERE inside the subselect, because if for example I were to use the previous example and I run the query for the team_id 1:

It would consider the user that at one point it was in team 1, but it no longer is.

Is there another way to make this query so that my subselect doesn’t go through all the user_teams table?

Advertisement

Answer

If you want to avoid looking at every user in the table, follow these steps:

  1. Find all users that play or played in the requested teams.
  2. Find those users’ latest entries.
  3. Determine those entries’ teams.
  4. Only keep desired teams and count.

The query:

Indexes:

Anyway, working this way makes sense when you have, say, 10000 users with their team history in the table, but a team has just five or ten users. This means working on a small subset of the table data. Once the ratio is less extreme, it may be quicker to simply go through the whole table, i.e. use your own query. This could still benefit from the second index, as it contains all data in the appropriate order (per user -> highest date -> team).

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement