Selecting distinct values from a join of two large tables

I have an animals table with about 3 million records. Among a few other columns, it has id, name, and owner_id columns. I also have an animal_breeds table with about 2.5 million records, which has only two columns: animal_id and breed.

I’m trying to find the distinct breed values that are associated with a specific owner_id, but the query is taking 20 seconds or so. Here’s the query:

The tables have all appropriate indices. I can’t denormalize the table by adding a breed column to the animals table because it is possible for animals to be assigned multiple breeds. I also have this problem with a few other large tables that have one-to-many relationships.

Is there a more performant way to do this? It seems like a pretty simple problem, but I can't figure out the best approach other than pre-calculating and caching the results.
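To make the setup concrete, here is a minimal reproduction sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: SQLite stands in for MySQL, the tables keep only the columns named above, the rows are made up, and the query is a typical SELECT DISTINCT over the join, not necessarily the exact query from the question. If the DISTINCT cannot be served from an index, SQLite's plan shows a "USE TEMP B-TREE FOR DISTINCT" step, which is roughly its counterpart of MySQL's "Using temporary".

```python
import sqlite3

# Hypothetical miniature of the schema described in the question.
SCHEMA = """
CREATE TABLE animals (
    id       INTEGER PRIMARY KEY,
    name     TEXT    NOT NULL,
    owner_id INTEGER NOT NULL
);
CREATE INDEX animals_owner_id_index ON animals (owner_id);

-- One animal can have several breeds, hence the separate one-to-many table.
CREATE TABLE animal_breeds (
    animal_id INTEGER NOT NULL REFERENCES animals (id),
    breed     TEXT    NOT NULL,
    UNIQUE (animal_id, breed)
);
"""

# Baseline: join every animal of one owner to its breeds, then de-duplicate.
BASELINE_SQL = """
SELECT DISTINCT ab.breed
FROM animals AS a
JOIN animal_breeds AS ab ON ab.animal_id = a.id
WHERE a.owner_id = ?
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO animals (id, name, owner_id) VALUES (?, ?, ?)",
        [(1, "Rex", 10), (2, "Bella", 10), (3, "Milo", 20)],
    )
    conn.executemany(
        "INSERT INTO animal_breeds (animal_id, breed) VALUES (?, ?)",
        [(1, "beagle"), (1, "poodle"), (2, "poodle"), (3, "husky")],
    )
    return conn


def distinct_breeds_baseline(conn: sqlite3.Connection, owner_id: int) -> list[str]:
    """Distinct breeds for one owner, sorted in Python for stable output."""
    rows = conn.execute(BASELINE_SQL, (owner_id,)).fetchall()
    return sorted(breed for (breed,) in rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(distinct_breeds_baseline(conn, 10))
    # Print the query plan so the de-duplication step can be inspected.
    for row in conn.execute("EXPLAIN QUERY PLAN " + BASELINE_SQL, (10,)):
        print(row[-1])  # last column is the human-readable detail
```

Running it prints ['beagle', 'poodle'] followed by the plan. The plan wording differs between SQLite versions, and MySQL's optimizer may behave differently at 3 million rows.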

Here is the EXPLAIN output for my query. Notice the "Using temporary".

And as requested, here are the CREATE TABLE statements (I left off a few unrelated columns and indices from the animals table). I believe the animal_breeds_animal_id_index index on the animal_breeds table is redundant because of the unique key on the table, but we can ignore that for now as long as it's not causing the problem 🙂
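On the redundancy question: an index on several columns can serve lookups on its leftmost column alone, so if the unique key starts with animal_id, a separate single-column index on animal_id usually adds nothing for reads. The sketch below checks this in SQLite, assuming the unique key is on (animal_id, breed) (the actual statements aren't reproduced here). MySQL/InnoDB follows the same leftmost-prefix rule, though its EXPLAIN output looks different.

```python
import sqlite3


def plan_for_animal_lookup() -> list[str]:
    """Return SQLite's plan for fetching the breeds of one animal.

    Assumption: the unique key is (animal_id, breed) and there is NO separate
    single-column index on animal_id.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE animal_breeds (
            animal_id INTEGER NOT NULL,
            breed     TEXT    NOT NULL,
            UNIQUE (animal_id, breed)
        )
        """
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT breed FROM animal_breeds WHERE animal_id = ?",
        (1,),
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    for step in plan_for_animal_lookup():
        # Expect a SEARCH step on the unique key's index with (animal_id=?),
        # rather than a full-table SCAN.
        print(step)
```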

Any help would be appreciated. Thanks!

Answer

Using what you know about your data, you can try something like this:

The idea is to first get the short list of distinct breeds without any filtering (for a small list this is quite fast), and then filter that list further with a correlated subquery. Because the list is short, only a few subqueries are executed, and each one only checks for existence, which is much faster than any grouping (DISTINCT is a form of grouping).
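The two steps above can be sketched as follows, again with SQLite via Python's sqlite3 standing in for MySQL. The schema, the sample rows, the extra index animal_breeds_breed_index on (breed, animal_id), and the function names are assumptions for illustration, and the SQL is one way to express "distinct list first, then a correlated EXISTS per breed", not necessarily the query from this answer. A plain DISTINCT join is included only to check that both return the same breeds.

```python
import sqlite3

SCHEMA = """
CREATE TABLE animals (
    id       INTEGER PRIMARY KEY,
    name     TEXT    NOT NULL,
    owner_id INTEGER NOT NULL
);
CREATE INDEX animals_owner_id_index ON animals (owner_id);

CREATE TABLE animal_breeds (
    animal_id INTEGER NOT NULL REFERENCES animals (id),
    breed     TEXT    NOT NULL,
    UNIQUE (animal_id, breed)
);
-- Assumed extra index so each EXISTS probe can start from a single breed.
CREATE INDEX animal_breeds_breed_index ON animal_breeds (breed, animal_id);
"""

# Step 1: the (hopefully short) list of all distinct breeds, with no filtering.
# Step 2: keep a breed only if at least one animal of that breed has this owner.
# EXISTS can stop at the first matching row instead of grouping all matches.
EXISTS_SQL = """
SELECT b.breed
FROM (SELECT DISTINCT breed FROM animal_breeds) AS b
WHERE EXISTS (
    SELECT 1
    FROM animal_breeds AS ab
    JOIN animals AS a ON a.id = ab.animal_id
    WHERE ab.breed = b.breed
      AND a.owner_id = ?
)
"""

# Plain DISTINCT over the join, used only to check that the results agree.
BASELINE_SQL = """
SELECT DISTINCT ab.breed
FROM animals AS a
JOIN animal_breeds AS ab ON ab.animal_id = a.id
WHERE a.owner_id = ?
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO animals (id, name, owner_id) VALUES (?, ?, ?)",
        [(1, "Rex", 10), (2, "Bella", 10), (3, "Milo", 20)],
    )
    conn.executemany(
        "INSERT INTO animal_breeds (animal_id, breed) VALUES (?, ?)",
        [(1, "beagle"), (1, "poodle"), (2, "poodle"), (3, "husky")],
    )
    return conn


def _run(conn: sqlite3.Connection, sql: str, owner_id: int) -> list[str]:
    """Execute a one-column breed query and return its values sorted."""
    return sorted(breed for (breed,) in conn.execute(sql, (owner_id,)))


def distinct_breeds_via_exists(conn: sqlite3.Connection, owner_id: int) -> list[str]:
    return _run(conn, EXISTS_SQL, owner_id)


def distinct_breeds_baseline(conn: sqlite3.Connection, owner_id: int) -> list[str]:
    return _run(conn, BASELINE_SQL, owner_id)


if __name__ == "__main__":
    conn = build_demo_db()
    print(distinct_breeds_via_exists(conn, 10))
    print(distinct_breeds_baseline(conn, 10))
```

The composite index on (breed, animal_id) is what is meant to keep each probe cheap: the inner query can find the animals of one breed directly from the index and stops as soon as one of them belongs to the owner.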

This only works well if the list of distinct breeds is quite short.

With randomly generated data based on your answers, the above query gave me the following execution plan:

Alternatively, you can try writing the WHERE clause like this:

User contributions licensed under: CC BY-SA