Skip to content
Advertisement

Speed up joins on thousands of rows

I have two tables that look something along the lines of:

t1 (id INT PRIMARY KEY, data VARCHAR(10), fkey INT, INDEX i1(data))
t2 (id INT PRIMARY KEY, enabled TINYINT)

And a query to get data from t1. The query runs fine and takes a few ms to get data from it, although problems start appearing when I use joins. Here’s an example of one of my queries:

SELECT `id` FROM `t1` WHERE `data`='something';

Which can return a few thousand rows, that might look something like:

1, something, 5
2, something, 5
3, something, 5
4, something, 6
5, something, 5

Out of these thousands of rows, there are probably only going to be one or two distinct values of fkey. Now, here’s my query with the join:

SELECT `t1`.`id` FROM `t1` 
INNER JOIN `t2` 
   ON `t2`.`id`=`t1`.`fkey` 
   AND `t2`.`enabled`=1 
WHERE `t1`.`data`='something';

And here’s the execution plan for it:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: index
possible_keys: id
          key: id
      key_len: 75
          ref: NULL
         rows: 378
     filtered: 100.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: i1
          key: i1
      key_len: 150
          ref: DB.t2.id,const
         rows: 13
     filtered: 100.00
        Extra: Using index

This query ends up taking roughly 60 seconds, compared to the few ms of the other query. I’ve also tried removing the enabled check from the query, but it has no impact on performance. Also, querying t2 directly by id only takes a few ms as well.

It seems to me like the database should only have a need to run the join on one or two rows and just cache the results for any rows with the same fkey, but that doesn’t seem to be happening. Is there any way to achieve this behavior?

Advertisement

Answer

After a lot of searching, I’ve finally come across a good solution for this that significantly reduces the time it takes to run these queries.

This query:

SELECT `t1`.`id` FROM `t1` 
INNER JOIN `t2` 
   ON `t2`.`id`=`t1`.`fkey` 
   AND `t2`.`enabled`=1 
WHERE `t1`.`data`='something';

Can be rewritten as:

SELECT `t1`.`id` FROM `t1`
WHERE `t1`.`data`='something' 
AND `t1`.`fkey` IN (SELECT `id` FROM `t2` WHERE `t2`.`enabled`=1);

This will query and compare on all values from t2, which isn’t ideal if it only needs one or two, but it’s better than nothing.

If someone has a solution that does do the comparison on only a few and it isn’t some MySQL hack then I’ll mark it as accepted, but for now, this works pretty well.

EDIT: As @Joel Coehoorn and @eshirvana pointed out, indexing fkey in t1 is helpful, but only really has an affect on queries 8k+ rows.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement