SQL query optimization with 3 joins

I am learning the art of query optimization, in terms of execution time. I have three tables: posts, comments and users. The query below works correctly, returning the users who made more comments than posts in the year 2010, along with their counts. I believe it can be optimized, and I would appreciate optimized code with an explanation.

Information on each Table

  • Users table: 40,325 rows, with an index on its primary key (Id).
  • Posts table: 91,986 rows, with an index on its primary key (PostId) and on OwnerUserId, the foreign key that references the users table.
  • Comments table: 174,305 rows, with indexes on its primary key, on UserId (for the users table) and on PostId (for the posts table).

In my query above, there are two subqueries, one for posts and the other for comments. I’d like to see how it can be optimized for a shorter execution time.

The result of the above query, and an attachment of my EXPLAIN query: (two images, not reproduced here)

Answer

One thing jumps out at me. Both of your subqueries filter on a date by calling a function, such as YEAR(), on the column value.

That’s not sargable. It prevents MySQL from exploiting an index on that column and forces a full scan instead. (MySQL, and other DBMSs, are still too dumb to know that YEAR(timestamp) could be satisfied with an index range scan.)

So change those WHEREs to compare the bare column against a half-open date range instead.
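As a rough sketch of that rewrite: I’m assuming the date column is called CreationDate in both tables and that the original filter looked something like YEAR(CreationDate) = 2010. Neither detail is confirmed by the question, so adjust to your schema.

```sql
-- Not sargable (assumed shape of the original filter): the function call
-- hides CreationDate from the index, so every row has to be examined.
--   WHERE YEAR(CreationDate) = 2010

-- Sargable: the bare column is compared against a half-open range,
-- which covers all of 2010 whether the column is DATE, DATETIME or TIMESTAMP.
SELECT COUNT(*)
  FROM posts
 WHERE CreationDate >= '2010-01-01'
   AND CreationDate <  '2011-01-01';
```

The half-open form (>= start, < next year) avoids having to spell out the last instant of December 31, which is where BETWEEN tends to go wrong on columns that carry a time part.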

Also put an index on the CreationDate columns in your posts and comments tables. With that index in place, the query planner can random-seek in the index to the first matching row and then read sequentially until the last matching row. This is called an index range scan, and it is much more efficient than a full table scan.
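Something along these lines should do it. The index names are made up for illustration, and CreationDate is assumed to be the column name in both tables.

```sql
-- Single-column indexes on the date column (hypothetical index names).
CREATE INDEX idx_posts_creationdate    ON posts (CreationDate);
CREATE INDEX idx_comments_creationdate ON comments (CreationDate);

-- Check what the planner does with the rewritten predicate.
EXPLAIN
SELECT COUNT(*)
  FROM posts
 WHERE CreationDate >= '2010-01-01'
   AND CreationDate <  '2011-01-01';
```

If the index is being used for a range scan, the type column of the EXPLAIN output would typically read range rather than ALL. The optimizer can still choose a full scan when the date range covers most of the table.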

EDIT You need the following indexes:
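A plausible form of those indexes, sketched under assumptions: each one leads with CreationDate (so the range scan still works) and is followed by the user column that the per-user counts group on. The index names are hypothetical, and the column order is my reading of the rest of this answer rather than something your schema dictates.

```sql
-- Compound indexes: date first for the range scan, then the user column
-- so the per-user counts can be computed from the index alone.
ALTER TABLE posts    ADD INDEX idx_posts_date_user    (CreationDate, OwnerUserId);
ALTER TABLE comments ADD INDEX idx_comments_date_user (CreationDate, UserId);
```

With both columns present in the index, a query that only touches CreationDate and the user column never has to visit the table rows themselves.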

I suggest you refactor your query to make your subqueries, where all the work happens, faster.

These should be the subqueries. They each yield a number of items for each user, in the desired time range.
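A sketch of what they might look like, assuming the column names given in the question (OwnerUserId on posts, UserId on comments) plus a CreationDate column on each table. The aliases PostCount and CommentCount are made up for illustration.

```sql
-- Number of posts per user in 2010.
SELECT OwnerUserId AS UserId,
       COUNT(*)    AS PostCount
  FROM posts
 WHERE CreationDate >= '2010-01-01'
   AND CreationDate <  '2011-01-01'
 GROUP BY OwnerUserId;

-- Number of comments per user in 2010.
SELECT UserId,
       COUNT(*) AS CommentCount
  FROM comments
 WHERE CreationDate >= '2010-01-01'
   AND CreationDate <  '2011-01-01'
 GROUP BY UserId;
```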

These queries save time by only aggregating (summarizing by group) the minimal amount of data needed to satisfy the query. And, they can be satisfied by doing a fast index range scan on the indexes I suggested.

Then, you can use these subqueries in your main query, picking up users’ names from your users table, like this.
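Roughly as follows. DisplayName is a guess at the name column in your users table, and the count aliases are hypothetical; the table and key names come from the question.

```sql
SELECT u.Id,
       u.DisplayName,          -- assumed name column
       c.CommentCount,
       p.PostCount
  FROM users u
  JOIN (
        -- comments per user in 2010
        SELECT UserId, COUNT(*) AS CommentCount
          FROM comments
         WHERE CreationDate >= '2010-01-01'
           AND CreationDate <  '2011-01-01'
         GROUP BY UserId
       ) c ON c.UserId = u.Id
  JOIN (
        -- posts per user in 2010
        SELECT OwnerUserId AS UserId, COUNT(*) AS PostCount
          FROM posts
         WHERE CreationDate >= '2010-01-01'
           AND CreationDate <  '2011-01-01'
         GROUP BY OwnerUserId
       ) p ON p.UserId = u.Id
 WHERE c.CommentCount > p.PostCount;
```

Because both joins are inner joins, a user who commented in 2010 but posted nothing that year is left out. If you want those users too, one option is to make the posts join a LEFT JOIN and compare against COALESCE(p.PostCount, 0).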

I suspect you’ll get a big performance boost if you add the compound indexes I mentioned. You can drop the indexes on CreationDate; they are redundant when you add the compound indexes.

Here’s a worthwhile reference: https://use-the-index-luke.com/

User contributions licensed under: CC BY-SA