Skip to content
Advertisement

How to reduce # of columns SQL has to look through while joining 2 tables?

I’m joining two tables together using an inner join, but given that these tables are billions of rows long, I was hoping to speed up my query and find a way to reduce the columns the sql has to comb through. Is there a way to, in a join, only have sql search through certain columns? I’m understand you can do it through SELECT, but I was hoping rather than select columns from the join, that I could reduce the # of columns being searched from.

Ex)

SELECT *
FROM table1 t1
JOIN table2 t2
  ON t1.suite = t2.suite
  AND t1.region = t2.region

Currently table1 and table2 both have over 20 columns, but I only need the 3 columns from each table. I’m using presto btw. Thanks and stay safe 🙂

Advertisement

Answer

If you create indexes on each table for both suite and region in the same index, plus an INCLUDES clause for any additional result columns you need, SQL Server can complete the query using only the indexes. This is called a covering index, and it will help performance for the query by increasing the number of “rows” (index entries) which fit in an 8Kb page verses an entire real row, which therefore also reduces the total number of page reads to complete the query.

Be aware, though, that you pay this cost by extra work at INSERT/UPDATE/DELETE time to keep the indexes up to date, extra storage needed for the indexes, and extra cache RAM use if any part of the indexes end up in the cache buffer. With potentially billions of index entries, that cost could be significant, and may outweigh the gains for this one query, or may require updates to your server capacity planning.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement