I’m joining two tables together using an inner join, but given that these tables are billions of rows long, I was hoping to speed up my query and find a way to reduce the columns the sql has to comb through. Is there a way to, in a join, only have sql search through certain columns? I’m understand you can do it through SELECT, but I was hoping rather than select columns from the join, that I could reduce the # of columns being searched from.
Ex)
SELECT * FROM table1 t1 JOIN table2 t2 ON t1.suite = t2.suite AND t1.region = t2.region
Currently table1 and table2 both have over 20 columns, but I only need the 3 columns from each table. I’m using presto btw. Thanks and stay safe 🙂
Advertisement
Answer
If you create indexes on each table for both suite
and region
in the same index, plus an INCLUDES
clause for any additional result columns you need, SQL Server can complete the query using only the indexes. This is called a covering index, and it will help performance for the query by increasing the number of “rows” (index entries) which fit in an 8Kb page verses an entire real row, which therefore also reduces the total number of page reads to complete the query.
Be aware, though, that you pay this cost by extra work at INSERT/UPDATE/DELETE time to keep the indexes up to date, extra storage needed for the indexes, and extra cache RAM use if any part of the indexes end up in the cache buffer. With potentially billions of index entries, that cost could be significant, and may outweigh the gains for this one query, or may require updates to your server capacity planning.