SQL 2017 Standard i have a database in a star schema model (data warehouse) to fill in a fact table, i have a stored procedure with a temporary table having around 470,0000 rows. to fill in the dimension ids, i have a left join operation between the temporary table and the dimensions tables. for ex:
insert into factTable (...fields list...) select t.Quantity1,t.Quantity2,d1.ID,d.ID,...,19.id from MyTemp t left outer join dim1 d1 on t.F1=d1.F1 and t.CompanyID=d1.CompanyID and t.DataSourceID=d1.DataSourceID left outer join dim2 d2 on t.F2=d2.F2 and t.CompanyID=d2.CompanyID and t.DataSourceID=d2.DataSourceID left outer join dim3 d3 on t.F3=d3.F3 and t.CompanyID=d2.CompanyID and t.DataSourceID=d2.DataSourceID ....... left outer join dim19 d19 on t.F19=3.F19 and t.CompanyID=d19.CompanyID and t.DataSourceID=d19.DataSourceID
the problem is when using a smaller number of joins, for ex just for the first 5 or 6 dimensions, the query is very fast.
with 19 joins, it is taking more than 4 hours.
the execution plan shows that the bottleneck comes from a hidden “Sort” operation inserted by the query optimizer!!!!
when reading data from each dimension, sql server is sorting the dimension data before joining with the temporary table. introducing indexes on the temporary table did not solve the problem. even limiting the query to the first field from an index on the temp table did not help
Advertisement
Answer
Using
CREATE CLUSTERED COLUMNSTORE INDEX
on the temporary table solved the problem. Thank you all. You can check CLUSTERED COLUMNSTORE INDEX for more details (if you are working on SQL Server 2016 Standard edition you will need SP2)