Background
Hello all!
I recently learned that in newer versions of SQL Server, the query optimizer can “expand” a SQL view and utilize inline performance benefits. This could have some drastic effects going forward on what kinds of database objects I create and why and when I create them, depending upon when this enhanced performance is achieved and when it is not.
For instance, I would not bother creating a parameterized inline table-valued function with a start date parameter and an end date parameter for an extremely large transaction table (where performance matters greatly) when I can just make a view and slap a WHERE
statement at the bottom of the calling query, something like
SELECT Column1 FROM vw_Simple WHERE Column1 BETWEEN @SomeStartDate AND @SomeEndDate
and trust that the query optimizer will “expand” the view and give me a great execution plan.
Note: I am talking about a simple, non-nested, non-indexed SQL Server view. Something like
CREATE VIEW vw_Simple AS SELECT Column1 ,Column2 FROM TableA
Question
My question is: what are the exact guidelines for knowing when the query optimizer can “expand” a SQL view and when it cannot?
I cannot find this answer in the official Microsoft documentation.
What I’ve found so far
Cases where the query optimizer can expand a view:
- This Stack Exchange post claims that, in general, the query optimizer will expand a SQL view.
Cases where the query optimizer cannot expand a view:
- Predicate Pushing/Deferred Filtering – but this was addressed in SQL Server 2008.
- Nested Views – at least sometimes.
Gray Area
- This Stack Overflow post’s second answer claims that the query optimizer may or may not expand the view, depending on the view’s complexity and the limitations of the query optimizer.
Advertisement
Answer
You will not find this information in the documentation, because it is not a single feature per se, it is simply the compiler/optimizer working its way through the query in various phases, using a number of different techniques to get the best execution plan. Sometimes it can safely push through predicates, sometimes it can’t.
Note that “expanding the view” is the wrong term here. The view is always expanded into its definition (NOEXPAND
excepted). What you are referring to is called predicate pushdown.
What happens to a view during compilation?
I’ve assumed here that indexed views and
NOEXPAND
are not being used.
When you execute a query, the compiler starts by parsing and lexing the query into a basic execution plan. This is a very rough, unoptimized version which pretty much mirrors the query as written.
When there is a view in the query, the compiler will retrieve the view’s pre-parsed execution tree and shoves it into the execution plan, again it is a very rough draft.
With derived tables, CTEs, correlated and non-correlated subqueries, as well as inline TVFs, the same thing happens, except that parsing is needed also.
After this point, you can assume that a view may as well have been written as a CTE, it makes no difference.
Can the optimizer push through the view?
The compiler has a number of tricks up its sleeve, and predicate pushdown is one of them, as is simplifying views.
The ability of the compiler here is mainly dependent on whether it can deduce that a simplification is permitted, not that it is possible.
For example, this query
SELECT SomeCol FROM ( SELECT TOP 100 PERCENT * FROM ( SELECT SomeCol, OtherCol, 1 / 0 AS ThisDoesntError FROM table1 ) t WHERE OtherCol = 1 ORDER BY ThisDoesntError ) t WHERE OtherCol <> 2
is fairly trivial to optimize away to this
SELECT SomeCol FROM table1 WHERE OtherCol = 1
because TOP 100 PERCENT... ORDER BY...
is known to be something that will make no difference to the outer query, and can therefore be dropped, followed by the whole ThisDoesntError
column.
So when does it not work?
The problems start when the optimizer cannot push through the view, because it may change the semantics (and therefore the results) of the query.
SELECT SomeCol FROM ( SELECT TOP 10 * FROM ( SELECT SomeCol, OtherCol, 1 / 0 AS ThisDOESError FROM table1 ) t ORDER BY ThisDOESError ) t WHERE OtherCol = 1
Because the TOP
needs to be calculated based on the ORDER BY ThisDOESError
clause, the column ThisDOESError
cannot be elided, and the filter on OtherCol
cannot be pushed through.
Likewise this one cannot be optimized either
SELECT SomeCol FROM ( SELECT SomeCol, OtherCol, ROW_NUMBER() OVER (PARTITION BY SomeCol ORDER BY ThirdCol) AS rn FROM table1 ) t WHERE rn = 1 AND OtherCol = 1
In this case, because the row-number must be calculated on the whole set, the filter OtherCol = 5
cannot be safely pushed through.
Interestingly enough, this version should be able to be safely pushed through (no promises though!)
SELECT SomeCol FROM ( SELECT SomeCol, OtherCol, ROW_NUMBER() OVER (PARTITION BY SomeCol ORDER BY ThirdCol) AS rn FROM table1 ) t WHERE rn = 1 AND SomeCol = 'Something'
In this case, the optimizer should in theory be able to see that the filtering column is also the partitioning column, and therefore the row-number calculation will not change.