I am reading book T-SQL fundamental – 3rd edition, page 98 by Itzik Ben-Gan.
Solution 1:
SELECT orderid, orderdate, custid, empid FROM Sales.Orders WHERE YEAR(orderdate) = 2015 AND MONTH(orderdate) = 6;
Solution 2:
SELECT orderid, orderdate, custid, empid FROM Sales.Orders WHERE orderdate >= '20150106' AND orderdate <= '20150130';
Author said:
Apply manipulation to the filtered column, in most case SQL Server can’t use an index efficiently. Therefore, I advice using a range filter instead.
Please explain for me. Why date range filter is better than function YEAR()
and DATE()
?
Advertisement
Answer
Given a query such as:
SELECT orderid, orderdate, custid, empid FROM Sales.Orders WHERE . . .
The database engine essentially has two ways to process the query. One method is to cycle through all the rows in Orders
and check to see if each row matches the condition.
The second method — if possible — is to use an index to determine the specific rows that need to be fetched. In general, this is much faster, because the database engine only needs to read the rows that match the condition.
The issue is that this condition:
WHERE YEAR(orderdate) = 2015 AND MONTH(orderdate) = 6
cannot use the index. Actually, this is a statement about practice not theory. Theoretically, the engine could figure out that YEAR()
and MONTH()
are non-decreasing functions and make use of the index. No database does that. Instead, the database just says: “Oh, there is a function call. I can’t use the index.”
For this condition:
WHERE orderdate >= '20150106' AND orderdate <= '20150130'
The database can make use of an index. It can find the first row in the index that matches orderdate >= '20150106'
. It can then scan the index for the subsequent rows up to (and including) '20150130'
. In general, the database will decide that this is faster.
But, at the very least, eliminating the function calls gives the optimizer the opportunity to use an index for the condition.