Skip to content
Advertisement

MSSQL cast( [varcharColumn] to int) in SELECT gets executed before WHERE clause filters out bad values

Assume the following schema and query:

Please look past the glaring design issues with having values in a varchar column that we expect to be ints.

create table dbo.Parent (
    Id bigint NOT NULL,
    TypeId int NOT NULL
)

create table dbo.Child (
    Id bigint NOT NULL,
    ParentId bigint NOT NULL,
    TypeId int NOT NULL,
    varcharColumn varchar(300) NULL
)

select cast(c.varcharColumn as int)
from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13

The break:

We get a cast break due to a value that cannot be converted to an int. In this case: “123-1”. The strange thing is that the value being cast gets filtered out of the final result set.

For example, this returns zero results

select c.varcharColumn
from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13
    and c.varcharColumn = '123-1'

The query plan ends up looking at the Child table and actually applying the cast function before the where clause.

We were able to fix this by creating a new index on the child table (it was doing a PK scan)

create index [NCIDX_dbo_Child__TypeId] on dbo.Child (
    TypeId
)
include (
    ParentId,
    varcharColumn
)

It now filters on the parent table’s where clause first.

Is there any way to fix this without the extra index? And again, please refrain from any suggestions related to fixing our schema. That is definitely the proper fix in this case.

I’m mostly interested in understanding why it applied the cast before it filtered the result set.

Thanks

Edit – Answer:

Many thanks to both Aaron and Gordon. If I ever get more than 15 rep points, I’ll come back and up both of your replies.

We ended up needing Gordon’s answer since we wanted to use this query in a view. A few folks at the office were wary of using a case statement because they prefer to have more control over ensuring that we have a smaller result set first (Aaron’s answer), however it all boils down to looking at the query plan and checking your read counts.

Again, thanks for all of the responses!

Advertisement

Answer

First, this is not a “glaring design issue”. SQL is a descriptive language of the output, not a procedural language that specifies how prcoessing is being done. There is no guarantee of the order of processing, in general, and this is an advantage. I might say there is a design issue, but it is around the general handling of exceptions in SQL statements.

According to SQL Server documentation (http://msdn.microsoft.com/en-us/library/ms181765.aspx), you can depend on the order of evauation for a CASE statement for scalar expressions. So, the following should work:

select (case when isnumeric(c.varcharColumn) = 1 then cast(c.varcharColumn as int) end)

Or, to get closer to an “int” expression:

select (case when isnumeric(c.varcharColumn) = 1 and c.varcharColumn not like '%.%' and c.varcharColumn not like '%e%'
             then cast(c.varcharColumn as int)
        end)

At least your code is doing an explicit CAST. This situation is much nastier when the casts are implicit (and there are hundreds of columns).

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement