I’m a self-taught, vaguely competent SQL user. For a view that I’m writing, I’m trying to develop a ‘conditional LEFT
‘ string-splitting command (presumably later to be joined by a ‘conditional RIGHT
‘ – whereby:
- If a string (let’s call it ‘haystack’) contains a particular pattern (let’s call it ‘needle’), it will be pruned to the left of that pattern
- Otherwise, the entire string will be passed unaltered.
So, if our pattern is ‘ – ‘,
- ‘A long string – containing the pattern’ will output as ‘A long string’
- ‘A string without the pattern’ will be returned as-is.
Rather than using the crudest ways to do this, I’m trying to come up with a way that avoids having to repeat any clause (such as if 0 < CHARINDEX
, then take CHARINDEX
– 1, etc.) and instead leverages conditional NULL
ing.
Yet – here’s what I get for trying to be creative – I’ve hit what seems to be a really basic stumbling block. Please observe the following code and results, and let me know whether you can replicate it – and hence whether it’s a bug or I’ve missed something peculiar. I have tested this on SQL Server both 2008 R2 and 2014, both Express editions.
select -- ISNULL: returns 'a big old string' ISNULL(null, 'a big old string'), -- NULLIF: returns NULL left( 'a big old string', nullif ( CHARINDEX ( 'needle', 'haystack' ), 0 ) - 1 ), -- combined: returns just 'a' (1st character of ISNULL condition) ISNULL( left ( 'a big old string', -- the input string. In reality, this would be a column alias, etc. nullif ( CHARINDEX -- Search for the splitting pattern ( 'needle', 'haystack' ), 0 -- If it's not found, return NULL instead of the usual 0 ) - 1 -- so that this subtraction produces a NULL, not an invalid negative index ), 'a big old string' -- If the pattern was not found, we should return the input unaltered ); /* ---------------- ---- ---- a big old string NULL a (1 row(s) affected) */
Why do these 2 clauses work as expected in isolation, but when I combine them, rather than getting the sum of their effects, I get only the 1st character of the ISNULL
string – ‘a’?
Is there some kind of implicit CAST
to varchar(1)
? Deliberately cast
ing to varchar(max)
made no difference. What else could be going on here?
Am I just doing something really stupid? Because from here, I can’t figure out what I’m doing wrong, and so it really seems like a bug. I hoped testing on 2014 would prove it to be a bug in the old 2008 R2, but alas, they act identically (or, rather, don’t).
Thanks in advance for, hopefully, saving me from what would presumably be an evening of baffled existential crisis.
Advertisement
Answer
There are two parts to this problem, the first is the nature of the ISNULL
operator, it will use the datatype and length of the first argument. A simple example would be:
DECLARE @A CHAR(1) = NULL, @B VARCHAR(MAX) = 'This is a test'; SELECT TOP 1 Test = ISNULL(@A, @B);
This returns T
and checking the execution plan XML we can see the implicit conversion of "This is a Test"
to CHAR(1)
:
<ScalarOperator ScalarString="isnull([@A],CONVERT_IMPLICIT(char(1),[@B],0))"> <Intrinsic FunctionName="isnull"> <ScalarOperator> <Identifier> <ColumnReference Column="@A" /> </Identifier> </ScalarOperator> <ScalarOperator> <Convert DataType="char" Length="1" Style="0" Implicit="true"> <ScalarOperator> <Identifier> <ColumnReference Column="@B" /> </Identifier> </ScalarOperator> </Convert> </ScalarOperator> </Intrinsic> </ScalarOperator>
Your example is not quite so straightforward since you don’t have your types nicely defined like above, but if we do define the dataypes:
DECLARE @A VARCHAR(MAX) = 'a big old string', @B VARCHAR(MAX) = 'needle', @C VARCHAR(MAX) = 'haystack'; SELECT TOP 1 ISNULL(LEFT(@A, NULLIF(CHARINDEX(@B, @C), 0) - 1), @A);
We get the result as expected. So something else is happening under the hood. The query plan does not delve into the inner workings of the constant evaluation, but the following demonstrates what is happening:
SELECT Test = LEFT('a big old string', NULLIF(CHARINDEX('needle', 'haystack'), 0) - 1) INTO #T; SELECT t.name, c.max_length FROM tempdb.sys.columns AS c INNER JOIN sys.types AS t ON t.system_type_id = c.system_type_id AND t.user_type_id = c.user_type_id WHERE [object_id] = OBJECT_ID(N'tempdb..#T'); ---------------- name max_length varchar 1
Basically, by using the SELECT INTO
sytax with your left expression shows that a when the NULL length is passed to LEFT
the resulting datatype is VARCHAR(1)
, however, this is not always the case. If I simply hard code NULL
into the LEFT
function:
SELECT Test = LEFT('a big old string', NULL) INTO #T; -------------------- name max_length varchar 16
Then you get the length of the string passed, but a case expression that should be optimised away to the same thing, yields a length of 1 again:
SELECT TOP 1 Test = LEFT('a big old string', CASE WHEN 1 = 1 THEN NULL ELSE 1 END) INTO #T; ---------------- name max_length varchar 1
I suspect it is related to the default behaviour of VARCHAR
, where the default length is 1, e.g:
DECLARE @A VARCHAR = 'This is a Test'; SELECT Value = @A, -- T MaxLength = SQL_VARIANT_PROPERTY(@A, 'MaxLength') -- 1
But I can’t tell you why you would see different behaviour for NULL
and CASE WHEN 1 = 1 THEN NULL ELSE 1 END
. If you wanted to get the bottom of what is going on in the constant evaluation I think you would probably need to re-ask on the DBA site and hope that one of the real SQL Server Gurus picks it up.
In summary, LEFT(<constant>, <constant expression>)
where <constant expression>
yields NULL
is implicitly typed as VARCHAR(1)
, and this implicit type is used in ISNULL
evaluation.
For what it is worth, if you explicitly type the result of your LEFT
function then you get the expected result:
SELECT ISNULL( CAST( LEFT( 'a big old string', NULLIF(CHARINDEX('needle', 'haystack'), 0) - 1 ) AS VARCHAR(MAX)) , 'a big old string');
An additional point is that when you say you don’t want to repeat any expressions (If 0 < CHARINDEX, then take CHARINDEX – 1, etc.), there are two things you should know, the first is that NULLIF(<expression>, <value>)
expands to a case expression – CASE WHEN <expression> = <value> THEN NULL ELSE <expression> END
, so is repeated, the second is that this doesn’t matter, SQL Server can identify that this is the same expression used twice, and will evaluate it once and refer to the same result each time it is used.