Skip to content
Advertisement

SQL Server: NULL or empty string?

I have a a database column whose value is determined programmatically and is not affected by user input. The value is never an empty string; it’s either NULL or a non-0 length string. The column is currently nullable.

Often, I want to compare this column between two different rows. Because the column is nullable I must do the equivalent of:

SELECT *
FROM
    myTable A
    INNER JOIN myTable B ON ISNULL(B.someColumn, '') = ISNULL(A.someColumn, '')
WHERE   B.Id > A.Id

If an empty string has no meaning, would I be wiser to make the column NOT NULL and use an empty string to indicate the column has no value?

Then I could compare columns using:

SELECT *
FROM
    foo A
    INNER JOIN foo B ON B.someColumn = A.someColumn
WHERE   B.Id > A.Id

I seem this method makes better use of indices on column someColumn?

And yes, I know I could do:

(B.someColumn = A.someColumn) or (B.someColumn is NULL and A.someColumn is NULL)

But that seems a bit wordy.

Advertisement

Answer

What you have already stated yourself and the code you have given is correct.

Notably, the code from your first example,

`INNER JOIN myTable B ON ISNULL(B.someColumn, '') = ISNULL(A.someColumn, '')`

forces SQL server to scan the whole tables and to compute ISNULL(...) before being able to do the join. This means that it can’t use any index to speed up the join, which will drastically decrease performance. The same procedure will be repeated when executing that query the next time, so you can’t expect that subsequently running the query again will be faster.

[ For the record, of course you could make ISNULL(...) a computed column, index that and use it in the joins, but that seems a little exaggerated given the other possible solutions. ]

Whether you should use '' instead of NULL or rather make another approach depends of the effort the former requires:

If your application touches that column only if it wants to write a meaningful (read: non-null) value there, then you could solve the problem in three easy steps (update the column and turn all NULL values into ''; make the column non-nullable; change the column’s default value from NULL to '').

But if your application touches that column in every case and actively writes NULL to it if appropriate, you would have to change the application itself to take that route. Whether changing the application is possible is known only to you …

If you can’t change the application, your third approach is good and works reliably:

(B.someColumn = A.someColumn) or (B.someColumn is NULL and A.someColumn is NULL)

It wouldn’t disturb me that it is “wordy”. The length of code in general does not mean anything with respect to performance, and in this case, the wordy code enables you to leave things as-is in your application (except this query), while enabling SQL server to use indexes to speed up the join.

In summary:

Use '' instead of NULL if your application allows that easily, or can be changed easily to allow it. Otherwise, use your third approach.

A final note regarding ANSI_NULLS:

Walter Vehoeven’s comment below your question is correct, but in your case, ANSI_NULLS would not change anything. From the documentation (first paragraph below the table, formatting mine):

SET ANSI_NULLS ON affects a comparison only if one of the operands of the comparison is either a variable that is NULL or a literal NULL. If both sides of the comparison are columns or compound expressions, the setting does not affect the comparison.

The second sentence means that it won’t have any effect in your case, because you are comparing / joining two columns (even if it actually is the same column on both sides of the comparison).

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement