Skip to content
Advertisement

Finding double-byte white spaces in SQL table

I have some rows in my SQL table that contain single byte space characters, and some that contain double byte space characters. I need to identify rows with double byte characters to fix them in the user interface.

My first thought is to use charindex and substring to isolate the space from the rest of the text (SUBSTRING(@SomeField, charindex(' ', @SomeField),1)) and then use DataLength to see if it was double byte or single byte, but it appears that’s not going to work. It appears it’s giving the same value for both single-byte and double-byte spaces.

SELECT  DATALENGTH(' ') --double byte space
, DATALENGTH(' ') --single byte space
, LEN(' ') --double byte space
, LEN(' ') --single byte space

Advertisement

Answer

That character U+3000 or select cast(0x0030 as nchar(1)) is apparantly called “Ideographic Space”.

You simply need to use unicode literals when using it in code. Both like and charindex work fine. But since it’s semantically just a space, most collations will not distinguish between U+0020 (a regular space) and U+3000. So you also need to use a binary collation (or have your database or column set to use one). EG

select 1 where N'原田 潮' like N'% %' collate SQL_Latin1_General_CP437_BIN2

select charindex(N' ' collate SQL_Latin1_General_CP437_BIN2, N'原田 潮') 


select 1 where N'原田 潮' like N'% %' collate SQL_Latin1_General_CP437_BIN2

select charindex(N' ' collate SQL_Latin1_General_CP437_BIN2 , N'原田 潮')

outputs

-----------
1


-----------
3


-----------


-----------
0
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement