I have some rows in my SQL table that contain single byte space characters, and some that contain double byte space characters. I need to identify rows with double byte characters to fix them in the user interface.
My first thought is to use charindex
and substring
to isolate the space from the rest of the text (SUBSTRING(@SomeField, charindex(' ', @SomeField),1))
and then use DataLength
to see if it was double byte or single byte, but it appears that’s not going to work. It appears it’s giving the same value for both single-byte and double-byte spaces.
SELECT DATALENGTH(' ') --double byte space , DATALENGTH(' ') --single byte space , LEN(' ') --double byte space , LEN(' ') --single byte space
Advertisement
Answer
That character U+3000 or select cast(0x0030 as nchar(1))
is apparantly called “Ideographic Space”.
You simply need to use unicode literals when using it in code. Both like
and charindex
work fine. But since it’s semantically just a space, most collations will not distinguish between U+0020 (a regular space) and U+3000. So you also need to use a binary collation (or have your database or column set to use one). EG
select 1 where N'原田 潮' like N'% %' collate SQL_Latin1_General_CP437_BIN2 select charindex(N' ' collate SQL_Latin1_General_CP437_BIN2, N'原田 潮') select 1 where N'原田 潮' like N'% %' collate SQL_Latin1_General_CP437_BIN2 select charindex(N' ' collate SQL_Latin1_General_CP437_BIN2 , N'原田 潮')
outputs
----------- 1 ----------- 3 ----------- ----------- 0