Skip to content
Advertisement

How to search in SQL Server for text that has special characters?

I have a SQL Server table with a column of type TEXT that would store candidate resumes in different format. RTF is the most common one but often we get resume data from a 3rd party converter which stores the resume as special characters (maybe Unicode or I don’t know what they are).

  1. How do I search my table to find all the rows that have these special characters? For example the rows with id = 4,6,7, 9 etc. all are the records with special characters.

  2. What format are these special characters called? Unicode??

enter image description here

Advertisement

Answer

Assuming that by “special” characters you mean anything outside the set of printable ASCII and certain common whitespace characters , you can try the following:

DECLARE @SpecialPattern VARCHAR(100) =
    '%%[^'
    + CHAR(9) + CHAR(10) + CHAR(13) -- tab, CR, LF
    + CHAR(32) + '-' + CHAR(126) -- Range from space to last printable ASCII
    + ']%%'

SELECT
    RESUME_TEXT,
    cast(left(cast(resume_text as varchar(max)),20) as varbinary(max))` -- Borrowed from userMT's comment
FROM RESUME
WHERE RESUME_TEXT LIKE @SpecialPattern COLLATE Latin1_General_Bin -- Use exact compare

You may get some false hits against some perfectly valid extended characters such as accented vowels, curly quotes, or m- and n- dashes that may exist in the text.

My first though is that the weird characters might be a UTF-8 BOM (hex EF, BB, BF), but the display didn’t seem to match the how I would expect SQL Server to render them. The inverse dot isn’t present at all in the default windows code page (1252).

We need at least some hex data (at least the first few bytes) to help further. Often, common binary file types have a recognizable signature in the first 3-5 bytes.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement