I’m writing a lexer for SQL for syntax coloring and have the problem of parsing code like
SELECT * FROM table t WHERE t.name LIKE 'FOO_%' ESCAPE '';
Is my understanding correct, that escapes some characters like underscore (
_
), but not if it is followed by a single quote ('
)? Does the escaping depends on the SQL dialect or is the behavior the same for all (or a lot of) dialects?
Is it correct, that ''
usually would mean an incomplete string literal, because the backslash escapes the single quote?
Advertisement
Answer
By default the doesn’t really have any special meaning in a string/character constant in SQL. Single quotes are “escaped” by doubling them in SQL e.g.
'Arthur''s house'
The only situation where it does have a special meaning is the ESCAPE
clause. The default escape character to escape wildcards for SQL LIKE is the but it can be changed through the ESCAPE clause.
So the following conditions are identical:
t.name LIKE 'FOO_%' t.name LIKE 'FOO_%' ESCAPE '' t.name LIKE 'FOO#_%' ESCAPE '#'
But for strings that are not used for a LIKE condition, the backslash has no special meaning in “SQL”, i.e. in ANSI standard SQL.
Some database products however do support non-standard escaping of characters using inside regular strings. Postgres used to do that, but this was deprecated a long time, and the default behaviour today is, that a
does not escape anything in a string constant, so
't'
is simply a backslash character followed by a t
. It can be configured to accepe non-standard strings where that would result in the “tab character” instead.
I think MySQL always uses this non-standard behaviour of a backslash inside a string constant.