How to parse quotes correctly?

I’m using ANTLR with the Presto grammar to parse SQL queries. This is the STRING rule I’m using:

STRING
    : '\'' ( ('\\' '\'') | ~'\'' | '\'\'' )* '\''
    ;

However, when I have a query like this:

select replace(name,'\'','')
FROM table1;

it messes things up, because it parses '\'',' as one string.

When I used the following rule instead:

STRING
    : '\'' ( ('\\' '\'') | ~'\'' )* '\''
    ;

it no longer parsed queries like this one correctly:

SELECT * FROM table1 where col1 = 'nir''s'

which of course is a legal query.

Any idea how I can handle both?

Thanks, Nir.

Answer

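If you want to support \', you should not only negate the single quote, but also negate the backslash. Otherwise ~'\'' can match the backslash on its own, and the quote after it is then free to be read as the first half of '' instead of as part of \'.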

Something like this:

STRING
    : '\'' ( '\\' '\''   // match \'
           | ~[\\']      // match anything other than \ and '
           | '\'\''      // match ''
           )*
      '\''
    ;

And to account for different escaped characters, do this:

STRING
    : '\'' ( '\\' ~[\r\n] // match \ followed by any char other than a line break
           | ~[\\']       // match anything other than \ and '
           | '\'\''       // match ''
           )*
      '\''
    ;
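
To sanity-check the rule outside ANTLR, here is a rough Python sketch that approximates it with a regular expression. The names STRING_RE and find_strings are made up for this illustration. Python's re module backtracks rather than picking the longest match the way an ANTLR lexer does, so treat this as an approximation; it happens to agree with the rule on the two queries from the question.

```python
import re

# Regex approximation of the STRING lexer rule (illustrative only).
# Alternatives, in order:
#   \\[^\r\n]  a backslash followed by any char other than a line break
#   [^\\']     any char other than a backslash and a single quote
#   ''         a doubled single quote
STRING_RE = re.compile(r"'(?:\\[^\r\n]|[^\\']|'')*'")


def find_strings(sql):
    """Return every substring of sql that the approximated rule matches."""
    return STRING_RE.findall(sql)


if __name__ == "__main__":
    # Backslash-escaped quote followed by an empty string literal.
    print(find_strings(r"select replace(name,'\'','') FROM table1;"))
    # Doubled quote inside a single string literal.
    print(find_strings("SELECT * FROM table1 where col1 = 'nir''s'"))
```

The first query should come out as two separate literals and the second as a single one, which is the behaviour asked for in the question.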