How to parse a ClickHouse SQL statement using ANTLR4?

Objective: Add an additional WHERE clause to any given ClickHouse statement.

I’m using the following ANTLR grammars to generate Java classes for a lexer and parser.

Lexer grammar

https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseLexer.g4

Parser grammar

https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseParser.g4

Problem: I cannot figure out how to interact with the classes that ANTLR generates, or how to create the appropriate POJOs for use with them.

Example of statement

Goal of SQL (enrichment code)

I have the following Java main

Answer

I’d suggest taking a look at TokenStreamRewriter.

First, let’s get the grammars ready.

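1 – With TokenStreamRewriter we’ll want to preserve whitespace, so let’s change the -> skip directives to -> channel(HIDDEN). Skipped tokens are dropped from the token stream entirely, so the rewriter has nothing to reproduce them from; tokens on the hidden channel stay in the stream but are ignored by the parser.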
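To see why the channel matters for rewriting, here is a minimal sketch in plain Java. It is a hand-rolled stand-in, not ANTLR's implementation: the class HiddenChannelSketch, its Tok type, and the methods lex, render, and appendClause are all hypothetical names made up for this illustration. The idea it mimics is that a rewriter rebuilds text from the tokens it holds and splices insertions in by token index, so whitespace that was skipped cannot come back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from ANTLR.
public class HiddenChannelSketch {

    // A token: its text, plus whether a parser would ignore it (hidden channel).
    public static final class Tok {
        final String text;
        final boolean hidden;
        Tok(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
    }

    // Splits input into alternating runs of whitespace and non-whitespace.
    // keepWhitespace=false mimics "-> skip" (whitespace tokens are discarded);
    // keepWhitespace=true mimics "-> channel(HIDDEN)" (kept, but marked hidden).
    public static List<Tok> lex(String sql, boolean keepWhitespace) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < sql.length()) {
            int start = i;
            boolean ws = Character.isWhitespace(sql.charAt(i));
            while (i < sql.length() && Character.isWhitespace(sql.charAt(i)) == ws) {
                i++;
            }
            if (!ws || keepWhitespace) {
                out.add(new Tok(sql.substring(start, i), ws));
            }
        }
        return out;
    }

    // Rebuilds the text from every stored token, splicing in any insertion
    // that was queued against a token index.
    public static String render(List<Tok> toks, Map<Integer, String> insertAfter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < toks.size(); i++) {
            sb.append(toks.get(i).text);
            String extra = insertAfter.get(i);
            if (extra != null) {
                sb.append(extra);
            }
        }
        return sb.toString();
    }

    // Queues `clause` after the last token a parser would see, then renders.
    public static String appendClause(String sql, String clause, boolean keepWhitespace) {
        List<Tok> toks = lex(sql, keepWhitespace);
        Map<Integer, String> edits = new HashMap<>();
        for (int i = toks.size() - 1; i >= 0; i--) {
            if (!toks.get(i).hidden) {
                edits.put(i, clause);
                break;
            }
        }
        return render(toks, edits);
    }

    public static void main(String[] args) {
        String sql = "SELECT a FROM t";
        // Whitespace skipped: prints "SELECTaFROMt WHERE x = 1"
        System.out.println(appendClause(sql, " WHERE x = 1", false));
        // Whitespace on a hidden channel: prints "SELECT a FROM t WHERE x = 1"
        System.out.println(appendClause(sql, " WHERE x = 1", true));
    }
}
```

With the real library, the same role is played by TokenStreamRewriter's insert and replace methods and its getText() call. The grammar change is what makes the second result possible.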

At the end of the Lexer grammar:

2 – The C++-specific code in the parser grammar just guards against using keywords more than once. You don’t really need that check for your purposes (and it could be done in a post-parse Listener if you DID need it). So let’s just lose the language-specific code:

and

NOTE: There seems to be an issue with the grammar not accepting the actual values for an insert statement:

(I’m not going to try to fix that part, so I’ve commented your input to accommodate)

(It would also help if the top-level rule ended with an EOF token; without that, ANTLR just stops parsing after VALUE instead of reporting the unparsed input. An EOF at the end of a root rule is considered a best practice for exactly this reason.)

The Main program:

The Listener:

Output:

User contributions licensed under: CC BY-SA