Objective : Add an additional WHERE
clause to any given Clickhouse statement.
I’m using the following Antlr grammars to generate Java classes for a lexer & parser.
Lexer grammar
https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseLexer.g4
Parser grammar
https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseParser.g4
Problem : I cannot figure out/understand how to interact or create the appropriate POJOs for use with the generated classes that Antlr produces.
Example of statement
String query = "INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')"
Goal of SQL (enrichment code)
String enrichedQuery = SqlParser.enrich(query);
System.out.println(enrichedQuery);
//Output
>> INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def') (WHERE X IN USERS)
I have the follow Java main
public class Hello {
public static void main( String[] args) throws Exception{
String query = "INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')"
ClickhouseLexer = new ClickhouseLexer(new ANTLRInputStream(query));
CommonTokenStream tokens = new CommonTokenStream(lexer);
ClickHouseParser = new ClickHouseParser (tokens);
ParseTreeWalker walker = new ParseTreeWalker();
}
}
Advertisement
Answer
I’d suggest taking a look at TokenStreamRewriter.
First, let’s get the grammars ready.
1 – with TokenStreamRewriter
we’ll want to preserve whitespace, so let’s change the -> skip
directives to ->channel(HIDDEN)
At the end of the Lexer grammar:
// Comments and whitespace
MULTI_LINE_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SINGLE_LINE_COMMENT: '--' ~('n'|'r')* ('n' | 'r' | EOF) -> channel(HIDDEN);
WHITESPACE: [ u000Bu000Ctrn] -> channel(HIDDEN); // 'n' can be part of multiline single query
2 – The C++ specific stuff just guards against using keywords more than once. You don’t really need that check for your purposes (and it could be done in a post-parse Listener if you DID need it). So let’s just lose the language specific stuff:
engineClause: engineExpr (
orderByClause
| partitionByClause
| primaryKeyClause
| sampleByClause
| ttlClause
| settingsClause
)*
;
and
dictionaryAttrDfnt
: identifier columnTypeExpr (
DEFAULT literal
| EXPRESSION columnExpr
| HIERARCHICAL
| INJECTIVE
| IS_OBJECT_ID
)*
;
dictionaryEngineClause
: dictionaryPrimaryKeyClause? (
sourceClause
| lifetimeClause
| layoutClause
| rangeClause
| dictionarySettingsClause
)*
;
NOTE: There seems to be an issue with the grammar not accepting the actual values for an insert statement:
insertStmt
: INSERT INTO TABLE? (
tableIdentifier
| FUNCTION tableFunctionExpr
) columnsClause? dataClause
;
columnsClause
: LPAREN nestedIdentifier (COMMA nestedIdentifier)* RPAREN
;
dataClause
: FORMAT identifier # DataClauseFormat
| VALUES # DataClauseValues // <- problem on this line
| selectUnionStmt SEMICOLON? EOF # DataClauseSelect
;
(I’m not going to try to fix that part, so I’ve commented your input to accommodate)
(It would also help if the top level rule needed with an EOF
token; without that ANTLR just stops parsing after VALUE
. An EOF at the end of a root rule is considered a best practice for exactly this reason.)
The Main program:
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.TokenStreamRewriter;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
public class TSWDemo {
public static void main(String args) {
new TSWDemo().run(CharStreams.fromString("INSERT INTO t VALUES /* (1, 'Hello, world'), (2, 'abc'), (3, 'def') */"));
}
public void run(CharStream charStream) {
var lexer = new ClickHouseLexer(charStream);
var tokenStream = new CommonTokenStream(lexer);
var parser = new ClickHouseParser(tokenStream);
var tsw = new TokenStreamRewriter(tokenStream);
var listener = new TSWDemoListener(tsw);
var queryStmt = parser.queryStmt();
ParseTreeWalker.DEFAULT.walk(listener, queryStmt);
System.out.println(tsw.getText());
}
}
The Listener:
import org.antlr.v4.runtime.TokenStreamRewriter;
public class TSWDemoListener extends ClickHouseParserBaseListener {
private TokenStreamRewriter tsw;
public TSWDemoListener(TokenStreamRewriter tsw) {
this.tsw = tsw;
}
@Override
public void exitInsertStmt(ClickHouseParser.InsertStmtContext ctx) {
tsw.insertAfter(ctx.getStop(), " (WHERE X IN USERS)");
}
}
Output:
INSERT INTO t VALUES (WHERE X IN USERS) /* (1, 'Hello, world'), (2, 'abc'), (3, 'def') */