I have a huge codebase that contains many Java and .sql files, and I want to extract every SQL statement from them.
Here is how I intend to do it:
- Build a file of regex patterns for the statements I want to extract (select, insert, delete, update, etc.).
- Read each file in the codebase line by line and match every line against the patterns. If a match is found, record the line of code, the file name, the pattern, etc.
The problem comes when the SQL queries are broken into multiple lines.
I am using java.util.regex.Pattern and java.util.regex.Matcher to build and match the regex patterns as I read lines with a BufferedReader.
    pattern = Pattern.compile(regexString, Pattern.CASE_INSENSITIVE);
    // ...
    matcher = pattern.matcher(lineBuffer.readLine().trim());
    if (matcher.find()) {
        // Do something
    }
For multi-line statements, I should look for the statement terminator ";". If it is not found, I can read the next line and append it to the matched string so that the whole thing is treated as a single query.
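    String next;
    // Stop at end of file too: readLine() returns null there.
    while (!lineString.endsWith(";") && (next = lineBfr.readLine()) != null) {
        lineString = lineString + " " + next.trim();
    }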
Is there a better way to do this, either with Pattern.MULTILINE or by reading the entire file into a single buffer and processing that?
Answer
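You could take advantage of Apache Commons IO and its method FileUtils#readFileToString(File file), which reads a whole file into a single String.
Combined with the Pattern.MULTILINE flag you mentioned, this seems like a potentially very simple solution. Keep in mind that MULTILINE only changes where ^ and $ match; for . to match across line breaks you also need Pattern.DOTALL.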
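
As a rough sketch of the whole-file approach, the following reads a file into one string and pulls out every span that runs from an SQL keyword to the next ";". It uses the JDK's Files.readString (Java 11+) in place of FileUtils#readFileToString so that it runs without extra dependencies. The class name SqlExtractor, the extract method, and the four-keyword pattern are assumptions made for illustration and are not part of the original question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SqlExtractor {

    // Lazily match from an SQL keyword up to the next ';'.
    // DOTALL lets '.' cross line breaks; MULTILINE alone would not.
    // The keyword list is illustrative, not exhaustive.
    private static final Pattern STATEMENT = Pattern.compile(
            "\\b(select|insert|update|delete)\\b.*?;",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every keyword-to-semicolon span in the text, with whitespace collapsed. */
    public static List<String> extract(String source) {
        List<String> statements = new ArrayList<>();
        Matcher matcher = STATEMENT.matcher(source);
        while (matcher.find()) {
            // Collapse line breaks and indentation into single spaces.
            statements.add(matcher.group().replaceAll("\\s+", " "));
        }
        return statements;
    }

    public static void main(String[] args) throws IOException {
        // Read the whole file at once (Java 11+), then scan it in one pass.
        String source = Files.readString(Path.of(args[0]));
        extract(source).forEach(System.out::println);
    }
}
```

This is only a starting point. In Java sources, where a query is often built from concatenated string literals, the match will still contain the quotes and + operators, and a ";" inside a string literal or comment will end a match early. It also matches the keywords wherever they appear, including in comments and identifiers such as method calls named update.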