I have a repository of SQL queries and I want to understand which queries use certain tables or fields.
Let’s say I want to understand what queries use the email field, how can I write it?
Example SQL query:
select users.email as email_user ,users.email as email_user_too ,email as email_user_too_2 email as email_user_too_3, back_email as wrong_email -- wrong field from users
Advertisement
Answer
So to state the problem more accurately, you are sorting through a list of SQL queries [as text], and you now need to find the queries that use certain fields using SQL & RegEx
(Regular Expressions) in PostgreSQL
. (please tag the question so that StackOverflow indexes your question correctly, more importantly, readers have more context about the question)
PostgreSQL has Regular Expression support OOTB (Out Of The Box). So we skip exploring other ways to do this. (If you are reading this as Microsoft SQL Server person, then I strongly suggest you to have a read of this brilliant article on Microsoft’s website on defining a Table-Valued UDF (User Defined Function))
The simplest way I could think of to approach your problem, is to throw away what we don’t want out of the query text first, and then filter out what’s left.
This way, after throwing away the stuff you don’t need, you will be left with a set of “tokens” that you can easily filter, and I’m putting token in quotes since we are not really parsing the SQL language, but if we did that would be the first step: to extract tokens.. (:
Take this query for example:
With Queries ( Id , QueryText ) As ( values (1, 'select users.email as email_user ,users.email as email_user_too ,email as email_user_too_2, email as email_user_too_3, back_email as wrong_email -- wrong field from users') ) Select QueryText , found From ( Select Id , QueryText , regexp_split_to_table (QueryText, '(--[sw]+|select|from|as|where|[ sn,])') As found From Queries ) As Result Where found != '' And found = 'back_email'
- I have sourced the concept of a “query repository” with a
WITH
statement for ease of doing the pseudo-code. - I have also selected few words/characters to split
QueryText
with. Likeselect
,where
etc. We don’t need these in our ‘found’ set. - And in the end, as you can see above, I simply used
found
as what’s left and filtered it with the field name you are looking for. (Assuming that you know the field you are looking for)
You could improve upon the RegEx I did, or change the method as you wish to make it better. But I think the general concept addresses what you need to achieve. One problem I can see with my solution right off the bat is the fact that you can search for anything really, not just names of the selected fields – which begs the question, why use RegEx, and not Like
statements? But again, as I mentioned, you can improve upon the RegEx and address specific requirements you may have. Using Like
might limit you in that direction. (In other words, only you know what’s good for you. I can’t say that from here.)
You can play with the query online here: db-fiddle query and use https://regex101.com/ for testing your RegEx.
Disclaimer I’m not a PostgreSQL developer. There must be other, perhaps better ways of doing this. (: