Regexp search SQL query fields

Tags: , ,



I have a repository of SQL queries and I want to understand which queries use certain tables or fields.

Let’s say I want to understand what queries use the email field, how can I write it?

Example SQL query:

select
    users.email as email_user
    ,users.email as email_user_too
    ,email as email_user_too_2
    email as email_user_too_3,
    back_email as wrong_email -- wrong field
from users

Answer

So to state the problem more accurately, you are sorting through a list of SQL queries [as text], and you now need to find the queries that use certain fields using SQL & RegEx (Regular Expressions) in PostgreSQL. (please tag the question so that StackOverflow indexes your question correctly, more importantly, readers have more context about the question)

PostgreSQL has Regular Expression support OOTB (Out Of The Box). So we skip exploring other ways to do this. (If you are reading this as Microsoft SQL Server person, then I strongly suggest you to have a read of this brilliant article on Microsoft’s website on defining a Table-Valued UDF (User Defined Function))

The simplest way I could think of to approach your problem, is to throw away what we don’t want out of the query text first, and then filter out what’s left.

This way, after throwing away the stuff you don’t need, you will be left with a set of “tokens” that you can easily filter, and I’m putting token in quotes since we are not really parsing the SQL language, but if we did that would be the first step: to extract tokens.. (:

Take this query for example:

With Queries (
    Id
,   QueryText
) As (
values (1, 'select
    users.email as email_user
    ,users.email as email_user_too
    ,email as email_user_too_2,
    email as email_user_too_3,
    back_email as wrong_email -- wrong field
from users')
)
Select  QueryText
    ,   found
    From    (
        Select  Id
            ,   QueryText
            ,   regexp_split_to_table (QueryText, '(--[sw]+|select|from|as|where|[ sn,])') As found
            From    Queries
    )   As  Result
    Where   found   !=  ''
    And     found   =   'back_email'
  • I have sourced the concept of a “query repository” with a WITH statement for ease of doing the pseudo-code.
  • I have also selected few words/characters to split QueryText with. Like select, where etc. We don’t need these in our ‘found’ set.
  • And in the end, as you can see above, I simply used found as what’s left and filtered it with the field name you are looking for. (Assuming that you know the field you are looking for)

You could improve upon the RegEx I did, or change the method as you wish to make it better. But I think the general concept addresses what you need to achieve. One problem I can see with my solution right off the bat is the fact that you can search for anything really, not just names of the selected fields – which begs the question, why use RegEx, and not Like statements? But again, as I mentioned, you can improve upon the RegEx and address specific requirements you may have. Using Like might limit you in that direction. (In other words, only you know what’s good for you. I can’t say that from here.)

You can play with the query online here: db-fiddle query and use https://regex101.com/ for testing your RegEx.

Disclaimer I’m not a PostgreSQL developer. There must be other, perhaps better ways of doing this. (:



Source: stackoverflow