Skip to content
Advertisement

T-SQL Convert Arithmetic Formula into its Components

Problem Statement: I have a Formula column that has the arithmetic operation in it. I want to extract the variable names from the formula, and delimit the variables with a comma and create a New Column “Formula Components”

The variable names follow the particular pattern - '%[^A-Za-z,_0-9 ]%' 

However, I also want to keep the “Square Brackets” if they are to appear in the formula.

To Illustrate,

Input Data:

ID    |    Formula
------|-------------------------------------------
1     |    ([x1] + [x2]) / 100
2     |    ([y1] - [x2]) * 100 
3     |    z1 - z3 
4     |    [z4] % z3 
5     |    ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100   

Desired Output

ID    |    Formula                                |   FormulaComponents
------|------------------------------------------ |-----------------
1     |   ([x1] + [x2]) / 100                     |  [x1],[x2]
2     |   ([y1] - [x2]) * 100                     |  [y1],[x2]
3     |   z1 - z3                                 |  [z1],[z3]
4     |   [z4] % z3                               |  [z4],[z3]
5     |   ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100   |  [x1],[y1],[z3]

As you can see above,

  • Row 1. The Formula column consists of two variable, so the Formula components are [x1],[x2]
  • Row 5. Note that x1 appears two times in the formula; Once as x1 and once as [x1]. In this case, I only want to keep only [x1] once. [x1] could appear N number of times in the Formula Column, but should appear only once in the FormulaComponents Column

P.S.: The Order of the variables appearing in the “FormulaComponents” column does not matter. So for example, in Row 5, the order can be [y1], [z3], [x1] OR [z3],[x1],[y1] and so on

To summarize: I want to write a SELECT statement in T-SQL that will create this new column.

Answer

You can split the string using string_split() and then carefully reaggregate the results:

select *
from t cross apply
     (select string_agg('[' + value + ']', ',') as components
      from (select distinct replace(replace(value, '[', ''), ']', '') as value
            from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
            where value like '[[a-z]%'
           ) s
     ) s;

Here is a db<>fiddle.

This is made harder than necessary because your formulas do not have a canonical format. It would be simpler if all variables were surrounded by square braces. Or if all operators were surrounded by spaces.

EDIT:

SQL Server 2016 has string_split() but not string_agg(). That fan be replaced with XML “stuff”:

You can split the string using string_split() and then carefully reaggregate the results:

select *
from t cross apply
     (select stuff( (select distinct ',[' + value + ']'
                     from (select distinct replace(replace(value, '[', ''), ']', '') as value
                           from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
                           where value like '[[a-z]%'
                          ) t
                     order by 1
                     for xml path ('')
                    ), 1, 1, '') as components
     ) s;