# T-SQL Convert Arithmetic Formula into its Components

Problem Statement: I have a Formula column that has the arithmetic operation in it. I want to extract the variable names from the formula, and delimit the variables with a comma and create a New Column “Formula Components”

```The variable names follow the particular pattern - '%[^A-Za-z,_0-9 ]%'
```

However, I also want to keep the “Square Brackets” if they are to appear in the formula.

To Illustrate,

Input Data:

```ID    |    Formula
------|-------------------------------------------
1     |    ([x1] + [x2]) / 100
2     |    ([y1] - [x2]) * 100
3     |    z1 - z3
4     |    [z4] % z3
5     |    ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100

```

Desired Output

```ID    |    Formula                                |   FormulaComponents
------|------------------------------------------ |-----------------
1     |   ([x1] + [x2]) / 100                     |  [x1],[x2]
2     |   ([y1] - [x2]) * 100                     |  [y1],[x2]
3     |   z1 - z3                                 |  [z1],[z3]
4     |   [z4] % z3                               |  [z4],[z3]
5     |   ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100   |  [x1],[y1],[z3]
```

As you can see above,

• Row 1. The Formula column consists of two variable, so the Formula components are [x1],[x2]
• Row 5. Note that x1 appears two times in the formula; Once as x1 and once as [x1]. In this case, I only want to keep only [x1] once. [x1] could appear N number of times in the Formula Column, but should appear only once in the FormulaComponents Column

P.S.: The Order of the variables appearing in the “FormulaComponents” column does not matter. So for example, in Row 5, the order can be [y1], [z3], [x1] OR [z3],[x1],[y1] and so on

To summarize: I want to write a SELECT statement in T-SQL that will create this new column.

You can split the string using `string_split()` and then carefully reaggregate the results:

```select *
from t cross apply
(select string_agg('[' + value + ']', ',') as components
from (select distinct replace(replace(value, '[', ''), ']', '') as value
from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
where value like '[[a-z]%'
) s
) s;
```

Here is a db<>fiddle.

This is made harder than necessary because your formulas do not have a canonical format. It would be simpler if all variables were surrounded by square braces. Or if all operators were surrounded by spaces.

EDIT:

SQL Server 2016 has `string_split()` but not `string_agg()`. That fan be replaced with XML “stuff”:

You can split the string using `string_split()` and then carefully reaggregate the results:

```select *
from t cross apply
(select stuff( (select distinct ',[' + value + ']'
from (select distinct replace(replace(value, '[', ''), ']', '') as value
from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
where value like '[[a-z]%'
) t
order by 1
for xml path ('')
), 1, 1, '') as components
) s;
```