Problem Statement: I have a Formula column that has the arithmetic operation in it. I want to extract the variable names from the formula, and delimit the variables with a comma and create a New Column “Formula Components”
The variable names follow the particular pattern - '%[^A-Za-z,_0-9 ]%'
However, I also want to keep the “Square Brackets” if they are to appear in the formula.
To Illustrate,
Input Data:
ID | Formula
------|-------------------------------------------
1 | ([x1] + [x2]) / 100
2 | ([y1] - [x2]) * 100
3 | z1 - z3
4 | [z4] % z3
5 | ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100
Desired Output
ID | Formula | FormulaComponents
------|------------------------------------------ |-----------------
1 | ([x1] + [x2]) / 100 | [x1],[x2]
2 | ([y1] - [x2]) * 100 | [y1],[x2]
3 | z1 - z3 | [z1],[z3]
4 | [z4] % z3 | [z4],[z3]
5 | ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100 | [x1],[y1],[z3]
As you can see above,
- Row 1. The Formula column consists of two variable, so the Formula components are [x1],[x2]
- Row 5. Note that x1 appears two times in the formula; Once as x1 and once as [x1]. In this case, I only want to keep only [x1] once. [x1] could appear N number of times in the Formula Column, but should appear only once in the FormulaComponents Column
P.S.: The Order of the variables appearing in the “FormulaComponents” column does not matter. So for example, in Row 5, the order can be [y1], [z3], [x1] OR [z3],[x1],[y1] and so on
To summarize: I want to write a SELECT statement in T-SQL that will create this new column.
Advertisement
Answer
You can split the string using string_split()
and then carefully reaggregate the results:
select *
from t cross apply
(select string_agg('[' + value + ']', ',') as components
from (select distinct replace(replace(value, '[', ''), ']', '') as value
from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
where value like '[[a-z]%'
) s
) s;
Here is a db<>fiddle.
This is made harder than necessary because your formulas do not have a canonical format. It would be simpler if all variables were surrounded by square braces. Or if all operators were surrounded by spaces.
EDIT:
SQL Server 2016 has string_split()
but not string_agg()
. That fan be replaced with XML “stuff”:
You can split the string using string_split()
and then carefully reaggregate the results:
select *
from t cross apply
(select stuff( (select distinct ',[' + value + ']'
from (select distinct replace(replace(value, '[', ''), ']', '') as value
from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
where value like '[[a-z]%'
) t
order by 1
for xml path ('')
), 1, 1, '') as components
) s;