Problem Statement: I have a Formula column that has the arithmetic operation in it. I want to extract the variable names from the formula, and delimit the variables with a comma and create a New Column “Formula Components”
The variable names follow the particular pattern - '%[^A-Za-z,_0-9 ]%'
However, I also want to keep the “Square Brackets” if they are to appear in the formula.
To Illustrate,
Input Data:
ID | Formula ------|------------------------------------------- 1 | ([x1] + [x2]) / 100 2 | ([y1] - [x2]) * 100 3 | z1 - z3 4 | [z4] % z3 5 | ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100
Desired Output
ID | Formula | FormulaComponents ------|------------------------------------------ |----------------- 1 | ([x1] + [x2]) / 100 | [x1],[x2] 2 | ([y1] - [x2]) * 100 | [y1],[x2] 3 | z1 - z3 | [z1],[z3] 4 | [z4] % z3 | [z4],[z3] 5 | ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100 | [x1],[y1],[z3]
As you can see above,
- Row 1. The Formula column consists of two variable, so the Formula components are [x1],[x2]
- Row 5. Note that x1 appears two times in the formula; Once as x1 and once as [x1]. In this case, I only want to keep only [x1] once. [x1] could appear N number of times in the Formula Column, but should appear only once in the FormulaComponents Column
P.S.: The Order of the variables appearing in the “FormulaComponents” column does not matter. So for example, in Row 5, the order can be [y1], [z3], [x1] OR [z3],[x1],[y1] and so on
To summarize: I want to write a SELECT statement in T-SQL that will create this new column.
Advertisement
Answer
You can split the string using string_split()
and then carefully reaggregate the results:
select * from t cross apply (select string_agg('[' + value + ']', ',') as components from (select distinct replace(replace(value, '[', ''), ']', '') as value from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s where value like '[[a-z]%' ) s ) s;
Here is a db<>fiddle.
This is made harder than necessary because your formulas do not have a canonical format. It would be simpler if all variables were surrounded by square braces. Or if all operators were surrounded by spaces.
EDIT:
SQL Server 2016 has string_split()
but not string_agg()
. That fan be replaced with XML “stuff”:
You can split the string using string_split()
and then carefully reaggregate the results:
select * from t cross apply (select stuff( (select distinct ',[' + value + ']' from (select distinct replace(replace(value, '[', ''), ']', '') as value from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s where value like '[[a-z]%' ) t order by 1 for xml path ('') ), 1, 1, '') as components ) s;