I’m trying to formulate a filter in a table column. Table A has 3 columns spare_qty (plays the role of ID), alt (altitude, the Value to be filterd), Alt_derivative (the required Filtered_Value). The table contains about a million lines. My purpose is to populate column Alt_derivative at line X by the formula: Filtered_Value(X) = 0.9*Filtered_Value(X-1) + 0.1 * Value. The function in its essense smooths the data without lagging, as a window function would do. Turns out it’s not such a trivial task.
- Update table column does not allow usage of function lead or lag. Neither the self reference is successful. Sort order is ignored.
- Cursor implementation takes literly hours. It seems to perform NxN lines calculations. Also, even though the sort order should be taken into account, the update ignores the order. The script is:
DECLARE @spare_qty bigint =0, @alt float = 0, @alt_derivative float =0 DECLARE TRX_Line CURSOR FOR SELECT spare_qty1, alt, Alt_Derivative FROM [Incremental_Staging] order by ID asc; OPEN TRX_Line FETCH NEXT FROM TRX_Line INTO @ID, @value, @filtered_value WHILE @@FETCH_STATUS = 0 BEGIN update [Incremental_Staging] SET @alt = alt, @alt_derivative = 0.1*@alt+ 0.9* @alt_derivative where spare_qty1=@spare_qty1 FETCH NEXT FROM TRX_Line INTO @ID, @value, @filtered_value end close TRX_Line; Deallocate TRX_Line;
- CTE does not respect sort order.
Frankly, I have tried many approaches only to meet dead end. Is it possible such an easy, excel like, thing can’t be done? Am I missing something? Any help would be appreciated.
Advertisement
Answer
This is too long for a comment.
Filtered_Value(X) = 0.9*Filtered_Value(X-1) + 0.1 * Value
This isn’t just a lag. This is exponential smoothing. In SQL, there are two basic approaches to this.
- Arithmetic, which essentially implements a product aggregation function using mathematical functions.
- A recursive CTE.
Neither is going to work well for your use-case. The first is going to have a problem on 1,000,000 rows. There are errors that accumulate due to rounding of floating point (or fixed point) numbers. And the error is probably noticeable on 1,000,000 rows.
The second would work, but is probably too slow.
You can check on how long it take for this to return a value:
with i as ( select i.*, row_number() over (order by i) as seqnum from incremental_staging i ), cte as ( select seqnum, id, value, value as filter_value from i where seqnum = 1 union all select i.seqnum, i.id, i.value, cte.filter_value * 0.9 + 0.1 * i.value from cte join i on i.seqnum = cte.seqnum + 1 ) select * from cte option (maxrecursion 0);
The recursive CTE (in this case) is fine a few dozen or few hundred rows. It is probably tolerable on a few low thousands. But one a million rows, it might be pretty slow (although I imagine faster than a cursor). You can update using it, by using:
update incremental_staging set filter_value = cte.filter_value from cte where cte.id = incremental_staging.id;