select name, gender, salary, sum(salary) over(order by salary) from Employees
Question: Why does ordering by a column having duplicates produce final values instead of intermediate values? For e.g. when this query is executed, 3 employees having salary = 5000, the final value i.e. the value that should be produced for the 3rd employee is produced for the 1st?
Advertisement
Answer
SQL window functions have a window frame clause that specifies what rows are included in the window function. By default, the default uses range between unbounded preceding and current row
, so your code is really:
sum(salary) over (order by salary range between unbounded preceding and current row)
The range
partitioning treats ties equally, so all are included in the frame — or none are included.
The alternative is row
partitioning which treats each row individually:
sum(salary) over (order by salary row between unbounded preceding and current row)
The SQL standard specifies that range
is the default (I’m guessing so the results are stable, see below). So, this is the default in all databases.
When the salaries are distinct, these return the same values. However, they are not always the same as this fiddle shows.
Note that when there are ties and the window frame uses rows
then the results are unstable. That means that on different runs you can get different results on a given row (and I can throw in from personal experience that this is very, very hard to debug).
You can resolve this problem by including a unique key in the the order by
, which is the solution suggested by GMB.