Skip to content
Advertisement

Order by column having duplicates for aggregation

Employees Table

select name, gender, salary, sum(salary) over(order by salary) 
from Employees

Question: Why does ordering by a column having duplicates produce final values instead of intermediate values? For e.g. when this query is executed, 3 employees having salary = 5000, the final value i.e. the value that should be produced for the 3rd employee is produced for the 1st?

Advertisement

Answer

SQL window functions have a window frame clause that specifies what rows are included in the window function. By default, the default uses range between unbounded preceding and current row, so your code is really:

sum(salary) over (order by salary range between unbounded preceding and current row)

The range partitioning treats ties equally, so all are included in the frame — or none are included.

The alternative is row partitioning which treats each row individually:

sum(salary) over (order by salary row between unbounded preceding and current row)

The SQL standard specifies that range is the default (I’m guessing so the results are stable, see below). So, this is the default in all databases.

When the salaries are distinct, these return the same values. However, they are not always the same as this fiddle shows.

Note that when there are ties and the window frame uses rows then the results are unstable. That means that on different runs you can get different results on a given row (and I can throw in from personal experience that this is very, very hard to debug).

You can resolve this problem by including a unique key in the the order by, which is the solution suggested by GMB.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement