I have the following table:
CREATE TABLE dbo.TestSort ( Id int NOT NULL IDENTITY (1, 1), Value int NOT NULL )
The Value
column could (and is expected to) contain duplicates.
Let’s also assume there are already 1000 rows in the table.
I am trying to prove a point about unstable sorting.
Given this query that returns a ‘page’ of 10 results from the first 1000 inserted results:
SELECT TOP 10 * FROM TestSort WHERE Id <= 1000 ORDER BY Value
My intuition tells me that two runs of this query could return different rows if the Value
column contains repeated values.
I’m basing this on the facts that:
- the sort is not stable
- if new rows are inserted in the table between the two runs of the query, it could possibly create a re-balancing of B-trees (the
Value
column may be indexed or not)
EDIT: For completeness: I assume rows never change once inserted, and are never deleted.
In contrast, a query with stable sort (ordering also by Id) should always return the same results, since ID
s are unique:
SELECT TOP 10 * FROM TestSort WHERE Id <= 1000 ORDER BY Value, Id
The question is: Is my intuition correct? If yes, can you provide an actual example of operations that would produce different results (at least “on your machine”)? You could modify the query, add indexes on the Values
column etc.
I don’t care about the exact query, but about the principle.
I am using MS SQL Server (2014), but am equally satisfied with answers for any SQL database.
If not, then why?
Advertisement
Answer
If you have ties when ordering the order by is not stable.
CREATE TABLE #TestSort ( Id INT NOT NULL IDENTITY (1, 1) PRIMARY KEY, Value INT NOT NULL ) ; DECLARE @c INT = 0; WHILE @c < 100000 BEGIN INSERT INTO #TestSort(Value) VALUES ('2'); SET @c += 1; END
Example:
SELECT TOP 10 * FROM #TestSort ORDER BY Value OPTION (MAXDOP 4); DBCC DROPCLEANBUFFERS; -- run to clear cache SELECT TOP 10 * FROM #TestSort ORDER BY Value OPTION (MAXDOP 4);
The point is I force query optimizer to use parallel plan so there is no guaranteed that it will read data sequentially like Clustered index probably will do when no parallelism is involved.
You cannot be sure how Query Optimizer will read data unless you explicitly force to sort result in specific way using ORDER BY Id, Value
.
For more info read No Seatbelt - Expecting Order without ORDER BY
.