I have an SQL query that reads
SELECT DISTINCT [NR] AS K_ID FROM [DB].[staging].[TABLE] WHERE [N]=1 and [O]='XXX' and [TYPE] in ('1_P', '2_I')
Since I’m saving the result in a CSV file (via Python Pandas) which is under version control I’ve noticed that the order of the result changes every time I run the query. In order to eliminate the Python part here I ran the query in MS SQL Server Management Studio, where I’m also observing a different order with every attempt.
It doesn’t matter in my case, but: Is it correct, that the result of the query can be ordered differently with every execution? And if so, is there a way to make the order “deterministic”?
Advertisement
Answer
SQL queries return results as an unordered set, unless the outermost query has an order by
.
On smaller amounts of data, the results look repeatable. However, on larger systems — and particularly on parallel systems — the ordering may be based on hashing algorithms, when nodes complete, and congestion on the network (among other factors). So, you can in fact see different orderings each time you run.