I’m new to data warehouse, So I try to follow the best practice, mimicking some implementation details from the Microsoft Demo DB WideWorldImportersDW, One of the things that I have noticed is using Sequence
as default value for PK over Identity
.
Could I ask, If it’s preferable to use Sequence
over Identity
in data warehouse in general and Which one is more convenient especially during ETL process?.
Advertisement
Answer
A sequence has more guarantees than an identity column. In particular, each call to a sequence is guaranteed to produce the next value for the sequence.
However, an identity
column can have gaps and other inconsistencies. This is all documented here.
Because of the additional guarantees on sequences, I suspect that they are slower. In particular, I suspect that the database cannot preallocate values in batch. That means that in a multi-threaded environments, sequences would impose serialization on transactions, slowing things down.
In general, I see identity
used for identifying columns in tables. And although there is probably a performance comparison, I haven’t seen one. But I suspect that sequences are a wee bit slower in some circumstances.