Skip to content
Advertisement

Using Identity or sequence in data warehouse

I’m new to data warehouse, So I try to follow the best practice, mimicking some implementation details from the Microsoft Demo DB WideWorldImportersDW, One of the things that I have noticed is using Sequence as default value for PK over Identity.


Could I ask, If it’s preferable to use Sequence over Identity in data warehouse in general and Which one is more convenient especially during ETL process?.

Advertisement

Answer

A sequence has more guarantees than an identity column. In particular, each call to a sequence is guaranteed to produce the next value for the sequence.

However, an identity column can have gaps and other inconsistencies. This is all documented here.

Because of the additional guarantees on sequences, I suspect that they are slower. In particular, I suspect that the database cannot preallocate values in batch. That means that in a multi-threaded environments, sequences would impose serialization on transactions, slowing things down.

In general, I see identity used for identifying columns in tables. And although there is probably a performance comparison, I haven’t seen one. But I suspect that sequences are a wee bit slower in some circumstances.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement