When I am creating a new database table, what factors should I take into account for selecting the primary key’s data type?
Advertisement
Answer
Sorry to do that, but I found that the answers I gave to related questions (you can check this and this) could apply to this one. I reshaped them a little bit…
You will find many posts dealing with this issue, and each choice you’ll make has its pros and cons. Arguments for these usually refer to relational database theory and database performance.
On this subject, my point is very simple: surrogate primary keys ALWAYS work, while Natural keys MIGHT NOT ALWAYS work one of these days, and this for multiple reasons: field too short, rules change, etc.
To this point, you’ve guessed here that I am basically a member of the uniqueIdentifier/surrogate primary key team, and even if I appreciate and understand arguments such as the ones presented here, I am still looking for the case where “natural” key is better than surrogate …
In addition to this, one of the most important but always forgotten arguments in favor of this basic rule is related to code normalization and productivity:
each time I create a table, shall I lose time
- identifying its primary key and its physical characteristics (type, size)
- remembering these characteristics each time I want to refer to it in my code?
- explaining my PK choice to other developers in the team?
My answer is no to all of these questions:
- I have no time to lose trying to identify “the best Natural Primary Key” when the surrogate option gives me a bullet-proof solution.
- I do not want to remember that the Primary Key of my Table_whatever is a 10 characters long string when I write the code.
- I don’t want to lose my time negotiating the Natural Key length: “well if You need 10 why don’t you take 12 to be on the safe side?”. This “on the safe side” argument really annoys me: If you want to stay on the safe side, it means that you are really not far from the unsafe side! Choose surrogate: it’s bullet-proof!
So I’ve been working for the last five years with a very basic rule: each table (let’s call it ‘myTable’) has its first field called 'id_MyTable'
which is of uniqueIdentifier type. Even if this table supports a “many-to-many” relation, where a field combination offers a very acceptable Primary Key, I prefer to create this 'id_myManyToManyTable'
field being a uniqueIdentifier, just to stick to the rule, and because, finally, it does not hurt.
The major advantage is that you don’t have to care anymore about the use of Primary Key and/or Foreign Key within your code. Once you have the table name, you know the PK name and type. Once you know which links are implemented in your data model, you’ll know the name of available foreign keys in the table.
And if you still want to have your “Natural Key” somewhere in your table, I advise you to build it following a standard model such as
Tbl_whatever id_whatever, unique identifier, primary key code_whatever, whateverTypeYouWant(whateverLengthYouEstimateTheRightOne), indexed .....
Where id_ is the prefix for primary key, and code_ is used for “natural” indexed field. Some would argue that the code_ field should be set as unique. This is true, and it can be easily managed either through DDL or external code. Note that many “natural” keys are calculated (invoice numbers), so they are already generated through code
I am not sure that my rule is the best one. But it is a very efficient one! If everyone was applying it, we would for example avoid time lost answering to this kind of question!