What should I consider when selecting a data type for my primary key?

Question

When I am creating a new database table, what factors should I take into account for selecting the primary key's data type?

Accepted Answer

Sorry to do that, but I found that the answers I gave to related questions (you can check this and this) could apply to this one. I reshaped them a little bit&#8230;You will find many posts dealing with this issue, and each choice you&#8217;ll make has its pros and cons. Arguments for these usually refer to relational database theory and database performance.On this subject, my point is very simple: surrogate primary keys ALWAYS work, while Natural keys MIGHT NOT ALWAYS work one of these days, and this for multiple reasons: field too short, rules change, etc.To this point, you&#8217;ve guessed here that I am basically a member of the uniqueIdentifier/surrogate primary key team, and even if I appreciate and understand arguments such as the ones presented here, I am still looking for the case where &#8220;natural&#8221; key is better than surrogate &#8230; In addition to this, one of the most important but always forgotten arguments in favor of this basic rule is related to code normalization and productivity:each time I create a table, shall I lose timeidentifying its primary key and its physical characteristics (type, size)remembering these characteristics each time I want to refer to it in my code?explaining my PK choice to other developers in the team?My answer is no to all of these questions:I have no time to lose trying to identify &#8220;the best Natural Primary Key&#8221; when the surrogate option gives me a bullet-proof solution.I do not want to remember that the Primary Key of my Table_whatever is a 10 characters long string when I write the code.I don&#8217;t want to lose my time negotiating the Natural Key length: &#8220;well if You need 10 why don&#8217;t you take 12 to be on the safe side?&#8221;. This &#8220;on the safe side&#8221; argument really annoys me: If you want to stay on the safe side, it means that you are really not far from the unsafe side! Choose surrogate: it&#8217;s bullet-proof!So I&#8217;ve been working for the last five years with a very basic rule: each table (let&#8217;s call it &#8216;myTable&#8217;) has its first field called 'id_MyTable' which is of uniqueIdentifier type. Even if this table supports a &#8220;many-to-many&#8221; relation, where a field combination offers a very acceptable Primary Key, I prefer to create this 'id_myManyToManyTable' field being a uniqueIdentifier, just to stick to the rule, and because, finally, it does not hurt.The major advantage is that you don&#8217;t have to care anymore about the use of Primary Key and/or Foreign Key within your code. Once you have the table name, you know the PK name and type. Once you know which links are implemented in your data model, you&#8217;ll know the name of available foreign keys in the table.And if you still want to have your &#8220;Natural Key&#8221; somewhere in your table, I advise you to build it following a standard model such as Tbl_whatever   id_whatever, unique identifier, primary key   code_whatever, whateverTypeYouWant(whateverLengthYouEstimateTheRightOne), indexed   .....Where id_ is the prefix for primary key, and code_ is used for &#8220;natural&#8221; indexed field. Some would argue that the code_ field should be set as unique. This is true, and it can be easily managed either through DDL or external code. Note that many &#8220;natural&#8221; keys are calculated (invoice numbers), so they are already generated through codeI am not sure that my rule is the best one. But it is a very efficient one! If everyone was applying it, we would for example avoid time lost answering to this kind of question!

Advertisement

Answer