Say I have a system that catalogues books and I want to group them into genres and there is a UNF tuple:
BookId (PK): number BookTitle: string Blurb: string Genres: string[] (... everything else)
When normalised, Genre needs to be made atomic, and also needs to be put into it’s own relation in order to no cause duplication. My first instinct is to do the following:
BOOK BookId (PK): number BookTitle: string Blurb: string GENRE GenreId (PK): number GenreName: string BOOKGENRE BookId (CK): number GenreId (CK): number
But another idea occurred to me but something felt.. wrong(?) about it, and like it was bad practise. But my idea was to get rid of the GENRE table by simply making the GenreName field the primary key and only making use of BOOKGENRE. Since GenreName would still be indexed as it’s part of a composite key it would still be indexed and efficient to perform operations on, such as getting all books with Genre “X”.
BOOK BookId (PK): number BookTitle: string Blurb: string BOOKGENRE BookId (CK): number GenreName (CK): string
Is this acceptable practice or is it required that I keep genre as a separate table with a key?
Advertisement
Answer
I would recommend sticking to the original design with three separate tables, and an integer primary key. One reason is that you want keys that are meaningless, ie that have no business signification.
With the alternative design that you are thinking about, what if you want to rename a genre, say from Sci-Fi to Science Fiction? You would need to update every row in the bookgenre
table, which is unecessarily complex.
Primary keys should never be updated, so if there is a risk that you might need to ever change the value of a genre, you should go with surrogate keys: this will make your life easier on the long run.