My question is simple, currently, I have a database with an “User” table which contains important information about each users (email, username, password), an “Identity” table which contain the first name, last name, birthday… of an user, and a “Medias” table which currently contains fields like “facebook”, “googlePlus”, “twitter”, “youtube”… the address of all medias of the user actually.
But, my question is: maybe the database will be better designed if I reduced my table “Medias” with 2 fields : “address”, “type” (and “user_id”), and the type could be “twitter”, “facebook” …
What is the optimal method if I have hundreds of users? In terms of speed and RAM usage?
Advertisement
Answer
The classic narrow vs wide debate. Let’s take your current design. You have a wide table with user_id and four other columns for social media links. Perhaps something like this:
The Data
medias user_id int twitter varchar google_plus varchar
If the varchar columns are nullable, your storage is at it’s optimal. If you user doesn’t have a twitter account but has a google account only the google_plus column will have data. Others are null and varchar
null fields do not take up any storage.
Now let’s take a look at the narrow design
medias user_id media_type link
This has three columns but they are always filled. You plan to fill up media_type with ‘twitter’, ‘google’ etc. So that means you are using more storage than with the wide design. IF a user has two social media accounts the user_id is stored twice. You can reduce this a little bit by using constants.
twitter=1 google_plus=2 yahoo=3
Store these numbers in the media_type column. Then that field can be smallint
which takes up very little space. If you expect a large number of media accounts you can’t use constants like this but would need to create a separate table for them and enter only your ID in this one.
The Indexes
Have the wide design and Want to find out how many users have a google account or a twitter account? Now you need an index on both the twitter and the google_plus column and those indexes are going to be really big. What ever you saved by storing nulls will be very small compared to the sizes of the indexes. (can be overcome by indexing only part of the column)
Try something like this: find out how many users have at least three social media accounts. That’s a tough query with a wide table isn’t it? But a really easy one with a narrow table.
On the other hand the narrow table guess what only the media_type column needs to be indexed and that’s a really small index. If you do this kind of query, you definitely want a narrow table.
Other considerations
Suppose Yahoo goes out of business, you would want to drop that column in your wide table right? Ever tried dropping a column on a table with 1 million rows? You type the alter table command, go out to lunch and when you come back you will find that it’s still running and your website is not responsive.
Suppose another social media company starts up and over takes facebook. Try to add a column. The same result as with the above
Finally none of this really matters for just hundreds of rows but it’s always a good idea to get into the practice of using the right design.