Skip to content
Advertisement

What is the optimal relational database design for storing an unknown number of similar but unique entities

The database we are designing allows users to authenticate with multiple 3rd party services, mostly social media (twitter, facebook, etc). There will be an unknown and growing number of these services. Each service requires a unique set of data for authentication that is not standard with the other services.

One user may authenticate many services, but they may only authenticate with one of each type of service.

Possible Solutions:

A) The most direct solution to this issue is to simply add a column for each service to the user table which contains the JSON authentication for that service. However, this violates normalization by leaving a large number of nulls in the database. What happens when there are 50 of these integrations for instance?

B) Each service gets its own table in the database. JSON is no longer needed as each field can be properly described. Then a lookup table is needed “user_has_service” for each service. This is a table which contains only two foreign keys, one for the user and one for the service, linking them together. This option seems the most correct but is very inefficient and will take many operations to determine what services a user has, increasing with the number of services. I believe also in this case, the ID field for the lookup table would need to be some kind of hash of the user and service together so that duplicate inserts are not possible.

Not at all a database expert and I have been grappling with this one for quite a while. Any thoughts?

Advertisement

Answer

A) The most direct solution … JSON

You are right, option A is grossly incorrect. It breaks Codds’ First Normal Form, thus it is not Relational. NULL in the database is an indication of incomplete Normalisation, which leads to complex SQL code. To be avoided at all costs.

similar but unique

To be clear, that they are unique to the Service is true. That {LoginName; UserName; Email; UserId; etc} are all similar is true in the implementation sense only, not in the data.

I may need to sketch this out.

That is a great idea. A visual data model is far more effective, because (a) the mind can comprehend it much better than text, and (b) therefore work out details; contradictions; missing bits; etc. Much easier to progress each iteration visually, than with text.

Second, we have had visual modelling tools since 1987 (1984 for a closed group), which have been made a Standard in 1993. Hopefully you appreciate that a standard-compliant model is better than a home-grown or corporate-supplied one. It displays all technical details rather than a small subset.

Is there a name for this strategy

It is plain old Relational Data Modelling, which includes Normalisation (ensuring compliance with Codd’s Normal Forms, as opposed to the insanity of implementing the NFs is fragmented progressive steps).

Obstacle

One problem that needs to be understood and eliminated is this. The “theoreticians” market and propagate 1960’s Record Filing Systems under the banner of “relational”. That is characterised by a Record IDs in every file. That method ensures the database remains physical, not logical, the very thing that Codd overcame with his Relational Model: a database that is logical and therefore extremely easy to navigate, by any querying party, current; planned; or unplanned.

The essential difference between 1960’s RFS and post-1970 Relational Databases is:

  • whereas the RFS maintains references between Files by physical pointer (Record ID), the Relational Database maintains references between Tables by logical Key.
  • A logical Key is “made up from the data” as per Codd
    • (A datum that is fabricated by the system is not “made up from the data”)
    • (Use of the SQL command PRIMARY KEY does not magically anoint the datum with the properties and qualities of a Relational Key: if you use PRIMARY KEY RecordID you are in 1960’s physical paradigm, not the post-1970 Relational paradigm)
  • Logical Keys provide Relational Integrity (as distinct from Referential Integrity, which is an ordinary function of SQL), which is far superior to that obtained by 1960’s RFS
  • As well as far superior Speed and Power (far less JOINs, and smaller sets)

Relational Database

Therefore I will give you the answer as a Relational Data Model, as per Codd.

Ampix0TA

  • Just one example of Relational Integrity:

    • the ServiceProperty FK elements in UserServiceProperty is constrained to PK (particular combination) in ServiceProperty
    • a UserServiceProperty row with Facebook.Email is prevented
    • A Record ID based 1960’s RFS that the “theoreticians” promote as “relational” cannot do that, various errors such as that one are allowed.
  • All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993

  • My IDEF1X Introduction is essential reading for beginners.

  • The IDEF1X Anatomy is a refresher for those who have lapsed.

  • If you have trouble reading the Predicates directly from the Data Model, let me know and I will produce them in text form.

Please feel free to ask questions, the more specific the better.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement