Skip to content
Advertisement

Database architecture: When records having different number of attributes (columns)

Suppose I have these records

Total estimated number of records: ~180,000

Total number of attributes: 70, increasing

Example of queries I’m going to do:

  1. SELECT * from table WHERE (has attribute B)
  2. SELECT * from table WHERE (has attributes B & D)
  3. SELECT * from table WHERE (has 2 attributes)
  4. SELECT * from table WHERE (has >=3 attributes)
  5. SELECT count(*) from table WHERE (has attribute B)

What is the best database architecture?


Design 1: Storing attributes as 1s & 0s

Problems:

  1. New columns needed to be added periodically when new attribute appears
  2. Much redundant data (0s), as more than 80% of data has only 1 attribute, and less than 0.01% of records will have more than 8 attributes.

Design 2: Store attributes as a CSV string

Problems:

  1. Slow query when I do

Design 3: Each attribute has its own table storing record IDs

Problems

  1. Many tables
  2. New tables needed to be added periodically
  3. How to do SELECT * from table WHERE id (appears in exactly 3 tables)?

These are the designs I can think of, please propose any good architecture.

Advertisement

Answer

Actually, none of your designs are optimal (the third is the best), and I recommend a single junction table which relates ID valued to their attributes, e.g.

This is the most normalized approach. To see why this design is optimal, see how easy it is to find all IDs which have attribute B:

It is also fairly straightforward to find all IDs having both attributes B and D:

Your first two suggestions would make it much harder to write these queries (give it a try), and in general it is bad practice to store CSV in database tables. Your third suggestion does store the relationships correctly, but it unnecessarily spreads out data across multiple tables.

A more general form of the above query which can easily be extended to any number of IDs is:

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement