I need to design a database schema. My data is as follows:
Product Id (Primary Key), Product Name (String), Product Type (String), Item Count (INT), Tags (Array of Strings)
My requirements:
- I need to frequently query Products based on Tags. It is a many-to-many relationship – Each Product Id could have many Tags and each tag could have many Product Ids.
- Queries would be like:
Select p.product_name from PRODUCTS as p JOIN PRODUCT_TAG as pt on p.id = pt.product_id JOIN TAGS as t on t.id = pt.tag_id where t.tag = 'tag_name'; Select item_count from PRODUCTS where product_id = id; Update PRODUCTS set item_count = count where product_id = id;
This is the schema I’m thinking to have:
PRODUCTS +----+--------------+-------+------------+ | id | product_name | type | item_count | +----+--------------+-------+------------+ | 1 | abc | type1 | 1 | | 2 | def | type2 | 1 | +----+--------------+-------+------------+ TAGS +----+------+ | id | tag | +----+------+ | 1 | tag1 | | 2 | tag2 | +----+------+ PRODUCT_TAG +------------+--------+ | product_id | tag_id | +------------+--------+ | 1 | 1 | | 1 | 2 | | 2 | 1 | | 2 | 2 | +------------+--------+
Now, I know that this schema is not the best way to achieve what I require. Also it is very space-consuming, as PRODUCT_TAG table would be huge. Moreover, taking a join with such huge table won’t be very efficient provided PRODUCTS table will be almost ~ 10% of PRODUCT_TAG table in number of rows. I am not very familiar with MySQL database. So, what would be the best way to do it?
Any help is appreciated. Thanks in advance!
Advertisement
Answer
There are four ways you can store multiple tags in a row:
- In multiple columns with a separate column per tag.
- In a delimited list.
- A set.
- As JSON.
The first two are just bad ideas. They do not allow:
- Validating that tags are not duplicated.
- Checking if two entities have the same tags.
Each also has its own limitations. For instance:
- Separate columns limit the number of tags.
- A comma delimited list does not allow foreign key relationships to validate tags.
A set limits the tags to 64 values. They also can be rather confusing because they are not often used in MySQL — and are not supported in other databases. The values are really bits, so there is a trick to deal with them as strings. I view them as a hack, but accept that others find them useful in some cases.
JSON does have some of these shortcomings as well, but it is more acceptable for multiple values.
However, a separate junction table is the tried-and-true method to do what you want. It makes it easy to:
- Add and remove tags.
- Validate tags that are assigned.
- Prevent duplicate tags on a product.
- Find the tags on a particular product.
- Find the products on a particular tag.
- And so on.
If performance is an issue, then indexes usually solve the problem.
There might be some cases where an alternative would be appropriate, but those are very rare cases. The junction table solution is much preferred.