I have multiple categories, which can have None or one or multiple sub-categories.
The process theoretically can go to infinite. So, it is like having multiple trees.
Tree example.
A - A1 - A11 - A12 -A2 B C - C1
I have also Item(s). An Item can be in multiple categories.
At this moment to connect the categories, in database I use three fields:
children (the children of a category),
path([1,4,8], basically are the ids of grandparent, parent, and category itself)
depth, represent the level in the tree for each category
Using this fields I avoid some recursive and using more queries.
I usually retrieve data like:
top categories (depth 0)
subcategories of a category
sibling categories
items in the category (for example a grandparent category, will show its direct items, the items of children, and the items of grandchildren)
At this moment I’m using Django(want to move to FastAPI) and PostgreSQL, and every time I have CRUD operations on categories, the three fields (path,depth,children)will be modified.
I’m thinking maybe is a better way, to maintain/retrieve the categories tree and corresponding items.
Advertisement
Answer
There are various possible strategies to store a tree in a database.
Storing full paths in an array as you currently is one of them. But with this solution is is hard to enforce referential integrity (how do you guarantee that these id
s in the arrays really exist in the table?), and simple tree operations are tedious (how do you enumerate the direct children of given node?).
The answer by @VesaKarjalainen suggests using the adjacency list model, a separate table where each element refers to its immediate ancestor. It works, but has downsides: typically, is that complicated to traverse the hierarchy (like get all children or parents of a given node): you need some kind of iteration or recursion for this, which SQL engines do not do efficiently.
I would recommend the closure table approach. This works by creating a separate table that stores all possible paths in the tree, like so:
create table category_path ( parent_id int, child_id int, level int, primary key(parent_id, child_id), foreign key(parent_id) references category(id), foreign key(parent_id) references category(id) );
For this tree structure that you provided:
A B C / | A1 A2 C1 / A11 A12
You would store the following data:
parent_id child_id level A A 0 A A1 1 A A2 1 A A11 2 A A12 2 A1 A11 1 A1 A12 1 B B 0 C C 0 C C1 1
Now, say you want to retrieve all children of a given category, that’s as simple as:
select * from category_path where parent_id = 'A'
To get all the parents, you would just replace where parent_id = ...
with where child_id = ...
.
You can bring in the main table with a join
:
select c.* from category_path cp inner join categories c on c.id = cp.parent_id where cp.parent_id = 'A'