Skip to content
Advertisement

Managing multiple categories trees, using Python and PostgreSQL

I have multiple categories, which can have None or one or multiple sub-categories.

The process theoretically can go to infinite. So, it is like having multiple trees.

Tree example.

I have also Item(s). An Item can be in multiple categories.

At this moment to connect the categories, in database I use three fields:

  • children (the children of a category),

  • path([1,4,8], basically are the ids of grandparent, parent, and category itself)

  • depth, represent the level in the tree for each category

Using this fields I avoid some recursive and using more queries.

I usually retrieve data like:

  • top categories (depth 0)

  • subcategories of a category

  • sibling categories

  • items in the category (for example a grandparent category, will show its direct items, the items of children, and the items of grandchildren)

At this moment I’m using Django(want to move to FastAPI) and PostgreSQL, and every time I have CRUD operations on categories, the three fields (path,depth,children)will be modified.

I’m thinking maybe is a better way, to maintain/retrieve the categories tree and corresponding items.

Advertisement

Answer

There are various possible strategies to store a tree in a database.

Storing full paths in an array as you currently is one of them. But with this solution is is hard to enforce referential integrity (how do you guarantee that these ids in the arrays really exist in the table?), and simple tree operations are tedious (how do you enumerate the direct children of given node?).

The answer by @VesaKarjalainen suggests using the adjacency list model, a separate table where each element refers to its immediate ancestor. It works, but has downsides: typically, is that complicated to traverse the hierarchy (like get all children or parents of a given node): you need some kind of iteration or recursion for this, which SQL engines do not do efficiently.

I would recommend the closure table approach. This works by creating a separate table that stores all possible paths in the tree, like so:

For this tree structure that you provided:

You would store the following data:

Now, say you want to retrieve all children of a given category, that’s as simple as:

To get all the parents, you would just replace where parent_id = ... with where child_id = ....

You can bring in the main table with a join:

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement