Skip to content
Advertisement

How to store number in shortest possible size?

I will be adding comments into my website and I want to have a tree structure so each comment can have a parent. This creates a problem when retrieving comments because each comment would have to be traversed for children and this is unacceptable when it comes to database storage.

There are various solutions to this well known performance problem and I want to use simple, and fast, path prefix scan(SELECT * FROM comments WHERE path LIKE 'commentID/%') where I will have index with path to each comment via all its parents and the comment’s id as value.

In practice it would look like:

[comment id]/[comment id]/[comment id]/.. = [comment id]

This way I can find all comments for a certain path at once and also delete children at once when needed.

The comment id will be unsigned integer, asigned by the databse as auto-incrementing key.

The problem I am looking at is how to have the shortest possible representation of the comment id so the path index will not be too long.

I expect to use unsigned 64bit long integer which takes up 8 bytes in binary format. In case of three predecessors, the path value wil be 24 bytes + 2 bytes for delimiter. If I would use a string representation, then it depends on the size of the numbers. In case each parent’s id is 5 digits long(12345), the path value would be 17 bytes long and it would be more efficient than binary representation of the numbers by 9 bytes. But once the ids go above 8 digits, the string variant becomes less efficient, it is longer, than the binary format.

So my question is if there is another way to have short representation of an integer? I was thinking in line of bijective function.

Advertisement

Answer

One obvious answer is to use a 32-bit integer instead of a 64-bit integer. Do you really think there will be more the 232 comments on your website? No offense intended, but I assume you are not implementing Facebook. You’re not even implementing Stack Overflow, which only has 83 million comments so far.

If you really did need to support 264 distinct values, I don’t see how a bijective function would help. The values in both sets related in the bijection would need a minimum of 64 bits each, and you wouldn’t save anything.

Another answer is to change the way you store the hierarchical data, so you don’t have to worry about the length of the comment path. You’re using one way of managing tree-like data, which is sometimes called Materialized Path, and this naturally imposes a limit on the depth of the path, which is the point of your question.

A way to do this without Materialized Path is to just use the normal way of storing hierarchy: each comment has a reference to its own parent. Then you need to learn to use recursive SQL queries, which are supported in MySQL 8.0.

Or you could use other alternatives for storing the hierarchy. I like to use a Closure Table, see my answer to What is the most efficient/elegant way to parse a flat table into a tree?

Which design is best for your app depends on the kinds of queries you will need to make against these data. You’ll have to experiment with each solution and see how well it works for the uses you will need to do, such as adding a comment, searching for a thread of comments, deleting a comment or a thread, etc.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement