Skip to content
Advertisement

Group objects by a sum of their references with a max limit

I am looking to aggregate IDs from a table by chunks from their reference from an other table. It is a bit hard to summary my problem so I will give an example:

I have two tables, the table Box and the table Item.

There is a many to one reference between them, a box can contains many items, and an item cannot exist without a box.

Currently there are a lot of boxes (> 100,000) and items (> 600,000), and even if most of boxes have around 10 items, a significant amount have more than 1,000 items.

I need to do a specific process on the items, where I have to compare an item to all other items from the same box (with a Java code). In order to avoid selecting a lot of items as once, I want to try to regroup all boxes IDs in a single cell (separated with a coma) that satisfies a certain chunk size, this chunk equate to the max amount of items for this group of boxes.

The only thing that I managed to do is a request counting the amount of items by boxes:

The result I want would look like this, if I set a chunk of items to 15000 for example. id_group would be a text column.

At the beginning there will not be a lot of IDs, but with less items in the latter boxes, there will be more and more IDs in each cell to reach the chunk limit, and it is what I want! If for any reason there is a box containing more items than the chunk limit, then it would just return this single ID in the cell. I don’t need the total_amount though, I just need the IDs of the boxes joined by a comma, then I will be able to do my processes.

Is there a way with postgreSQL to do this?

Advertisement

Answer

You can implement a greedy algorithm for combining the boxes using a recursive CTE:

Here is a db<>fiddle.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement