Skip to content
Advertisement

Django complex partitioning & ordering for tree Table

I’m using Django with PostgreSQL as my Backend-Stack for an application of which the main feature is a complex, multilevel table displaying annotated time-series-data for different products.

So basically I’ve got 2 models defined, one is Product and the other one Timestamp, while the Product Model is hierarchial and uses the MPTT library for implementing the tree-structure. Each product can have multiple children, with 4 levels of depth as maximum and has multiple Timestamps, describing data for how they performed on a certain date. Thus I’m basically annotating all time-series data for each product in a queryset and that works flawlessly. The Problem which I’m facing is that I need to order the products dynamically by their annotated values, without destroying their position in the hierarchy and/or parent-child relationships. When I use some basic order_by(“tree_id”, “level” …) like method, the parent-child relationships get ‘overwritten’. Also Important is that every Product-root has it’s own tree_id and is a separate tree.

models.py

class Product(MPTTModel):
    owner = models.ForeignKey(Profile, on_delete=models.CASCADE, null=True)
    budget = models.FloatField(null=True)
    creation_date = models.CharField(max_length=35, null=True)
    product_type = models.CharField(max_length=35, null=True)
    last_updated = models.DateTimeField(null=True)
    name = models.CharField(max_length=35, null=True)
    parent = parent = TreeForeignKey('self', on_delete=models.CASCADE,
                            null=True, blank=True)

class Timestamp(models.Model):
    product = models.ForeignKey(Product, on_delete=models.CASCADE, null=True)
    viewed = models.IntegerField(default=0)
    bought = models.IntegerField(default=0)
    shown = models.IntegerField(default=0)
    date = models.DateField(null=True)

A very basic mockup of how the table should look can be seen below.enter image description here

My current function for the queryset ordering does not work correctly and can be seen here

  filters = self.request.GET
  d_start, d_end = filters["date_start"], filters["date_end"]
  

  qs=AmazonAd._tree_manager.get_queryset_descendants(Product.objects.filter(
     Exists(Timestamp.objects.filter(
         date__gte=d_start,
         date__lte=d_end,
         product=OuterRef("pk"))
         ), level=0), include_self=True).annotate(
            viewed=(Sum("timestamp__viewed")),
            bought=(Sum("timestamp__bought")),
            shown=(Sum("timestamp__shown"))).order_by("tree_id", "level", "bought")

The Result of this looks like this: enter image description here

I suspect that the solution will require some Raw SQL, maybe Partitioning combined with ordering.

I’m looking forward for your answers and thanks in advance.

Advertisement

Answer

postgreSQL’S Recursive CTEs turned out to be what I needed.

Django’s ORM is very reliable and robust, but it has it’s limits. Complex, architecture dependant features like Recursive Queries can’t be implemented using the ORM – at least not at the time I’m writing this answer.

Thus this was a scenario which required me to write raw SQL instead. I can’t include the SQL code here as it contains sensible backend information & logic.

But I can tell you a few things if you should ever have to deal with hierarchial models / queries in Django.

  • DO NOT use libraries like Mptt or TreeBeard if performance is a major priority. They are both pretty slow and in my scenario not qualified for production usage as UX would suffer tremendously due to longer loading times.

  • DO NOT hesitate to use Raw-SQL Queries just because Django provides an ORM. Sooner or later you’ll run into a problem which the ORM is not able to solve because of it’s high abstraction level.

  • DO NOT waste your precious time trying to reinvent the wheel.

  • If you have to work with a Relational Database to solve this issue for whatever reason, then use PostgreSQL as it offers the best performance, features and extensions for solving this issue (I personally think PostgreSQL is by far the best SQL-DB out there atm.)

  • Use the ltree extension for your queries. It’s very intuitive and easy to understand, you can read it up in the thorough postgres-docs.

  • If you have to order siblings without destroying the hierarchial structure, use an Array combined with ROW OVER.

Overall-Performance was around 10x faster in my scenario when I migrated from Mptt to CTEs.

Useful links:

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement