My goal
To calculate the sum of components sold as part of their parent assemblies.
I’m sure this must be a common use case, but I haven’t yet found documentation that leads to the result I’m looking for.
Background
I’m running Postgres 11 on CentOS 7. I have some tables like as follows:
CREATE TABLE the_schema.names_categories ( id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY, created_at TIMESTAMPTZ DEFAULT now(), thing_name TEXT NOT NULL, thing_category TEXT NOT NULL ); CREATE TABLE the_schema.relator ( id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY, created_at TIMESTAMPTZ DEFAULT now(), parent_name TEXT NOT NULL, child_name TEXT NOT NULL, child_quantity INTEGER NOT NULL ); CREATE TABLE the_schema.sales ( id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY, created_at TIMESTAMPTZ DEFAULT now(), sold_name TEXT NOT NULL, sold_quantity INTEGER NOT NULL );
And a view like so, which is mainly to associate the category key with relator.child_name for filtering:
CREATE VIEW the_schema.relationships_with_child_catetgory AS ( SELECT r.parent_name, r.child_name, r.child_quantity, n.thing_category AS child_category FROM the_schema.relator r INNER JOIN the_schema.names_categories n ON r.child_name = n.thing_name );
And these tables contain some data like this:
INSERT INTO the_schema.names_categories (thing_name, thing_category) VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component'); INSERT INTO the_schema.relator (parent_name, child_name, child_quantity) VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2); INSERT INTO the_schema.sales (sold_name, sold_quantity) VALUES ('parent1', 1), ('parent1', 2);
I need to construct a query that, given these data, will return something like the following:
child_name | sum_sold ------------+---------- subChild1 | 30 subChild2 | 6 (2 rows)
The problem is that I haven’t the first idea how to go about this and in fact it’s getting scarier as I type. I’m having a really hard time visualizing the connections that need to be made, so it’s difficult to get started in a logical way. Usually, Molinaro’s SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but near as I can tell, none of them serve this particular purpose.
Based on my research on this site, it seems like I probably need to use a recursive CTE /Common Table Expression, as demonstrated in this question/answer, but I’m having considerable difficulty understanding this method and how to use this it for my case.
Aping the example from E. Brandstetter’s answer linked above, I arrive at:
WITH RECURSIVE cte AS ( SELECT s.sold_name, r.child_name, s.sold_quantity AS total FROM the_schema.sales s INNER JOIN the_schema.relationships_with_child_catetgory r ON s.sold_name = r.parent_name UNION ALL SELECT c.sold_name, r.child_name, (c.total * r.child_quantity) FROM cte c INNER JOIN the_schema.relationships_with_child_catetgory r ON r.parent_name = c.child_name ) SELECT * FROM cte
which gets part of the way there:
sold_name | child_name | total -----------+------------+------- parent1 | child1 | 1 parent1 | child1 | 2 parent1 | subChild1 | 10 parent1 | subChild1 | 20 parent1 | subChild2 | 2 parent1 | subChild2 | 4 (6 rows)
However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component'
to both parts, the query returns no rows:
sold_name | child_name | total -----------+------------+------- (0 rows)
and when I try to group/aggregate, it gives the following error:
ERROR: aggregate functions are not allowed in a recursive query's recursive term
I’m stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I’m failing to comprehend how this recursive CTE works. All guidance is appreciated!
Advertisement
Answer
Basically you have the solution. If you stored the quantities and categories in your CTE as well, you can simply add a WHERE
filter and a SUM
aggregation afterwards:
SELECT child_name, SUM(sold_quantity * child_quantity) FROM cte WHERE category = 'component' GROUP BY child_name
My entire query looks like this (which only differs in the details I mentioned above from yours):
WITH RECURSIVE cte AS ( SELECT s.sold_name, s.sold_quantity, r.child_name, r.child_quantity, nc.thing_category as category FROM sales s JOIN relator r ON s.sold_name = r.parent_name JOIN names_categories nc ON r.child_name = nc.thing_name UNION ALL SELECT cte.sold_name, cte.sold_quantity, r.child_name, r.child_quantity, nc.thing_category FROM cte JOIN relator r ON cte.child_name = r.parent_name JOIN names_categories nc ON r.child_name = nc.thing_name ) SELECT child_name, SUM(sold_quantity * child_quantity) FROM cte WHERE category = 'component' GROUP BY child_name
Note: I didn’t use your view, because I found it more handy to fetch the data from directly from the tables instead of joining data I already have. But that’s just the way I personally like it 🙂