I have a category hierarchy that products are attached to. That category hierarchy is saved as an adjacency list. Products can be attached to any category nodes at any level. The category hierarchy is a tree.
I would like to…
- get the name of every level 3 category…
- per product…
- where that product is attached to any level 3 category node…
- or a descendant of a level 3 node.
I know I can materialize the hierarchy, and from that I’ve been able to satisfy all requirements but the last. I always lose some products or categories.
Given
CREATE TABLE product (p_id varchar PRIMARY KEY); CREATE TABLE category (c_id varchar PRIMARY KEY, parent_c_id varchar); CREATE TABLE product_category ( p_id varchar, c_id varchar, PRIMARY KEY (p_id, c_id), FOREIGN KEY (p_id) REFERENCES product (p_id) ON UPDATE CASCADE ON DELETE CASCADE, FOREIGN KEY (c_id) REFERENCES category (c_id) ON UPDATE CASCADE ON DELETE CASCADE ); INSERT INTO product (p_id) VALUES ('p_01'), ('p_02'), ('p_03'), ('p_04'), ('p_05'); INSERT INTO category (c_id, parent_c_id) VALUES ('c_0_1', NULL), -- L1 ('c_1_1', 'c_0_1'), ('c_1_2', 'c_0_1'), ('c_1_3', 'c_0_1'), -- L2 ('c_2_1', 'c_1_1'), ('c_2_2', 'c_1_1'), ('c_2_3', 'c_1_2'), ('c_2_4', 'c_1_3'), -- L3 ('c_3_1', 'c_2_1'), ('c_3_2', 'c_2_2'), ('c_3_3', 'c_2_3'), ('c_3_4', 'c_2_4'), -- L4 ('c_4_1', 'c_3_1'), ('c_4_2', 'c_3_2'), ('c_4_3', 'c_3_3'), ('c_4_4', 'c_3_4'); INSERT INTO product_category (p_id, c_id) VALUES -- p_01 explicitly attached to every level in path 1; include. ('p_01', 'c_0_1'), ('p_01', 'c_2_1'), ('p_01', 'c_3_1'), ('p_01', 'c_4_1'), -- p_02 explicitly attached to desired level in paths 1 and 3; include both. ('p_02', 'c_3_3'), ('p_02', 'c_3_4'), -- p_03 explicitly attached to super-level in path 3; exclude. ('p_03', 'c_2_4'), -- p_04 explicitly attached to sub-level in path 1, -- transitively to desired level in path 1; include. ('p_04', 'c_4_2'); -- p_05 not attached at all.
I would like to end up with something like
p_id | c_id ------+---------------- p_01 | {c_3_1} p_02 | {c_3_3, c_3_4} p_04 | {c_3_2} (3 rows)
but the closest I have gotten is
WITH RECURSIVE category_tree (c_id, parent_c_id, depth, path) AS ( SELECT c_id, parent_c_id, 0 AS depth, ARRAY[]::varchar[] FROM category WHERE parent_c_id IS NULL UNION ALL SELECT c.c_id, c.parent_c_id, ct.depth + 1, path || c.c_id FROM category_tree AS ct INNER JOIN category AS c ON c.parent_c_id = ct.c_id ) SELECT * INTO TEMP TABLE t_category_path FROM category_tree; SELECT p.p_id, ARRAY_AGG(c_id) category_names FROM product AS p, (SELECT DISTINCT t1.c_id, p_id FROM product_category AS pc INNER JOIN t_category_path AS t1 ON pc.c_id = t1.c_id WHERE t1.depth = 3 ORDER BY c_id) x WHERE p.p_id = x.p_id GROUP BY p.p_id;
p_id | category_names ------+---------------- p_01 | {c_3_1} p_02 | {c_3_4,c_3_3} (2 rows)
- The order of categories is irrelevent (I want a set, not a list).
- I can tolerate duplicate categories far better than missing categories or products.
- I have some liberty to adjust the schema.
> select version(); version -------------------------------------------------------------------------------------------------------------- PostgreSQL 10.12 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
Advertisement
Answer
WITH RECURSIVE cte AS ( SELECT c_id, parent_c_id, 0 as level, NULL AS level3_category FROM category WHERE parent_c_id IS NULL UNION SELECT c.c_id, cte.parent_c_id, cte.level + 1, CASE -- 1 WHEN cte.level + 1 = 3 THEN c.c_id ELSE cte.level3_category END FROM category c JOIN cte ON c.parent_c_id = cte.c_id ) SELECT p_id, ARRAY_AGG(DISTINCT level3_category) as c_id -- 2 FROM cte JOIN product_category pc ON cte.c_id = pc.c_id AND cte.level3_category IS NOT NULL GROUP BY p_id
- This
CASE
clause stores the current name if and only if it is level 3. If it is less, than it returnsNULL
, if it is greater, it takes the level 3 value. DISTINCT
is allowed inGROUP BY
aggregates to eliminate non-distinct values.