I have a category hierarchy that products are attached to. The hierarchy is a tree, stored as an adjacency list. A product can be attached to any category node at any level.
I would like to…
- get the name of every level 3 category…
- per product…
- where that product is attached to any level 3 category node…
- or a descendant of a level 3 node.
I know I can materialize the hierarchy, and from that I’ve been able to satisfy all requirements but the last. I always lose some products or categories.
Given
CREATE TABLE product (p_id varchar PRIMARY KEY);
CREATE TABLE category (c_id varchar PRIMARY KEY, parent_c_id varchar);
CREATE TABLE product_category (
p_id varchar,
c_id varchar,
PRIMARY KEY (p_id, c_id),
FOREIGN KEY (p_id) REFERENCES product (p_id)
ON UPDATE CASCADE ON DELETE CASCADE,
FOREIGN KEY (c_id) REFERENCES category (c_id)
ON UPDATE CASCADE ON DELETE CASCADE
);
INSERT INTO product (p_id) VALUES
('p_01'),
('p_02'),
('p_03'),
('p_04'),
('p_05');
INSERT INTO category (c_id, parent_c_id) VALUES
('c_0_1', NULL),
-- L1
('c_1_1', 'c_0_1'),
('c_1_2', 'c_0_1'),
('c_1_3', 'c_0_1'),
-- L2
('c_2_1', 'c_1_1'),
('c_2_2', 'c_1_1'),
('c_2_3', 'c_1_2'),
('c_2_4', 'c_1_3'),
-- L3
('c_3_1', 'c_2_1'),
('c_3_2', 'c_2_2'),
('c_3_3', 'c_2_3'),
('c_3_4', 'c_2_4'),
-- L4
('c_4_1', 'c_3_1'),
('c_4_2', 'c_3_2'),
('c_4_3', 'c_3_3'),
('c_4_4', 'c_3_4');
INSERT INTO product_category (p_id, c_id) VALUES
-- p_01 explicitly attached to every level in path 1; include.
('p_01', 'c_0_1'),
('p_01', 'c_2_1'),
('p_01', 'c_3_1'),
('p_01', 'c_4_1'),
-- p_02 explicitly attached to desired level in paths 2 and 3; include both.
('p_02', 'c_3_3'),
('p_02', 'c_3_4'),
-- p_03 explicitly attached to super-level in path 3; exclude.
('p_03', 'c_2_4'),
-- p_04 explicitly attached to sub-level in path 1,
-- transitively to desired level in path 1; include.
('p_04', 'c_4_2');
-- p_05 not attached at all.
I would like to end up with something like
p_id | c_id
------+----------------
p_01 | {c_3_1}
p_02 | {c_3_3, c_3_4}
p_04 | {c_3_2}
(3 rows)
but the closest I have gotten is
WITH RECURSIVE category_tree (c_id, parent_c_id, depth, path) AS (
SELECT c_id, parent_c_id, 0 AS depth, ARRAY[]::varchar[]
FROM category
WHERE parent_c_id IS NULL
UNION ALL
SELECT c.c_id, c.parent_c_id, ct.depth + 1, path || c.c_id
FROM category_tree AS ct
INNER JOIN category AS c ON c.parent_c_id = ct.c_id
)
SELECT *
INTO TEMP TABLE t_category_path
FROM category_tree;
SELECT p.p_id, ARRAY_AGG(c_id) category_names
FROM product AS p,
(SELECT DISTINCT t1.c_id, p_id
FROM product_category AS pc
INNER JOIN t_category_path AS t1 ON pc.c_id = t1.c_id
WHERE t1.depth = 3
ORDER BY c_id) x
WHERE p.p_id = x.p_id
GROUP BY p.p_id;
p_id | category_names
------+----------------
p_01 | {c_3_1}
p_02 | {c_3_4,c_3_3}
(2 rows)
- The order of categories is irrelevant (I want a set, not a list).
- I can tolerate duplicate categories far better than missing categories or products.
- I have some liberty to adjust the schema.
> select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 10.12 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
Answer
WITH RECURSIVE cte AS (
SELECT c_id, parent_c_id, 0 as level, NULL AS level3_category
FROM category
WHERE parent_c_id IS NULL
UNION
SELECT
c.c_id,
c.parent_c_id,
cte.level + 1,
CASE -- 1
WHEN cte.level + 1 = 3 THEN c.c_id
ELSE cte.level3_category
END
FROM
category c
JOIN
cte
ON c.parent_c_id = cte.c_id
)
SELECT
p_id,
ARRAY_AGG(DISTINCT level3_category) as c_id -- 2
FROM
cte
JOIN
product_category pc
ON cte.c_id = pc.c_id AND cte.level3_category IS NOT NULL
GROUP BY p_id
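To see what the recursive part produces before the final join, it can help to select from the CTE directly. The following is a minimal sketch rather than part of the query above: it inlines a single branch of the question's tree as a VALUES list so that it runs on its own, and `sample_category` is a hypothetical name used only for this illustration. The explicit `NULL::varchar` cast is likewise an addition here, to keep the column types of both halves of the CTE identical.

```sql
-- Standalone trace of how level3_category is carried down one branch.
-- sample_category is a hypothetical inline copy of the question's path 1.
WITH RECURSIVE sample_category (c_id, parent_c_id) AS (
    VALUES ('c_0_1'::varchar, NULL::varchar),
           ('c_1_1', 'c_0_1'),
           ('c_2_1', 'c_1_1'),
           ('c_3_1', 'c_2_1'),
           ('c_4_1', 'c_3_1')
),
cte AS (
    -- Root row: level 0, no level 3 ancestor known yet.
    SELECT c_id, 0 AS level, NULL::varchar AS level3_category
    FROM sample_category
    WHERE parent_c_id IS NULL
    UNION ALL
    -- Child rows: remember the own name at level 3, otherwise inherit.
    SELECT c.c_id,
           cte.level + 1,
           CASE
               WHEN cte.level + 1 = 3 THEN c.c_id
               ELSE cte.level3_category
           END
    FROM sample_category AS c
    JOIN cte ON c.parent_c_id = cte.c_id
)
SELECT c_id, level, level3_category
FROM cte
ORDER BY level;
```

On this branch, the rows for levels 0 to 2 should show NULL in `level3_category`, while both `c_3_1` and `c_4_1` should show `c_3_1`. That is why a product attached only to `c_4_1` still resolves to its level 3 ancestor.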
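- (1) The CASE expression stores the current category's name if and only if the row is at level 3. If the level is less than 3, it returns NULL; if it is greater, it takes the level 3 value carried down from the parent row. The join condition cte.level3_category IS NOT NULL then discards attachments above level 3.
- (2) DISTINCT is allowed in GROUP BY aggregates such as ARRAY_AGG to eliminate non-distinct values.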
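Since the question already materializes the hierarchy into `t_category_path`, a possible variation is to read the level 3 ancestor straight out of the `path` array. This is only a sketch, and it assumes the temp table was built exactly as in the question: the root starts with an empty path, so for any node at depth 3 or deeper, `path[3]` (PostgreSQL arrays are 1-based) is either the node itself or its level 3 ancestor.

```sql
-- Assumes t_category_path and product_category from the question exist.
-- The root's path is empty, so path[1] is the level 1 ancestor and
-- path[3] is the level 3 ancestor (or the node itself at depth 3).
SELECT pc.p_id,
       ARRAY_AGG(DISTINCT t.path[3]) AS c_id
FROM product_category AS pc
JOIN t_category_path AS t ON t.c_id = pc.c_id
WHERE t.depth >= 3          -- drop attachments above level 3
GROUP BY pc.p_id
ORDER BY pc.p_id;
```

The query in the question appears to lose p_04 because `WHERE t1.depth = 3` keeps only categories the product is attached to directly. Filtering on `depth >= 3` and projecting `path[3]` keeps descendants as well, which on the sample data should give p_01, p_02 and p_04 with their level 3 categories.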