Skip to content
Advertisement

Names of nodes at depth d for every descendant leaf

I have a category hierarchy that products are attached to. That category hierarchy is saved as an adjacency list. Products can be attached to any category nodes at any level. The category hierarchy is a tree.

I would like to…

  1. get the name of every level 3 category…
  2. per product…
  3. where that product is attached to any level 3 category node…
  4. or a descendant of a level 3 node.

I know I can materialize the hierarchy, and from that I’ve been able to satisfy all requirements but the last. I always lose some products or categories.

Given

CREATE TABLE product (p_id varchar PRIMARY KEY);
CREATE TABLE category (c_id varchar PRIMARY KEY, parent_c_id varchar);
CREATE TABLE product_category (
    p_id varchar,
    c_id varchar,
    PRIMARY KEY (p_id, c_id),
    FOREIGN KEY (p_id) REFERENCES product (p_id)
        ON UPDATE CASCADE ON DELETE CASCADE,
    FOREIGN KEY (c_id) REFERENCES category (c_id)
        ON UPDATE CASCADE ON DELETE CASCADE
);
                                                                       
INSERT INTO product (p_id) VALUES
('p_01'),
('p_02'),
('p_03'),
('p_04'),
('p_05');
                                                                       
INSERT INTO category (c_id, parent_c_id) VALUES
('c_0_1', NULL),
-- L1
('c_1_1', 'c_0_1'),
('c_1_2', 'c_0_1'),
('c_1_3', 'c_0_1'),
-- L2
('c_2_1', 'c_1_1'),
('c_2_2', 'c_1_1'),
('c_2_3', 'c_1_2'),
('c_2_4', 'c_1_3'),
-- L3
('c_3_1', 'c_2_1'),
('c_3_2', 'c_2_2'),
('c_3_3', 'c_2_3'),
('c_3_4', 'c_2_4'),
-- L4
('c_4_1', 'c_3_1'),
('c_4_2', 'c_3_2'),
('c_4_3', 'c_3_3'),
('c_4_4', 'c_3_4');
                                                                       
INSERT INTO product_category (p_id, c_id) VALUES
-- p_01 explicitly attached to every level in path 1; include.
('p_01', 'c_0_1'),
('p_01', 'c_2_1'),
('p_01', 'c_3_1'),
('p_01', 'c_4_1'),
-- p_02 explicitly attached to desired level in paths 1 and 3; include both.
('p_02', 'c_3_3'),
('p_02', 'c_3_4'),
-- p_03 explicitly attached to super-level in path 3; exclude.
('p_03', 'c_2_4'),
-- p_04 explicitly attached to sub-level in path 1,
-- transitively to desired level in path 1; include.
('p_04', 'c_4_2');
-- p_05 not attached at all.

I would like to end up with something like

 p_id |      c_id
------+----------------
 p_01 | {c_3_1}
 p_02 | {c_3_3, c_3_4}
 p_04 | {c_3_2}
(3 rows)

but the closest I have gotten is

WITH RECURSIVE category_tree (c_id, parent_c_id, depth, path) AS (
    SELECT c_id, parent_c_id, 0 AS depth, ARRAY[]::varchar[]
    FROM category
    WHERE parent_c_id IS NULL
    UNION ALL
    SELECT c.c_id, c.parent_c_id, ct.depth + 1, path || c.c_id
    FROM category_tree AS ct
    INNER JOIN category AS c ON c.parent_c_id = ct.c_id
)
SELECT *
INTO TEMP TABLE t_category_path
FROM category_tree;

SELECT p.p_id, ARRAY_AGG(c_id) category_names
FROM product AS p,
   (SELECT DISTINCT t1.c_id, p_id
    FROM product_category AS pc
    INNER JOIN t_category_path AS t1 ON pc.c_id = t1.c_id
    WHERE t1.depth = 3
    ORDER BY c_id) x
WHERE p.p_id = x.p_id
GROUP BY p.p_id;
 p_id | category_names
------+----------------
 p_01 | {c_3_1}
 p_02 | {c_3_4,c_3_3}
(2 rows)
  • The order of categories is irrelevent (I want a set, not a list).
  • I can tolerate duplicate categories far better than missing categories or products.
  • I have some liberty to adjust the schema.
> select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.12 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

Advertisement

Answer

step-by-step demo:db<>fiddle

WITH RECURSIVE cte AS (
    SELECT c_id, parent_c_id, 0 as level, NULL AS level3_category
    FROM category
    WHERE parent_c_id IS NULL
    
    UNION
    
    SELECT
        c.c_id,
        cte.parent_c_id,
        cte.level + 1,
        CASE                                                  -- 1
            WHEN cte.level + 1 = 3 THEN c.c_id
            ELSE cte.level3_category
        END
    FROM
        category c
    JOIN 
        cte
    ON c.parent_c_id = cte.c_id
)
SELECT 
    p_id,
    ARRAY_AGG(DISTINCT level3_category) as c_id               -- 2
FROM 
    cte
JOIN 
    product_category pc
ON cte.c_id = pc.c_id AND cte.level3_category IS NOT NULL
GROUP BY p_id
  1. This CASE clause stores the current name if and only if it is level 3. If it is less, than it returns NULL, if it is greater, it takes the level 3 value.
  2. DISTINCT is allowed in GROUP BY aggregates to eliminate non-distinct values.
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement