The following PostgreSQL query
UPDATE table_A A SET is_active = false FROM table_A WHERE A.parent_id IS NULL AND A.is_active = true AND A.id = ANY (SELECT (B.parent_id) FROM table_A B INNER JOIN table_B ON table_A.foreign_id = table_B.id WHERE table_B.deleted = true);
gets stuck loading endlessly. I know correlated sub-queries can take long, but a SELECT using the same parameters worked quickly and returned the desired results. And I have a small data set that I let run for an entire day just to make sure it wouldn’t eventually work with time.
Table_A uses hierarchical data structures and only a specific level of hierarchy has a foreign key that I can use to join and check the second table. The idea is :
Find all rows in Table_A whose associated Table_B row has its “deleted” value set to true.
From this set of results get the parent_id column
For any row in table_A whose id is part of the parent_id column, so for all parents, check if their is_active is true and if so make it false.
The EXPLAIN:
Update on table_A A (cost=0.00..3906658758867.89 rows=89947680 width=192) -> Nested Loop (cost=0.00..3906658758867.89 rows=89947680 width=192) Join Filter: (SubPlan 1) -> Seq Scan on table_A (cost=0.00..37899.20 rows=410720 width=14) -> Materialize (cost=0.00..37901.39 rows=438 width=185) -> Seq Scan on table_A A (cost=0.00..37899.20 rows=438 width=185) Filter: ((parent_id IS NULL) AND is_active) SubPlan 1 -> Nested Loop (cost=0.00..42405.74 rows=410720 width=8) -> Seq Scan on table_B (cost=0.00..399.34 rows=1 width=0) Filter: (deleted AND (table_A.foreign_id = id)) -> Seq Scan on table_A B (cost=0.00..37899.20 rows=410720 width=8) JIT: Functions: 17 " Options: Inlining true, Optimization true, Expressions true, Deforming true"
Advertisement
Answer
Sometimes proper use of aliases can make a difference.
Compare the following 2 query plans.
The first is the original query run on sample data.
cost=0.00..314144.00 rows=4975
Estimated to update 4975 rows on a table of less than 10 rows?
The second is a slightly modified version of the first.
cost=92.26..122.20 rows=2
EXPLAIN UPDATE table_A A SET is_active = false FROM table_A WHERE A.parent_id IS NULL AND A.is_active = true AND A.id = ANY ( SELECT (B.parent_id) FROM table_A B INNER JOIN table_B ON table_A.foreign_id = table_B.id WHERE table_B.deleted = true );| QUERY PLAN | | :--------------------------------------------------------------------------------------------- | | Update on table_a a (cost=0.00..314144.00 rows=4975 width=25) | | -> Nested Loop (cost=0.00..314144.00 rows=4975 width=25) | | Join Filter: (SubPlan 1) | | -> Seq Scan on table_a (cost=0.00..29.90 rows=1990 width=10) | | -> Materialize (cost=0.00..29.93 rows=5 width=18) | | -> Seq Scan on table_a a (cost=0.00..29.90 rows=5 width=18) | | Filter: ((parent_id IS NULL) AND is_active) | | SubPlan 1 | | -> Nested Loop (cost=0.15..57.97 rows=1990 width=4) | | -> Index Scan using table_b_pkey on table_b (cost=0.15..8.17 rows=1 width=0) | | Index Cond: (table_a.foreign_id = id) | | Filter: deleted | | -> Seq Scan on table_a b (cost=0.00..29.90 rows=1990 width=4) |
EXPLAIN UPDATE table_A SET is_active = false WHERE parent_id IS NULL AND is_active = true AND id = ANY ( SELECT a2.parent_id FROM table_A a2 JOIN table_B b ON a2.foreign_id = b.id WHERE b.deleted = true );| QUERY PLAN | | :----------------------------------------------------------------------------------------------- | | Update on table_a (cost=92.26..122.20 rows=2 width=31) | | -> Hash Join (cost=92.26..122.20 rows=2 width=31) | | Hash Cond: (table_a.id = a2.parent_id) | | -> Seq Scan on table_a (cost=0.00..29.90 rows=5 width=18) | | Filter: ((parent_id IS NULL) AND is_active) | | -> Hash (cost=89.76..89.76 rows=200 width=16) | | -> HashAggregate (cost=87.76..89.76 rows=200 width=16) | | Group Key: a2.parent_id | | -> Hash Join (cost=50.14..85.27 rows=995 width=16) | | Hash Cond: (a2.foreign_id = b.id) | | -> Seq Scan on table_a a2 (cost=0.00..29.90 rows=1990 width=14) | | -> Hash (cost=34.70..34.70 rows=1235 width=10) | | -> Seq Scan on table_b b (cost=0.00..34.70 rows=1235 width=10) | | Filter: deleted |
The second query only uses a few more alias names.
The update statement can also be written as a join on a sub-query.
UPDATE table_A AS parent SET is_active = false FROM ( SELECT child.parent_id FROM table_A AS child JOIN table_B AS dream ON child.foreign_id = dream.id WHERE child.parent_id IS NOT NULL AND dream.deleted = true GROUP BY child.parent_id ) dreamless WHERE parent.id = dreamless.parent_id AND parent.parent_id IS NULL AND parent.is_active = true;1 rows affected
SELECT * FROM table_Aid | parent_id | is_active | foreign_id -: | --------: | :-------- | ---------: 2 | 1 | t | 2 3 | 1 | t | 5 4 | null | t | 3 5 | 3 | t | 4 1 | null | f | 1
Test on db<>fiddle here