Skip to content
Advertisement

Why does this SQL query get stuck in an endless loop?

The following PostgreSQL query

UPDATE table_A A 
SET is_active = false 
FROM table_A 
WHERE A.parent_id IS NULL AND A.is_active = true AND A.id = ANY 
(SELECT (B.parent_id) 
FROM table_A B 
INNER JOIN table_B ON table_A.foreign_id = table_B.id 
WHERE table_B.deleted = true);

gets stuck loading endlessly. I know correlated sub-queries can take long, but a SELECT using the same parameters worked quickly and returned the desired results. And I have a small data set that I let run for an entire day just to make sure it wouldn’t eventually work with time.

Table_A uses hierarchical data structures and only a specific level of hierarchy has a foreign key that I can use to join and check the second table. The idea is :

  1. Find all rows in Table_A whose associated Table_B row has its “deleted” value set to true.

  2. From this set of results get the parent_id column

  3. For any row in table_A whose id is part of the parent_id column, so for all parents, check if their is_active is true and if so make it false.

The EXPLAIN:

Update on table_A A  (cost=0.00..3906658758867.89 rows=89947680 width=192)  ->  Nested Loop  (cost=0.00..3906658758867.89 rows=89947680 width=192)
    Join Filter: (SubPlan 1)
    ->  Seq Scan on table_A  (cost=0.00..37899.20 rows=410720 width=14)
    ->  Materialize  (cost=0.00..37901.39 rows=438 width=185)
          ->  Seq Scan on table_A A  (cost=0.00..37899.20 rows=438 width=185)
                Filter: ((parent_id IS NULL) AND is_active)
    SubPlan 1
      ->  Nested Loop  (cost=0.00..42405.74 rows=410720 width=8)
            ->  Seq Scan on table_B  (cost=0.00..399.34 rows=1 width=0)
                  Filter: (deleted AND (table_A.foreign_id = id))
            ->  Seq Scan on table_A B  (cost=0.00..37899.20 rows=410720 width=8) 
JIT:  Functions: 17 "  Options: Inlining true, Optimization true, Expressions true, Deforming true"

Advertisement

Answer

Sometimes proper use of aliases can make a difference.

Compare the following 2 query plans.
The first is the original query run on sample data.
cost=0.00..314144.00 rows=4975 Estimated to update 4975 rows on a table of less than 10 rows?

The second is a slightly modified version of the first.
cost=92.26..122.20 rows=2

EXPLAIN
UPDATE table_A A SET is_active = false 
FROM table_A 
WHERE A.parent_id IS NULL 
  AND A.is_active = true 
  AND A.id = ANY (
    SELECT (B.parent_id) 
    FROM table_A B 
    INNER JOIN table_B ON table_A.foreign_id = table_B.id 
    WHERE table_B.deleted = true
);
| QUERY PLAN                                                                                     |
| :--------------------------------------------------------------------------------------------- |
| Update on table_a a  (cost=0.00..314144.00 rows=4975 width=25)                                 |
|   ->  Nested Loop  (cost=0.00..314144.00 rows=4975 width=25)                                   |
|         Join Filter: (SubPlan 1)                                                               |
|         ->  Seq Scan on table_a  (cost=0.00..29.90 rows=1990 width=10)                         |
|         ->  Materialize  (cost=0.00..29.93 rows=5 width=18)                                    |
|               ->  Seq Scan on table_a a  (cost=0.00..29.90 rows=5 width=18)                    |
|                     Filter: ((parent_id IS NULL) AND is_active)                                |
|         SubPlan 1                                                                              |
|           ->  Nested Loop  (cost=0.15..57.97 rows=1990 width=4)                                |
|                 ->  Index Scan using table_b_pkey on table_b  (cost=0.15..8.17 rows=1 width=0) |
|                       Index Cond: (table_a.foreign_id = id)                                    |
|                       Filter: deleted                                                          |
|                 ->  Seq Scan on table_a b  (cost=0.00..29.90 rows=1990 width=4)                |
EXPLAIN 
UPDATE table_A SET is_active = false 
WHERE parent_id IS NULL 
  AND is_active = true 
  AND id = ANY (
    SELECT a2.parent_id
    FROM table_A a2 
    JOIN table_B b ON a2.foreign_id = b.id 
    WHERE b.deleted = true
  );
| QUERY PLAN                                                                                       |
| :----------------------------------------------------------------------------------------------- |
| Update on table_a  (cost=92.26..122.20 rows=2 width=31)                                          |
|   ->  Hash Join  (cost=92.26..122.20 rows=2 width=31)                                            |
|         Hash Cond: (table_a.id = a2.parent_id)                                                   |
|         ->  Seq Scan on table_a  (cost=0.00..29.90 rows=5 width=18)                              |
|               Filter: ((parent_id IS NULL) AND is_active)                                        |
|         ->  Hash  (cost=89.76..89.76 rows=200 width=16)                                          |
|               ->  HashAggregate  (cost=87.76..89.76 rows=200 width=16)                           |
|                     Group Key: a2.parent_id                                                      |
|                     ->  Hash Join  (cost=50.14..85.27 rows=995 width=16)                         |
|                           Hash Cond: (a2.foreign_id = b.id)                                      |
|                           ->  Seq Scan on table_a a2  (cost=0.00..29.90 rows=1990 width=14)      |
|                           ->  Hash  (cost=34.70..34.70 rows=1235 width=10)                       |
|                                 ->  Seq Scan on table_b b  (cost=0.00..34.70 rows=1235 width=10) |
|                                       Filter: deleted                                            |

The second query only uses a few more alias names.

The update statement can also be written as a join on a sub-query.

UPDATE table_A AS parent
   SET is_active = false 
FROM (
   SELECT child.parent_id
   FROM table_A AS child 
   JOIN table_B AS dream
     ON child.foreign_id = dream.id
   WHERE child.parent_id IS NOT NULL
     AND dream.deleted = true
   GROUP BY child.parent_id
) dreamless 
WHERE parent.id = dreamless.parent_id
  AND parent.parent_id IS NULL
  AND parent.is_active = true;
1 rows affected
SELECT * FROM table_A
id | parent_id | is_active | foreign_id
-: | --------: | :-------- | ---------:
 2 |         1 | t         |          2
 3 |         1 | t         |          5
 4 |      null | t         |          3
 5 |         3 | t         |          4
 1 |      null | f         |          1

Test on db<>fiddle here

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement