Skip to content
Advertisement

PostgreSQL CTE UPDATE-FROM query skips rows

2 tables

table_1 rows: NOTE: id 2 has two rows

-----------------------
| id | counts | track |
-----------------------
| 1  | 10     | 1     |
| 2  | 10     | 2     |
| 2  | 10     | 3     |
-----------------------

table_2 rows

---------------
| id | counts |
---------------
| 1  | 0      |
| 2  | 0      |
---------------

Query:

with t1_rows as (
    select id, sum(counts) as counts, track 
    from table_1
    group by id, track
)
update table_2 set counts = (coalesce(table_2.counts, 0) + t1.counts)::float 
from t1_rows t1
where table_2.id = t1.id;

select * from table_2;

When i ran above query i got table_2 output as

---------------
| id | counts |
---------------
| 1  | 10     |
| 2  | 10     | (expected counts as 20 but got 10)
---------------

I noticed that above update query is considering only 1st match and skipping rest.

I can make it work by changing the query like below. Now the table_2 updates as expected since there are no duplicate rows from table_1.

But i would like to know why my previous query is not working. Is there anything wrong in it?

with t1_rows as (
    select id, sum(counts) as counts, array_agg(track) as track 
    from table_1
    group by id
)
update table_2 set counts = (coalesce(table_2.counts, 0) + t1.counts)::float 
from t1_rows t1
where table_2.id = t1.id;

Schema

CREATE TABLE IF NOT EXISTS table_1(
  id varchar not null,
  counts integer not null,
  track integer not null
);

CREATE TABLE IF NOT EXISTS table_2(
  id varchar not null,
  counts integer not null
);

insert into table_1(id, counts, track) values(1, 10, 1), (2, 10, 2), (2, 10, 3);
insert into table_2(id, counts) values(1, 0), (2, 0);

Advertisement

Answer

The problem is that an UPDATE in PostgreSQL creates a new version of the row rather than changing the row in place, but the new row version is not visible in the snapshot of the current query. So from the point of view of the query, the row “vanishes” when it is updated the first time.

The documentation says:

When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_list, and each output row of the join represents an update operation for the target table. When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn’t join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement