Update of many rows with join extremely slow

I have a table with five relevant fields – id, source, iid, track_hash, alias. I want to group all entries into groups with a common track_hash and then for each row save the id of the row with the lowest source (with ties broken in favor of the highest iid) entry from its group into the alias field. To do this I wrote the following query:

with best as 
(SELECT id as bid, track_hash FROM
    (SELECT id, track_hash,
        RANK () OVER ( 
            PARTITION BY track_hash
            ORDER BY source asc, iid DESC
        ) rank
        from albums
    )
    where rank = 1
)
select bid, a.* from albums a inner join best
on a.track_hash = best.track_hash

​x
 
with best as (SELECT id as bid, track_hash FROM    (SELECT id, track_hash,        RANK () OVER (             PARTITION BY track_hash            ORDER BY source asc, iid DESC        ) rank        from albums    )    where rank = 1)select bid, a.* from albums a inner join beston a.track_hash = best.track_hash​

This takes a completely reasonable 2 seconds on 24k rows. Now, instead of simply seeing this id, I want to actually save it. For this, I used the following very similar query:

with best as 
(SELECT id as bid, track_hash FROM
    (SELECT id, track_hash,
        RANK () OVER ( 
            PARTITION BY track_hash
            ORDER BY source asc, iid DESC
        ) rank
        from albums
    )
    where rank = 1
)
update albums
set alias = bid FROM albums a inner join best
on a.track_hash = best.track_hash

 
with best as (SELECT id as bid, track_hash FROM    (SELECT id, track_hash,        RANK () OVER (             PARTITION BY track_hash            ORDER BY source asc, iid DESC        ) rank        from albums    )    where rank = 1)update albumsset alias = bid FROM albums a inner join beston a.track_hash = best.track_hash​

However, this one takes anywhere between 1 and 10 minutes, and I really don’t understand why. Doesn’t the engine have to match every row to its best.id/alias anyway, which is exactly what I’m doing with my update? Why is this happening and what am I doing wrong?

Query plan looks like this:

MATERIALIZE 1
CO-ROUTINE 4
SCAN TABLE albums USING INDEX track_hash_idx
USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
SCAN SUBQUERY 4
SCAN TABLE albums USING COVERING INDEX track_hash_idx
SEARCH SUBQUERY 1 USING AUTOMATIC PARTIAL COVERING INDEX (rank=?)
SEARCH TABLE albums AS a USING COVERING INDEX track_hash_idx (track_hash=?)

 
MATERIALIZE 1CO-ROUTINE 4SCAN TABLE albums USING INDEX track_hash_idxUSE TEMP B-TREE FOR RIGHT PART OF ORDER BYSCAN SUBQUERY 4SCAN TABLE albums USING COVERING INDEX track_hash_idxSEARCH SUBQUERY 1 USING AUTOMATIC PARTIAL COVERING INDEX (rank=?)SEARCH TABLE albums AS a USING COVERING INDEX track_hash_idx (track_hash=?)​

Answer

You don’t need the join to albums (again).
The UPDATE ... FROM syntax provides actually an implicit join of albums to best:

UPDATE albums AS a
SET alias = b.bid 
FROM best AS b
WHERE a.track_hash = b.track_hash

 
UPDATE albums AS aSET alias = b.bid FROM best AS bWHERE a.track_hash = b.track_hash​

Advertisement

Answer