Skip to content
Advertisement

Update of many rows with join extremely slow

I have a table with five relevant fields – id, source, iid, track_hash, alias. I want to group all entries into groups with a common track_hash and then for each row save the id of the row with the lowest source (with ties broken in favor of the highest iid) entry from its group into the alias field. To do this I wrote the following query:

with best as 
(SELECT id as bid, track_hash FROM
    (SELECT id, track_hash,
        RANK () OVER ( 
            PARTITION BY track_hash
            ORDER BY source asc, iid DESC
        ) rank
        from albums
    )
    where rank = 1
)
select bid, a.* from albums a inner join best
on a.track_hash = best.track_hash

This takes a completely reasonable 2 seconds on 24k rows. Now, instead of simply seeing this id, I want to actually save it. For this, I used the following very similar query:

with best as 
(SELECT id as bid, track_hash FROM
    (SELECT id, track_hash,
        RANK () OVER ( 
            PARTITION BY track_hash
            ORDER BY source asc, iid DESC
        ) rank
        from albums
    )
    where rank = 1
)
update albums
set alias = bid FROM albums a inner join best
on a.track_hash = best.track_hash

However, this one takes anywhere between 1 and 10 minutes, and I really don’t understand why. Doesn’t the engine have to match every row to its best.id/alias anyway, which is exactly what I’m doing with my update? Why is this happening and what am I doing wrong?

Query plan looks like this:

MATERIALIZE 1
CO-ROUTINE 4
SCAN TABLE albums USING INDEX track_hash_idx
USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
SCAN SUBQUERY 4
SCAN TABLE albums USING COVERING INDEX track_hash_idx
SEARCH SUBQUERY 1 USING AUTOMATIC PARTIAL COVERING INDEX (rank=?)
SEARCH TABLE albums AS a USING COVERING INDEX track_hash_idx (track_hash=?)

Advertisement

Answer

You don’t need the join to albums (again).
The UPDATE ... FROM syntax provides actually an implicit join of albums to best:

UPDATE albums AS a
SET alias = b.bid 
FROM best AS b
WHERE a.track_hash = b.track_hash
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement