Skip to content
Advertisement

Postgresql update query taking too long to complete every time

I have PostgreSQL DB table user_book_details with 451007 records. The user_book_details table is getting populated on daily basis with around 1K new records.

I have the following query that is taking a long time(13 Hrs) to complete every time.

    update user_book_details as A1 set min_date=
    (select min(A2.acc_date) as min_date from user_book_details A2 where A2.user_id=A1.user_id 
     and A2.book_id=A1.book_id) where A1.min_date is null;

How I can rewrite the query to improve the performance? FYI, there is no index on user_id and book_id column.

Advertisement

Answer

Your query is okay:

update user_book_details ubd
    set min_date = (select min(ubd2.acc_date) 
                    from user_book_details ubd2
                    where ubd2.user_id = ubd.user_id and
                          ubd2.book_id = ubd.book_id
                   )
   where ubd.min_date is null;

For performance you want an index on user_book_details(user_id, book_id). I also think it would be faster written like this:

update user_book_details ubd
    set min_date = min_acc_date
    from (select ubd2.user_id, ubd2.book_id, min(ubd2.acc_date) as min_acc_date
          from user_book_details ubd2
          group by ubd2.user_id, ubd2.book_id
         ) ubd2
    where ubd2.user_id = ubd.user_id and
          ubd2.book_id = ubd.book_id and
          ubd.min_date is null;

The first method uses the index to look up the values for each row (something that might be a little complicated when updating the same query). The second method aggregates the data and then joins in the values.

I should note that this value is easily calculated on the fly:

select ubd.*,
       min(acc_date) over (partition by user_id, book_id) as min_acc_date
from user_book_details ubd;

This might be preferable to trying to keep it up-to-date in the table.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement