I have PostgreSQL DB table user_book_details with 451007 records. The user_book_details table is getting populated on daily basis with around 1K new records.
I have the following query that is taking a long time(13 Hrs) to complete every time.
update user_book_details as A1 set min_date= (select min(A2.acc_date) as min_date from user_book_details A2 where A2.user_id=A1.user_id and A2.book_id=A1.book_id) where A1.min_date is null;
How I can rewrite the query to improve the performance? FYI, there is no index on user_id and book_id column.
Advertisement
Answer
Your query is okay:
update user_book_details ubd set min_date = (select min(ubd2.acc_date) from user_book_details ubd2 where ubd2.user_id = ubd.user_id and ubd2.book_id = ubd.book_id ) where ubd.min_date is null;
For performance you want an index on user_book_details(user_id, book_id)
. I also think it would be faster written like this:
update user_book_details ubd set min_date = min_acc_date from (select ubd2.user_id, ubd2.book_id, min(ubd2.acc_date) as min_acc_date from user_book_details ubd2 group by ubd2.user_id, ubd2.book_id ) ubd2 where ubd2.user_id = ubd.user_id and ubd2.book_id = ubd.book_id and ubd.min_date is null;
The first method uses the index to look up the values for each row (something that might be a little complicated when updating the same query). The second method aggregates the data and then joins in the values.
I should note that this value is easily calculated on the fly:
select ubd.*, min(acc_date) over (partition by user_id, book_id) as min_acc_date from user_book_details ubd;
This might be preferable to trying to keep it up-to-date in the table.