I have a migration which updates existing records with a new attribute value. The model is called ‘MyRecord’. It has millions of records in the database with a new unit_id column of null. I want to update that unit_id column with a specific value:
MyRecord.find_each do |record| unit_id = Unit.calculate_unit_from_old_columns(record.legacy_column_1, record.legacy_column_2).first.id record.update unit_id: unit_id end
This creates a lot of N+1 queries:
SELECT units.* FROM units WHERE units.item_1 = 'Electronics' AND units.item_2 = 'Auto' UPDATE my_records SET unit_id='43' WHERE legacy_column_1 = 'Legacy Electronics' AND legacy_column_2 = 'Legacy Auto';
And some of these N+1 queries are duplicated. I see a lot of this in logs:
SELECT units.* FROM units WHERE units.item_1 = 'Electronics' AND units.item_2 = 'Auto' SELECT units.* FROM units WHERE units.item_1 = 'Electronics' AND units.item_2 = 'Auto'
I am familiar with eager loading via includes. But when this migration is run to update existing data, there will be no association yet. So I cannot do this:
record.includes(:unit)
How can I eliminate the N+1 queries and cache the query so it does not hit database again when s duplicate query?
Advertisement
Answer
Use a simple query, you can think of batching it if it runs for too long:
MyRecord.connection.execute( "UPDATE my_records, units SET unit_id = units.id WHERE units.item_1 = legacy_column_1 AND units.item_2 = legacy_column_2" )