Skip to content
Advertisement

Avoid N+1 query in large data set

I have a migration which updates existing records with a new attribute value. The model is called ‘MyRecord’. It has millions of records in the database with a new unit_id column of null. I want to update that unit_id column with a specific value:

MyRecord.find_each do |record|
  unit_id = Unit.calculate_unit_from_old_columns(record.legacy_column_1, record.legacy_column_2).first.id
  record.update unit_id: unit_id
end

This creates a lot of N+1 queries:

SELECT units.* FROM units WHERE units.item_1 = 'Electronics' AND units.item_2 = 'Auto' 
UPDATE my_records SET unit_id='43' WHERE legacy_column_1 = 'Legacy Electronics' AND legacy_column_2 = 'Legacy Auto';

And some of these N+1 queries are duplicated. I see a lot of this in logs:

SELECT units.* FROM units WHERE units.item_1 = 'Electronics' AND units.item_2 = 'Auto' 
SELECT units.* FROM units WHERE units.item_1 = 'Electronics' AND units.item_2 = 'Auto' 

I am familiar with eager loading via includes. But when this migration is run to update existing data, there will be no association yet. So I cannot do this:

record.includes(:unit)

How can I eliminate the N+1 queries and cache the query so it does not hit database again when s duplicate query?

Advertisement

Answer

Use a simple query, you can think of batching it if it runs for too long:

MyRecord.connection.execute(
  "UPDATE my_records, units 
   SET unit_id = units.id 
   WHERE units.item_1 = legacy_column_1 AND units.item_2 = legacy_column_2"
)
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement