I have a MySQL database with two tables, `products` and `product_variants`. A product has many product variants. Here is a sample:
```
products
+----+------+
| id | name |
+----+------+
|  1 | Foo  |
|  2 | Bar  |
+----+------+

product_variants
+----+------------+--------+
| id | product_id | value  |
+----+------------+--------+
| 10 | 1          | red    |
| 11 | 1          | green  |
| 12 | 1          | blue   |
| 13 | 2          | red    |
| 14 | 2          | yellow |
+----+------------+--------+
```
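The tables look roughly like this (a simplified sketch; the exact column types are not important for the question, and the `AUTO_INCREMENT` primary keys are implied by the ids above):

```sql
CREATE TABLE products (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE product_variants (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  product_id INT UNSIGNED NOT NULL,
  value      VARCHAR(255) NOT NULL,
  FOREIGN KEY (product_id) REFERENCES products (id)
) ENGINE=InnoDB;
```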
Now I need to bulk insert a lot of products with their variants as efficiently and as fast as possible. I have a JSON file with many products (100k+) like this:
[ { "name": "Foo", "variants": [{ "value": "red" }, { "value": "green" }, { "value": "blue" }] }, { "name": "Bar", "variants": [{ "value": "red" }, { "value": "yellow" }] }, ... ]
from which I should generate a query to insert the products.
My idea is to use an `INSERT` query like this:
```sql
INSERT INTO `products` (name) VALUES ("foo"), ("bar"), ...;
```
But then I don’t know which `product_id` (foreign key) to use in the insert query for `product_variants`:
```sql
INSERT INTO `product_variants` (product_id, value) VALUES (?, "red"), (?, "green"), ...;
```
(these queries run inside a transaction)
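For comparison, inserting one product at a time is easy, because `LAST_INSERT_ID()` gives me the id of the product just inserted (a sketch with the sample data), but that means 100k+ statements, which is exactly what I want to avoid:

```sql
-- Row-by-row version: correct but far too slow for 100k+ products.
INSERT INTO `products` (name) VALUES ('Foo');
INSERT INTO `product_variants` (product_id, value)
VALUES (LAST_INSERT_ID(), 'red'),
       (LAST_INSERT_ID(), 'green'),
       (LAST_INSERT_ID(), 'blue');

INSERT INTO `products` (name) VALUES ('Bar');
INSERT INTO `product_variants` (product_id, value)
VALUES (LAST_INSERT_ID(), 'red'),
       (LAST_INSERT_ID(), 'yellow');
```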
I’ve thought about assigning the product ids manually, incrementing from the last id, but I get errors when concurrent connections insert products at the same time, or when two or more bulk-insert processes run concurrently.
What strategy can I use to achieve my goal? Is there a standard way to do this?
PS: If possible, I would prefer not to change the structure of the two tables.
Answer
In the end, I used a strategy based on the MySQL function `LAST_INSERT_ID()`, as @sticky-bit suggested, but with a bulk insert (one `INSERT` for many products), which is much faster.
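The key MySQL behavior this relies on: after a multi-row `INSERT`, `LAST_INSERT_ID()` returns the auto-increment id generated for the *first* row of the batch, so as long as the batch got consecutive ids (more on that below), the N-th product has id `LAST_INSERT_ID() + N - 1`. A minimal SQL sketch of the idea with the sample data:

```sql
INSERT INTO products (name) VALUES ('Foo'), ('Bar');

-- LAST_INSERT_ID() is the id generated for the FIRST row of the batch
-- ('Foo'); assuming consecutive ids, 'Bar' got LAST_INSERT_ID() + 1.
SET @first_id = LAST_INSERT_ID();

INSERT INTO product_variants (product_id, value)
VALUES (@first_id,     'red'),
       (@first_id,     'green'),
       (@first_id,     'blue'),
       (@first_id + 1, 'red'),
       (@first_id + 1, 'yellow');
```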
I attach a simple Ruby script that performs the bulk insertions. Everything seems to work well, even with concurrent insertions.

I ran the script with `innodb_autoinc_lock_mode = 2` and all seems good, but I don't know if it is necessary to set the variable to 1:
```ruby
require 'active_record'
require 'benchmark'
require 'mysql2'
require 'securerandom'

ActiveRecord::Base.establish_connection(
  adapter: 'mysql2',
  host: 'localhost',
  username: 'root',
  database: 'test',
  pool: 200
)

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

class Product < ApplicationRecord
  has_many :product_variants
end

class ProductVariant < ApplicationRecord
  belongs_to :product
  COLORS = %w[red blue green yellow pink orange].freeze
end

def migrate
  ActiveRecord::Schema.define do
    create_table(:products) do |t|
      t.string :name
    end

    create_table(:product_variants) do |t|
      t.references :product, null: false, foreign_key: true
      t.string :color
    end
  end
end

def generate_data
  d = []
  100_000.times do
    d << {
      name: SecureRandom.alphanumeric(8),
      product_variants: Array.new(rand(1..3)).map do
        { color: ProductVariant::COLORS.sample }
      end
    }
  end
  d
end

DATA = generate_data.freeze

def bulk_insert
  # All inside a transaction
  ActiveRecord::Base.transaction do
    # Insert products
    values = DATA.map { |row| "('#{row[:name]}')" }.join(',')
    q = "INSERT INTO products (name) VALUES #{values}"
    ActiveRecord::Base.connection.execute(q)

    # Get last insert id
    q = 'SELECT LAST_INSERT_ID()'
    last_id, = ActiveRecord::Base.connection.execute(q).first

    # Insert product variants
    i = -1
    values = DATA.map do |row|
      i += 1
      row[:product_variants].map { |subrow| "(#{last_id + i},'#{subrow[:color]}')" }
    end.flatten.join(',')
    q = "INSERT INTO product_variants (product_id,color) VALUES #{values}"
    ActiveRecord::Base.connection.execute(q)
  end
end

migrate

threads = []

# Spawn 100 threads that perform 200 single inserts each
100.times do
  threads << Thread.new do
    200.times do
      Product.create(name: 'CONCURRENCY NOISE')
    end
  end
end

threads << Thread.new do
  Benchmark.bm do |benchmark|
    benchmark.report('Bulk') do
      bulk_insert
    end
  end
end

threads.map(&:join)
```
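Note that the `product_id` arithmetic in `bulk_insert` assumes the bulk `INSERT` received consecutive auto-increment ids. As far as I can tell from the MySQL docs, that is guaranteed for "simple inserts" (statements whose row count is known in advance) with `innodb_autoinc_lock_mode` 0 or 1, while mode 2 ("interleaved") only guarantees unique, monotonically increasing ids across statements, so ids within one statement may have gaps under concurrent load. Mode 1 therefore looks like the safer choice. The variable is read-only at runtime:

```sql
-- Check the current lock mode; to change it, set
-- innodb_autoinc_lock_mode = 1 under [mysqld] in my.cnf and restart.
SELECT @@innodb_autoinc_lock_mode;
```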
After running the script, I checked that all products have associated variants with the query:
```sql
SELECT *
FROM products
LEFT OUTER JOIN product_variants
  ON (products.id = product_variants.product_id)
WHERE product_variants.product_id IS NULL
  AND name != "CONCURRENCY NOISE";
```
and, correctly, I get no rows.
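One more check that would catch a misaligned batch (the noise products are created without variants, so none of them should ever have one):

```sql
-- If the id arithmetic in bulk_insert were off, some variants would end up
-- attached to the noise products. This should also return no rows.
SELECT *
FROM product_variants
JOIN products ON products.id = product_variants.product_id
WHERE products.name = 'CONCURRENCY NOISE';
```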