I have a MySQL database with two tables, products and product_variants. A product has many product variants. Here is a sample:
products
+----+------+
| id | name |
+----+------+
|  1 | Foo  |
|  2 | Bar  |
+----+------+
product_variants
+----+------------+--------+
| id | product_id | value  |
+----+------------+--------+
| 10 |          1 | red    |
| 11 |          1 | green  |
| 12 |          1 | blue   |
| 13 |          2 | red    |
| 14 |          2 | yellow |
+----+------------+--------+
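For reference, the two tables look roughly like this (assumed DDL; the exact column types don't matter here):

-- assumed schema, reconstructed from the samples above
CREATE TABLE products (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255)
);

CREATE TABLE product_variants (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  product_id INT UNSIGNED NOT NULL,
  value VARCHAR(255),
  FOREIGN KEY (product_id) REFERENCES products (id)
);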
Now I need to bulk insert a lot of products with their variants in the fastest, most efficient way possible. I have a JSON with many products (100k+) like this:
[
  {
    "name": "Foo",
    "variants": [{ "value": "red" }, { "value": "green" }, { "value": "blue" }]
  },
  {
    "name": "Bar",
    "variants": [{ "value": "red" }, { "value": "yellow" }]
  }
]
from which I should generate a query to insert the products.
My idea is to use an INSERT query like this:
INSERT INTO `products` (name) VALUES ("foo"), ("bar"), ... ;
But then I don't know which product_id (foreign key) to use in the insert query for product_variants:
INSERT INTO `product_variants` (product_id, value) VALUES (?, "red"), (?, "green"), ... ;
(both queries run inside a transaction)
I've thought about assigning the product ids manually, incrementing from the last existing id, but I get errors when concurrent connections insert products at the same time, or when two or more bulk-insert processes run concurrently.
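For example, two connections can read the same last id and then try to insert the same explicit ids:

-- Session A and session B run this at (nearly) the same time:
SELECT MAX(id) FROM products;  -- both see, say, 100
-- both then insert rows with explicit ids 101, 102, ...
-- and one of them fails with a duplicate-key error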
What strategy can I use to achieve my goal? Is there a standard way to do this?
PS: if possible, I would prefer not to change the structure of the two tables.
Answer
Finally, I've used a strategy based on the MySQL function LAST_INSERT_ID(), as @sticky-bit said, but with a bulk insert (one INSERT for many products), which is much faster.
I attach a simple Ruby script that performs the bulk insertions. Everything seems to work well, even with concurrent insertions.
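The approach relies on two MySQL behaviors: for a multi-row INSERT, LAST_INSERT_ID() returns the auto-increment id generated for the first inserted row, and the rows of that single statement are expected to receive consecutive ids (this is what innodb_autoinc_lock_mode influences), so every product_id can be computed as an offset from it. A minimal sketch:

INSERT INTO products (name) VALUES ('a'), ('b'), ('c');
SELECT LAST_INSERT_ID();  -- id of 'a'; 'b' and 'c' received the next two ids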
I've run the script with innodb_autoinc_lock_mode = 2 and everything seems fine, but I don't know whether it is necessary to set the flag to 1:
require 'active_record'
require 'benchmark'
require 'mysql2'
require 'securerandom'

ActiveRecord::Base.establish_connection(
  adapter: 'mysql2',
  host: 'localhost',
  username: 'root',
  database: 'test',
  pool: 200
)

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

class Product < ApplicationRecord
  has_many :product_variants
end

class ProductVariant < ApplicationRecord
  belongs_to :product
  COLORS = %w[red blue green yellow pink orange].freeze
end

def migrate
  ActiveRecord::Schema.define do
    create_table(:products) do |t|
      t.string :name
    end

    create_table(:product_variants) do |t|
      t.references :product, null: false, foreign_key: true
      t.string :color
    end
  end
end

def generate_data
  # Generate 100k products, each with 1 to 3 random color variants
  d = []
  100_000.times do
    d << {
      name: SecureRandom.alphanumeric(8),
      product_variants: Array.new(rand(1..3)) { { color: ProductVariant::COLORS.sample } }
    }
  end
  d
end

DATA = generate_data.freeze

def bulk_insert
  # All inside a transaction
  ActiveRecord::Base.transaction do
    # Insert all products with a single multi-row INSERT (the names are
    # alphanumeric here; real data would need escaping, e.g. via
    # ActiveRecord::Base.connection.quote)
    values = DATA.map { |row| "('#{row[:name]}')" }.join(',')
    q = "INSERT INTO products (name) VALUES #{values}"
    ActiveRecord::Base.connection.execute(q)

    # LAST_INSERT_ID() returns the id of the FIRST row inserted by the
    # statement above; the other rows are assumed to have the next ids
    q = 'SELECT LAST_INSERT_ID()'
    last_id, = ActiveRecord::Base.connection.execute(q).first

    # Insert the variants, computing each product_id as an offset from last_id
    values = DATA.each_with_index.flat_map do |row, i|
      row[:product_variants].map { |subrow| "(#{last_id + i},'#{subrow[:color]}')" }
    end.join(',')
    q = "INSERT INTO product_variants (product_id,color) VALUES #{values}"
    ActiveRecord::Base.connection.execute(q)
  end
end
migrate

threads = []

# Spawn 100 threads that perform 200 single inserts each, to generate
# concurrent auto-increment traffic while the bulk insert runs
100.times do
  threads << Thread.new do
    200.times do
      Product.create(name: 'CONCURRENCY NOISE')
    end
  end
end

threads << Thread.new do
  Benchmark.bm do |benchmark|
    benchmark.report('Bulk') do
      bulk_insert
    end
  end
end

threads.each(&:join)
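In the real use case, DATA would come from the JSON file rather than from generate_data. A minimal sketch of that step, assuming the file is named products.json and shaped like the sample in the question (its variants/value keys are mapped to the product_variants/color names the script uses):

require 'json'

# Load the products and reshape them into the structure bulk_insert expects
DATA = JSON.parse(File.read('products.json'), symbolize_names: true).map do |row|
  {
    name: row[:name],
    product_variants: row[:variants].map { |v| { color: v[:value] } }
  }
end.freeze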
After running the script, I've checked that all products have associated variants with the query
SELECT *
FROM products
LEFT OUTER JOIN product_variants
ON (products.id = product_variants.product_id)
WHERE product_variants.product_id IS NULL
AND name != "CONCURRENCY NOISE";
and I correctly get no rows.
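As a second check, the row counts can be compared with the source data (assuming nothing else writes to the tables):

SELECT COUNT(*) FROM products WHERE name != 'CONCURRENCY NOISE';
-- expect 100000 (plus 20000 noise rows without the filter)
SELECT COUNT(*) FROM product_variants;
-- expect the total number of variants in DATA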