I have the following query to execute:
UPDATE scenario_group SET project_id = @projectId WHERE scenario_group_id = @scenarioGroupId AND @projectId IN (SELECT project_id FROM project);";
To me that looks like it would evaluate the @projectId IN (SELECT project_id FROM PROJECT)
for every matching row. That would be bad. On the other hand, if the optimizer is smart it would see that the projects table never changes and therefore only do that check once.
This is the EXPLAIN. That looks like it would be doing the IN a lot. But I’m not sure how to interpret it.
id parent notused detail 3 0 0 SEARCH TABLE scenario_group USING INTEGER PRIMARY KEY (rowid=?) 8 0 0 USING ROWID SEARCH ON TABLE project FOR IN-OPERATOR
Do I need to rewrite this update? If so, how would I write this UPDATE to explicitly check for the existence of the project_id
in advance?
Advertisement
Answer
EXISTS
is often more efficient than IN
. So I would just write the logic as:
UPDATE scenario_group SET project_id = @projectId WHERE scenario_group_id = @scenarioGroupId AND EXISTS (SELECT 1 FROM project p WHERE p.project_id = @projectId);
In particular, this will make use of an index on project(project_id)
— which seems quite likely given that project_id
should be the primary key.