Skip to content
Advertisement

Raills: Get SQL generated by delete_all

I’m not particularly familiar with Ruby on Rails, but I’m troubleshooting an issue we’re experiencing with a rake job that is supposed to be cleaning database tables. The tables grow very large very quickly, and the query generated by ActiveRecord doesn’t seem to be efficient enough to handle it.

The Ruby calls looks like this:

Source.where("id not IN (#{Log.select('DISTINCT source_id').to_sql})").delete_all

and this:

Log.joins(:report).where(:report_id => Report.where(cond)).delete_all

I’m trying to get at the SQL, so we can have our DBA’s attempt to optimize it better. I’ve noticed if I drop the “.delete_all” I can add a “.to_sql” which gives me the SELECT statement of the query, prior to the call to “.delete_all”. I’d like to see what SQL is being generated by that delete_all method though.

Is there a way to do that?

Advertisement

Answer

From the fine manual:

delete_all(conditions = nil)

Deletes the records matching conditions without instantiating the records first, and hence not calling the destroy method nor invoking callbacks. This is a single SQL DELETE statement that goes straight to the database, much more efficient than destroy_all.

So a Model.delete_all(conditions) ends up as

delete from models where conditions

When you say Model.where(...).delete_all, the conditions for the delete_all come from the where calls so these are the same:

Model.delete_all(conditions)
Model.where(conditions).delete_all

Applying that to your case:

Source.where("id not IN (#{Log.select('DISTINCT source_id').to_sql})").delete_all

you should see that you’re running:

delete from sources
where id not in (
    select distinct source_id
    from logs
)

If you run your code in a development console you should see the SQL in the console or the Rails logs but it will be as above.

As far as optimization goes, my first step would be to drop the DISTINCT. DISTINCT usually isn’t cheap and IN doesn’t care about duplicates anyway so not in (select distinct ...) is probably pointless busy work. Then maybe an index on source_id would help, the query optimizer might be able to slurp the source_id list straight out of the index without having to do a table scan to find them. Of course, query optimization is a bit of a dark art so these simple steps may or may not work.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement