Skip to content
Advertisement

Idempotent record creation: is it better to use a unique constraint or check for existence before inserting a record?

I’ve recently been wondering what’s generally the best approach for ensuring that the creation of database records is idempotent. The two approaches I can think of are:

  • Checking whether a record already exists before executing an INSERT
  • Using a unique constraint on the relevant columns to insure two records with the same values cannot exist

This seems like an example of a look-before-you-leap/easier-to-ask-forgiveness-than-permission dichotomy. In the Python community, I know that the latter approach is acceptable or even preferred. I wonder if the same applies to working with a relational database.

Is faster better?

Based on some testing below, it seems that the EAFP approach with the unique constraint is faster for both inserting new records and gracefully dealing with duplicate records. However, I can imagine situations where the LBYL approach with a SELECT before every INSERT might be preferable.

  • If the table schema changed, it could be tricky to update the constraint to include a new column. It’s certainly easier to change code in production than migrate the database.
  • If the table contained many millions of records, adding and removing indexes in production could be tricky.
  • The string-search approach I used to try to avoid failing silently on the wrong exceptions create_permission_EAFP in my Django example looks hacky. (Although this could say more about my implementation than the general approach).

Performance test

The below test was run with Postgres 14, and Django 3.2 in Docker containers on my laptop. I decided to use the Django test framework for this as each run of the tests starts with an empty DB.

Results for creating 10,000 records

Output from running the tests in tests.py for ten thousand records:

Results for creating 1,000,000 records

Even when creating a million records, the test supports EAFP as the faster approach. While all inserts got slower, checking for the existence of records first didn’t help.

Code

models.py

permissions.py

tests.py

Advertisement

Answer

Relational databases are all about guarantees. If you choose to not use their features (UNIQUE CONSTRAINT in this case) you choose not to have that guarantee.

Is faster better?

Better than correctness? No. You want a correct application that runs as fast as possible, not a fast application that is correct “most of the time”. Using database guarantees makes it easy to write correct applications.

If the table contained many millions of records, adding and removing indexes in production could be tricky.

Removing an index is no problem at all. Creating an index might lock the table for writes, yes. But you can build the index concurrently to avoid that.

If the table schema changed, it could be tricky to update the constraint to include a new column.

Just create a new constraint and after that drop the old one. When that is done update the application code to use the new column.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement