I’ve recently been wondering what’s generally the best approach for ensuring that the creation of database records is idempotent. The two approaches I can think of are:
- Checking whether a record already exists before executing an INSERT
- Using a unique constraint on the relevant columns to insure two records with the same values cannot exist
This seems like an example of a look-before-you-leap/easier-to-ask-forgiveness-than-permission dichotomy. In the Python community, I know that the latter approach is acceptable or even preferred. I wonder if the same applies to working with a relational database.
Is faster better?
Based on some testing below, it seems that the EAFP approach with the unique constraint is faster for both inserting new records and gracefully dealing with duplicate records. However, I can imagine situations where the LBYL approach with a SELECT before every INSERT might be preferable.
- If the table schema changed, it could be tricky to update the constraint to include a new column. It’s certainly easier to change code in production than migrate the database.
- If the table contained many millions of records, adding and removing indexes in production could be tricky.
- The string-search approach I used to try to avoid failing silently on the wrong exceptions
create_permission_EAFP
in my Django example looks hacky. (Although this could say more about my implementation than the general approach).
Performance test
The below test was run with Postgres 14, and Django 3.2 in Docker containers on my laptop. I decided to use the Django test framework for this as each run of the tests starts with an empty DB.
Results for creating 10,000 records
Output from running the tests in tests.py
for ten thousand records:
====================================================================== FAIL: test_look_before_you_leap_faster_existing_records (idempodentinserts.tests.TestPermissionCreation) ---------------------------------------------------------------------- Traceback (most recent call last): File "/code/mainapp/idempodentinserts/tests.py", line 82, in test_look_before_you_leap_faster_existing_records self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with existing records. {report}") AssertionError: 4.998060464859009 not less than 2.5420615673065186 : LBYL took longer with existing records. For 10000 create calls... The Look-before-you-leap strategy took 4.998 seconds (average: 0.500 milliseconds). The Ask-forgiveness-not-permission strategy took 2.542 seconds (average: 0.254 milliseconds). ====================================================================== FAIL: test_look_before_you_leap_faster_new_records (idempodentinserts.tests.TestPermissionCreation) ---------------------------------------------------------------------- Traceback (most recent call last): File "/code/mainapp/idempodentinserts/tests.py", line 103, in test_look_before_you_leap_faster_new_records self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with new records. {report}") AssertionError: 31.07089853286743 not less than 20.387959241867065 : LBYL took longer with new records. For 10000 create calls... The Look-before-you-leap strategy took 31.071 seconds (average: 3.107 milliseconds). The Ask-forgiveness-not-permission strategy took 20.388 seconds (average: 2.039 milliseconds). ---------------------------------------------------------------------- Ran 4 tests in 122.848s FAILED (failures=2)
Results for creating 1,000,000 records
Even when creating a million records, the test supports EAFP as the faster approach. While all inserts got slower, checking for the existence of records first didn’t help.
====================================================================== FAIL: test_look_before_you_leap_faster_existing_records (idempodentinserts.tests.TestPermissionCreation) ---------------------------------------------------------------------- Traceback (most recent call last): File "/code/mainapp/idempodentinserts/tests.py", line 82, in test_look_before_you_leap_faster_existing_records self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with existing records. {report}") AssertionError: 445.97691440582275 not less than 247.20186638832092 : LBYL took longer with existing records. For 1000000 create calls... The Look-before-you-leap strategy took 445.977 seconds (average: 0.446 milliseconds). The Ask-forgiveness-not-permission strategy took 247.202 seconds (average: 0.247 milliseconds). ====================================================================== FAIL: test_look_before_you_leap_faster_new_records (idempodentinserts.tests.TestPermissionCreation) ---------------------------------------------------------------------- Traceback (most recent call last): File "/code/mainapp/idempodentinserts/tests.py", line 103, in test_look_before_you_leap_faster_new_records self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with new records. {report}") AssertionError: 6323.6987335681915 not less than 4435.961817026138 : LBYL took longer with new records. For 1000000 create calls... The Look-before-you-leap strategy took 6323.699 seconds (average: 6.324 milliseconds). The Ask-forgiveness-not-permission strategy took 4435.962 seconds (average: 4.436 milliseconds). ---------------------------------------------------------------------- Ran 4 tests in 23372.856s FAILED (failures=2)
Code
models.py
from django.db import models class Permission(models.Model): subject_uuid = models.UUIDField(db_index=True) object_uuid = models.UUIDField(db_index=True) verb = models.CharField(max_length=255) class UniquePermission(models.Model): subject_uuid = models.UUIDField(db_index=True) object_uuid = models.UUIDField(db_index=True) verb = models.CharField(max_length=255) class Meta: unique_together = ("subject_uuid", "object_uuid","verb")
permissions.py
from django import db import psycopg2 from . import models def create_permission_LBYL(subject_uuid, object_uuid, verb): if not models.Permission.objects.filter( subject_uuid=subject_uuid, object_uuid=object_uuid, verb=verb ).exists(): models.Permission.objects.create( subject_uuid=subject_uuid, object_uuid=object_uuid, verb=verb ) def create_permission_EAFP(subject_uuid, object_uuid, verb): try: models.UniquePermission.objects.create( subject_uuid=subject_uuid, object_uuid=object_uuid, verb=verb ) except db.IntegrityError as e: # This hack wouldn't work if Postgres wasn't the database backend if not isinstance(e.__cause__, psycopg2.errors.UniqueViolation): raise e
tests.py
from django import test from . import permissions from . import models VERB_LENGTH = 10 THOUSAND = 1000 VERB_COUNT = 10 * THOUSAND class TestPermissionCreation(test.TransactionTestCase): """ Compares performance between the LBYL (look before you leap) and EAFP (it's easier to ask forgiveness than permission) approaches to idempotent database inserts. """ @classmethod def setUpClass(cls): super().setUpClass() letter_combinations = itertools.combinations(string.ascii_lowercase, VERB_LENGTH) unique_words = ("".join(combination) for combination in letter_combinations) cls.verbs = list(itertools.islice(unique_words, VERB_COUNT)) cls.existing_subject_uuids = [uuid.uuid4() for _ in cls.verbs] cls.existing_object_uuids = [uuid.uuid4() for _ in cls.verbs] cls.new_subject_uuids = [uuid.uuid4() for _ in cls.verbs] cls.new_object_uuids = [uuid.uuid4() for _ in cls.verbs] def setUp(self): models.Permission.objects.bulk_create( models.Permission(subject_uuid=sub, object_uuid=obj, verb=verb) for sub, obj, verb in zip(self.existing_subject_uuids, self.existing_object_uuids, self.verbs) ) models.UniquePermission.objects.bulk_create( models.UniquePermission(subject_uuid=sub, object_uuid=obj, verb=verb) for sub, obj, verb in zip(self.existing_subject_uuids, self.existing_object_uuids, self.verbs) ) def report_durations(self, duration_LBYL, duration_EAFP): verb_count = len(self.verbs) LBYL_ave_ms = (duration_LBYL / verb_count) * 1000 EAFP_ave_ms = (duration_EAFP / verb_count) * 1000 return ( f"For {verb_count} create calls... " f"The Look-before-you-leap strategy took {duration_LBYL:.3f} seconds " f"(average: {LBYL_ave_ms:.3f} milliseconds). " f"The Ask-forgiveness-not-permission strategy took {duration_EAFP:.3f} seconds " f"(average: {EAFP_ave_ms:.3f} milliseconds)." ) def test_look_before_you_leap_faster_existing_records(self): start_LBYL = time.time() for sub, obj, verb in zip(self.existing_subject_uuids, self.existing_object_uuids, self.verbs): permissions.create_permission_LBYL(subject_uuid=sub, object_uuid=obj, verb=verb) duration_LBYL = time.time() - start_LBYL start_EAFP = time.time() for sub, obj, verb in zip(self.existing_subject_uuids, self.existing_object_uuids, self.verbs): permissions.create_permission_EAFP(subject_uuid=sub, object_uuid=obj, verb=verb) duration_EAFP = time.time() - start_EAFP report = self.report_durations(duration_EAFP=duration_EAFP,duration_LBYL=duration_LBYL) self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with existing records. {report}") def test_look_before_you_leap_faster_new_records(self): start_LBYL = time.time() for sub, obj, verb in zip(self.new_subject_uuids, self.new_object_uuids, self.verbs): permissions.create_permission_LBYL(subject_uuid=sub, object_uuid=obj, verb=verb) duration_LBYL = time.time() - start_LBYL start_EAFP = time.time() for sub, obj, verb in zip(self.new_subject_uuids, self.new_object_uuids, self.verbs): permissions.create_permission_EAFP(subject_uuid=sub, object_uuid=obj, verb=verb) duration_EAFP = time.time() - start_EAFP report = self.report_durations(duration_EAFP=duration_EAFP,duration_LBYL=duration_LBYL) self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with new records. {report}") def test_ask_forgiveness_not_permission_faster_existing_records(self): start_LBYL = time.time() for sub, obj, verb in zip(self.existing_subject_uuids, self.existing_object_uuids, self.verbs): permissions.create_permission_LBYL(subject_uuid=sub, object_uuid=obj, verb=verb) duration_LBYL = time.time() - start_LBYL start_EAFP = time.time() for sub, obj, verb in zip(self.existing_subject_uuids, self.existing_object_uuids, self.verbs, ): permissions.create_permission_EAFP(subject_uuid=sub, object_uuid=obj, verb=verb) duration_EAFP = time.time() - start_EAFP report = self.report_durations(duration_EAFP=duration_EAFP,duration_LBYL=duration_LBYL) self.assertLess(duration_EAFP, duration_LBYL, f"LBYL took longer with existing records. {report}") def test_ask_forgiveness_not_permission_faster_new_records(self): start_LBYL = time.time() for sub, obj, verb in zip(self.new_subject_uuids, self.new_object_uuids, self.verbs): permissions.create_permission_LBYL(subject_uuid=sub, object_uuid=obj, verb=verb) duration_LBYL = time.time() - start_LBYL start_EAFP = time.time() for sub, obj, verb in zip(self.new_subject_uuids, self.new_object_uuids, self.verbs): permissions.create_permission_EAFP(subject_uuid=sub, object_uuid=obj, verb=verb) duration_EAFP = time.time() - start_EAFP report = self.report_durations(duration_EAFP=duration_EAFP,duration_LBYL=duration_LBYL) self.assertLess(duration_EAFP, duration_LBYL, f"LBYL took longer with new records. {report}")
Advertisement
Answer
Relational databases are all about guarantees. If you choose to not use their features (UNIQUE CONSTRAINT in this case) you choose not to have that guarantee.
Is faster better?
Better than correctness? No. You want a correct application that runs as fast as possible, not a fast application that is correct “most of the time”. Using database guarantees makes it easy to write correct applications.
If the table contained many millions of records, adding and removing indexes in production could be tricky.
Removing an index is no problem at all. Creating an index might lock the table for writes, yes. But you can build the index concurrently to avoid that.
If the table schema changed, it could be tricky to update the constraint to include a new column.
Just create a new constraint and after that drop the old one. When that is done update the application code to use the new column.