I created a table in Postgres whose columns are based on an import from a CSV. Since the CSV can be re-uploaded with some changes and I don't want the re-upload to create duplicates of rows that didn't change, I added a unique constraint and unique index covering the majority of the columns. Are there any performance concerns with doing this? Is there a better way to solve this?
CREATE TABLE IF NOT EXISTS table_a (
    id           SERIAL PRIMARY KEY,
    table_b_id   SERIAL REFERENCES table_b(id) ON DELETE CASCADE,
    column_a     INT,
    column_b     INT,
    column_c     BOOLEAN,
    created_time BIGINT NOT NULL,
    CONSTRAINT "table_a_key" UNIQUE ("table_b_id", "column_a", "column_b", "column_c")
);
Answer
Adding an index always imposes a performance penalty for INSERT, UPDATE and DELETE statements as the index needs to be maintained.
However, in most cases the added overhead doesn't really matter and may be outweighed by the improved performance when selecting data. The overhead is more noticeable during bulk loads than for single-row operations.
But if you want to prevent duplicates, you don’t have a choice to begin with.
The only reliable way to prevent duplicates is to create a unique constraint (or unique index) as you did. It's also the only way to make use of an "upsert" with INSERT ... ON CONFLICT ..., as shown below.
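A minimal sketch of what that looks like for the table from the question, assuming the constraint "table_a_key" defined above; the literal values are placeholders for one imported CSV row:

-- Re-import a row without creating duplicates: if a row with the same
-- (table_b_id, column_a, column_b, column_c) already exists, the unique
-- constraint triggers the conflict clause and the insert is skipped.
INSERT INTO table_a (table_b_id, column_a, column_b, column_c, created_time)
VALUES (1, 10, 20, TRUE, 1700000000)
ON CONFLICT ON CONSTRAINT table_a_key DO NOTHING;

If changed rows should overwrite the existing ones instead of being skipped, replace DO NOTHING with a DO UPDATE SET ... clause.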
So, yes, there is a performance penalty to pay. And no, there is no better way to ensure uniqueness.