I am currently facing an issue in an old database (Postgres 9.4): a table contains some duplicate rows, and I want to ensure that no more duplicates can be generated. However, I also want to keep the duplicate rows that have already been generated, which is why I cannot simply apply a unique constraint on those columns (it is a multi-column combination).
I have created a trigger that checks whether the row already exists and raises an exception accordingly, but it fails when concurrent transactions are being processed.
Example:

TAB1

col1 | col2 | col3
-----+------+-----
   1 | A    | B
   2 | A    | B     -- already-present duplicate of (col2, col3); allowed
   3 | C    | D

INSERT INTO TAB1 VALUES (4, 'A', 'B');  -- this insert should not be allowed
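A simplified sketch of the check-then-raise trigger I mean (names are illustrative). The EXISTS check cannot see rows from other uncommitted transactions, which is why two concurrent inserts of the same (col2, col3) can both pass it:

create or replace function check_no_new_duplicates() returns trigger as $$
begin
    -- check-then-insert: under concurrency, two transactions inserting the
    -- same (col2, col3) can both pass this test, since neither sees the
    -- other's uncommitted row
    if exists (select 1 from tab1 where col2 = new.col2 and col3 = new.col3) then
        raise exception 'duplicate (col2, col3) combination not allowed';
    end if;
    return new;
end;
$$ language plpgsql;

create trigger trg_check_no_new_duplicates
    before insert on tab1
    for each row execute procedure check_no_new_duplicates();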
Note: I cannot use ON CONFLICT due to the older version of the database (it requires 9.5 or later).
Answer
Presumably, you don’t want new rows to duplicate historical rows. If so, you can do this but it requires modifying the table and adding a new column.
alter table t add duplicate_seq int default 1;
Then update this column to identify existing duplicates:
update t
    set duplicate_seq = tt.seqnum
    from (select t.*,
                 row_number() over (partition by col order by col) as seqnum
          from t
         ) tt
    where t.<primary key> = tt.<primary key>;
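Applied to the question's TAB1, where duplicates are defined over (col2, col3) and assuming col1 is the primary key, this might look like:

update tab1
    set duplicate_seq = tt.seqnum
    from (select col1,
                 row_number() over (partition by col2, col3 order by col1) as seqnum
          from tab1
         ) tt
    where tab1.col1 = tt.col1;

After this, rows (1, A, B) and (2, A, B) get duplicate_seq 1 and 2 respectively, while (3, C, D) gets 1.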
Now, create a unique index or constraint:
alter table t add constraint unq_t_col_seq unique (col, duplicate_seq);
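For the question's table, that would be (constraint name is illustrative):

alter table tab1 add constraint unq_tab1_col2_col3_seq unique (col2, col3, duplicate_seq);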
When you insert rows, do not provide a value for duplicate_seq; the default is 1. That default will conflict with any existing value, whether historical or entered more recently, so new duplicates are rejected while the historical duplicates remain allowed.
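On the question's data, a new duplicate then fails at the constraint itself, for example (assuming the adapted constraint above):

insert into tab1 (col1, col2, col3) values (4, 'A', 'B');
-- duplicate_seq defaults to 1, which row (1, 'A', 'B') already occupies:
-- ERROR:  duplicate key value violates unique constraint "unq_tab1_col2_col3_seq"

Unlike the trigger, the unique constraint is enforced by Postgres across concurrent transactions, so the race condition disappears as well.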