Let’s say I have a table real_table with a primary key on column1. I can load data into it like this:
insert into real_table
SELECT * FROM EXTERNAL (
    (col1 VARCHAR2(50), col2 VARCHAR2(50), col3 VARCHAR2(50), col4 VARCHAR2(50))
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_to_input
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        skip 1
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
    )
    LOCATION ('data.CSV')
    REJECT LIMIT UNLIMITED
) ext
where not exists (
    select * from real_table r
    where r.col1 = ext.col1
);
But this means I have to specify the PKs manually. Doing it manually is not a huge problem for a relatively small table. I can also do this:
insert into real_table
SELECT * FROM EXTERNAL (
    (col1 VARCHAR2(50), col2 VARCHAR2(50), col3 VARCHAR2(50), col4 VARCHAR2(50))
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_to_input
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        skip 1
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
    )
    LOCATION ('data.CSV')
    REJECT LIMIT UNLIMITED
) ext
where not exists (
    select * from real_table r
    where r.col1 = ext.col1
      and r.col2 = ext.col2
      and r.col3 = ext.col3
      and r.col4 = ext.col4
);
But what do I do for a table that has 10 or 20 columns? Would my solution be ridiculous for a large table? I’m not complaining about having to type things out; I’m wondering whether the processing required to check every record would be prohibitively expensive. Is my solution some kind of classic noob mistake? How do people ensure that their records are unique?
Also, if I try to enter one record at a time and it’s a duplicate, my database will just give that duplicate its own unique PK. (I have the PK columns defined as NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY.)
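For illustration, a table defined along these lines (the id column name and exact layout here are just placeholders) will happily accept the same data twice, giving each copy its own generated key:

    -- Minimal sketch (column names are assumptions): an identity PK hands
    -- every inserted row its own key, even when the data columns repeat.
    create table real_table (
        id   number generated by default on null as identity primary key,
        col1 varchar2(50),
        col2 varchar2(50),
        col3 varchar2(50),
        col4 varchar2(50)
    );

    -- Both of these succeed, leaving two rows that differ only in id:
    insert into real_table (col1, col2, col3, col4) values ('a', 'b', 'c', 'd');
    insert into real_table (col1, col2, col3, col4) values ('a', 'b', 'c', 'd');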
I’ve been saying to myself this whole time that “the PK will ensure that records are unique! The PK will solve all my problems and also cure cancer!” I can’t believe it took me so long to realize that those unique records could still be duplicates of each other.
Answer
The query you have used to insert into real_table is easily the best way to load bulk data into the database.
insert into dest_table
select <cols>
from source_table
where not exists (
    select * from dest_table where <equate on unique keys>
);
This is because the database sees this as a single straight SQL statement and doesn’t go row by row, inspecting each row for duplicates, inserting that single row, then moving on to the next row, and so on.
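For contrast, the row-by-row alternative would look something along these lines (a sketch only; staging_table just stands in for wherever the rows come from), paying one lookup and one single-row insert per record:

    -- Illustrative only: the slow, row-at-a-time style the set-based insert avoids.
    begin
        for rec in (select col1, col2, col3, col4 from staging_table) loop
            declare
                l_exists integer;
            begin
                -- one existence check per row ...
                select count(*) into l_exists
                from real_table r
                where r.col1 = rec.col1;

                -- ... followed by one single-row insert, then on to the next row.
                if l_exists = 0 then
                    insert into real_table (col1, col2, col3, col4)
                    values (rec.col1, rec.col2, rec.col3, rec.col4);
                end if;
            end;
        end loop;
    end;
    /

The single INSERT ... SELECT lets the database do that existence check as one set operation instead.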
To ensure uniqueness, you only need to compare the unique columns in real_table with the corresponding columns in the external table; it would be very unusual to have a flat file where every column had to be unique.
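For example, if col1 and col2 together form the natural key (an assumption here; substitute whatever actually identifies a record in your file), the NOT EXISTS only ever needs those columns, however wide the table is:

    insert into real_table
    SELECT * FROM EXTERNAL (
        (col1 VARCHAR2(50), col2 VARCHAR2(50), col3 VARCHAR2(50), col4 VARCHAR2(50))
        TYPE ORACLE_LOADER
        DEFAULT DIRECTORY data_to_input
        ACCESS PARAMETERS (
            RECORDS DELIMITED BY NEWLINE
            skip 1
            FIELDS TERMINATED BY ','
            MISSING FIELD VALUES ARE NULL
        )
        LOCATION ('data.CSV')
        REJECT LIMIT UNLIMITED
    ) ext
    where not exists (
        select *
        from real_table r
        where r.col1 = ext.col1
          and r.col2 = ext.col2   -- only the natural-key columns, not all 10 or 20
    );

If that pair really must be unique, a unique constraint on (col1, col2) would also let the database enforce it for any other code path that touches the table.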