
How can I load only unique data into my Oracle table without specifying the PK manually?

Let’s say I have a table real_table with a primary key on column1.

I can load data into it like this:

insert into real_table
SELECT * FROM EXTERNAL (
    (col1 VARCHAR2 (50),
     col2 VARCHAR2 (50),
     col3 VARCHAR2 (50),
     col4 VARCHAR2 (50) )

    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_to_input
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        skip 1
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL 
        ) 
    LOCATION ('data.CSV')
    REJECT LIMIT UNLIMITED) ext
    where not exists (
        select * from real_table r
        where r.col1 = ext.col1
    );

But this means I have to specify the PKs manually. Doing it manually is not a huge problem for a relatively small table. I can also do this:

insert into real_table
SELECT * FROM EXTERNAL (
    (col1 VARCHAR2 (50),
     col2 VARCHAR2 (50),
     col3 VARCHAR2 (50),
     col4 VARCHAR2 (50) )

    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_to_input
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        skip 1
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL 
        ) 
    LOCATION ('data.CSV')
    REJECT LIMIT UNLIMITED) ext
    where not exists (
        select * from real_table r
        where r.col1 = ext.col1
        and r.col2 = ext.col2
        and r.col3 = ext.col3
        and r.col4 = ext.col4
    );

But what do I do for a table that has 10 or 20 columns? Would my solution be ridiculous for a large table? I'm not complaining about having to type things out; I'm wondering whether the processing required to check every record would be ridiculous. Is my solution some kind of classic noob mistake? How do people ensure that their records are unique?

Also, if I try to enter one record at a time, and it’s a duplicate, my database will just give that duplicate its own unique PK. (I have the PK columns defined as NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY)
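
For reference, the table looks roughly like this (simplified; the exact DDL doesn't matter, only the identity PK):

create table real_table (
    -- identity PK as described: a duplicate CSV row inserted without col1
    -- simply receives a fresh key instead of being rejected
    col1 number generated by default on null as identity primary key,
    col2 varchar2 (50),
    col3 varchar2 (50),
    col4 varchar2 (50)
);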

I’ve been saying to myself this whole time that “the PK will ensure that records are unique! The PK will solve all my problems and also cure cancer!” I can’t believe it took me so long to realize that those unique records could still be duplicates of each other.


Answer

The query you have used to insert into real_table is easily the best way to bulk-load data into the database:

insert
  into dest_table
select <cols>
  from source_table
 where not exists (select * from dest_table where <equate on unique keys>);

This is because the database sees this as a single SQL statement, so it does not have to inspect one row for duplicates, insert that single row, then move on to the next row, and so on and on and on.
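
For contrast, the row-by-row alternative would look something like this sketch (ext_stage is just a stand-in name for your external table); every single row pays for its own duplicate check and its own insert:

begin
    for rec in (select col1, col2, col3, col4 from ext_stage) loop
        -- one duplicate check plus one insert per row: far slower than
        -- letting the database process the whole set in one statement
        insert into real_table (col1, col2, col3, col4)
        select rec.col1, rec.col2, rec.col3, rec.col4
          from dual
         where not exists (
             select * from real_table r
             where r.col1 = rec.col1
         );
    end loop;
    commit;
end;
/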

For ensuring uniqueness, you only need to compare the unique columns in real_table with the corresponding columns in the external table; it would be very unusual for a flat file to require every column to be unique.
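
For example, if (col1, col2) happened to be the natural key of a much wider table (an assumption just for illustration), the NOT EXISTS would still compare only those two columns, no matter how many columns the table has:

insert into real_table
select *
  from ext_stage ext   -- stand-in for the inline EXTERNAL (...) clause above
 where not exists (
     select * from real_table r
     where r.col1 = ext.col1
       and r.col2 = ext.col2
 );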
