Input selected row and ignore duplicate row

I have a problem that I need to insert a selected row from the table to another table based on last_update data or last_transaction_product. last_update is the data updated from the table, and last_transaction_product is the latest transaction from my shop, and then my row has duplicate data but has a different last_update column here, this duplicate data must not be inserted to my destination table here, because of the data has already inserted on destination table.

For example, I have 3 tables here.

Source Table AS Table_Source
Destination Table AS Table_Destination
Log Table AS Table_Log

My problem is, my Source Table and Destination Table didn’t have any PK that can be used for the identity of the Row table.

So here are my Table_Source

product_name	shop_name	last_transaction_product	last_update
Clock	Benefical	2020-04-15	2020-06-01
Monitor	Hinezo Computer	2020-01-17	2020-06-01
Notebook	LA Shop	2020-03-25	2020-06-01

I have a query that inserts it to Table_Destination based on > @last_transaction_product or > @last_update, here are my Table_Destination after inserted from Table_Source, and I’m inserted it at 2020-06-02

product_name	shop_name	last_transaction_product	last_update
Clock	Benefical	2020-04-15	2020-06-01
Monitor	Hinezo Computer	2020-01-17	2020-06-01
Notebook	LA Shop	2020-03-25	2020-06-01

After it inserted into my Table_Destination, it will save the record from the latest activity table to Table_Log, so here my Table_Log after inserted.

table_name	last_transaction_product	last_update	last_run
Table_Source	2020-04-15	2020-06-01	2020-06-02

And the next time, there was new data inserted at Table_Source

product_name	shop_name	last_transaction_product	last_update
Clock	Benefical	2020-04-15	2020-06-01
Monitor	Hinezo Computer	2020-01-17	2020-06-01
Notebook	LA Shop	2020-03-25	2020-06-01
Poster	Maniac Shop	2020-04-27	2020-06-03
Clock	Benefical	2020-04-15	2020-06-03

As you can see from the newest table of Table_Source there was a duplicated data ('Clock', 'Benefical', '2020-04-15', '2020-06-03') that has a different last_update. In my case, I want to excluded this duplicated data to insert into my Table_Destination, so it will only insert the ('Poster', 'Maniac Shop', '2020-04-27', '2020-06-03'), and of course I used latest last_transaction_product and last_update from Table_Log as my parameter to inserting into Table_Destination . So my Table_Destination and my Table_Log will be like this.

product_name	shop_name	last_transaction_product	last_update
Clock	Benefical	2020-04-15	2020-06-01
Monitor	Hinezo Computer	2020-01-17	2020-06-01
Notebook	LA Shop	2020-03-25	2020-06-01
Poster	Maniac Shop	2020-04-27	2020-06-03

table_name	last_transaction_product	last_update	last_run
Table_Source	2020-04-15	2020-06-01	2020-06-02
Table_Source	2020-04-27	2020-06-03	2020-06-04

So here are my T-SQL now

BEGIN
    -- INSERT DESTINATION DATA --
    SET Table_Destination ON;
    INSERT INTO 
        Table_Destination
    SELECT 
        *
    FROM 
        Table_Source
    WHERE 
        last_transaction_product > (SELECT MAX(last_transaction_product) FROM Table_Log WHERE table_name = 'Table_Source')
        OR
        last_update > (SELECT MAX(last_update) FROM Table_Log WHERE table_name = 'Table_Source')

    SET Table_Destination OFF;

    -- INSERT LOG UPDATE --
    INSERT INTO Table_Log(table_name, last_transaction_product, last_update, last_run)
    VALUES('Table_Source', (SELECT MAX(last_transaction_product) FROM Table_Source WHERE table_name = 'Table_Source'), (SELECT MAX(last_update) FROM Table_Source WHERE table_name = 'Table_Source'),  @Date)
END;

​x
 
BEGIN    -- INSERT DESTINATION DATA --    SET Table_Destination ON;    INSERT INTO         Table_Destination    SELECT         *    FROM         Table_Source    WHERE         last_transaction_product > (SELECT MAX(last_transaction_product) FROM Table_Log WHERE table_name = 'Table_Source')        OR        last_update > (SELECT MAX(last_update) FROM Table_Log WHERE table_name = 'Table_Source')​    SET Table_Destination OFF;​    -- INSERT LOG UPDATE --    INSERT INTO Table_Log(table_name, last_transaction_product, last_update, last_run)    VALUES('Table_Source', (SELECT MAX(last_transaction_product) FROM Table_Source WHERE table_name = 'Table_Source'), (SELECT MAX(last_update) FROM Table_Source WHERE table_name = 'Table_Source'),  @Date)END;​

Note:

This is the flow for inserting of my Data Data Inserting Flow
I don’t want this logic using the NOT IN or NOT EXISTS query, because in my case I have 10 million rows from the Table_Source and Table_Destination. Because it will make my database work too hard.

Answer

You should use NOT EXISTS, it is your best option because of the semi-join.

If you have problems with millions rows you should check your indexes. Table_Destination hahas no duplicates for product_name so indexing it (in both tables) is a good start and a join on it should not be terrific.

Also, there is something strange in your WHERE clause..

I will try this

EDIT

Added a condition in NOT EXISTS to get a product inserted in destination once per shop instead of only once.

Added also protection for 1st run when Table_Log is empty.

SET Table_Destination ON;

INSERT INTO 
    Table_Destination
SELECT 
    *
FROM 
    Table_Source ts,
    (
        SELECT 
            MAX(last_transaction_product) max_last_transaction_product, 
            MAX(last_update) max_last_update 
        FROM Table_Log 
        WHERE table_name = 'Table_Source'
    ) agg
WHERE 
    (
        last_transaction_product > ISNULL(max_last_transaction_product, 0)
        OR 
        last_update > ISNULL(max_last_update, 0)
    )
    AND
    NOT EXISTS (
        SELECT NULL 
        FROM Table_Destination td
        WHERE td.product_name = ts.product_name
        /* REMOVE BELOW LINE IF YOU NEED PRODUCTS IN DEST ONLY ONCE
        KEEP THE LINE IF YOU WANT PRODUCT REPEATED ONCE FOR EACH SHOP */
        AND td.shop_name = ts.shop_name 
    )

SET Table_Destination OFF;

 
SET Table_Destination ON;​INSERT INTO     Table_DestinationSELECT     *FROM     Table_Source ts,    (        SELECT             MAX(last_transaction_product) max_last_transaction_product,             MAX(last_update) max_last_update         FROM Table_Log         WHERE table_name = 'Table_Source'    ) aggWHERE     (        last_transaction_product > ISNULL(max_last_transaction_product, 0)        OR         last_update > ISNULL(max_last_update, 0)    )    AND    NOT EXISTS (        SELECT NULL         FROM Table_Destination td        WHERE td.product_name = ts.product_name        /* REMOVE BELOW LINE IF YOU NEED PRODUCTS IN DEST ONLY ONCE        KEEP THE LINE IF YOU WANT PRODUCT REPEATED ONCE FOR EACH SHOP */        AND td.shop_name = ts.shop_name     )​SET Table_Destination OFF;​

Advertisement

Answer