I have a problem that I need to insert a selected row from the table to another table based on last_update
data or last_transaction_product
. last_update
is the data updated from the table, and last_transaction_product
is the latest transaction from my shop, and then my row has duplicate data but has a different last_update
column here, this duplicate data must not be inserted to my destination table here, because of the data has already inserted on destination table.
For example, I have 3 tables here.
- Source Table AS
Table_Source
- Destination Table AS
Table_Destination
- Log Table AS
Table_Log
My problem is, my Source Table and Destination Table didn’t have any PK
that can be used for the identity of the Row table.
- So here are my
Table_Source
product_name | shop_name | last_transaction_product | last_update |
---|---|---|---|
Clock | Benefical | 2020-04-15 | 2020-06-01 |
Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
- I have a query that inserts it to
Table_Destination
based on> @last_transaction_product
or> @last_update
, here are myTable_Destination
after inserted fromTable_Source
, and I’m inserted it at 2020-06-02
product_name | shop_name | last_transaction_product | last_update |
---|---|---|---|
Clock | Benefical | 2020-04-15 | 2020-06-01 |
Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
- After it inserted into my
Table_Destination
, it will save the record from the latest activity table toTable_Log
, so here myTable_Log
after inserted.
table_name | last_transaction_product | last_update | last_run |
---|---|---|---|
Table_Source | 2020-04-15 | 2020-06-01 | 2020-06-02 |
- And the next time, there was new data inserted at
Table_Source
product_name | shop_name | last_transaction_product | last_update |
---|---|---|---|
Clock | Benefical | 2020-04-15 | 2020-06-01 |
Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
Poster | Maniac Shop | 2020-04-27 | 2020-06-03 |
Clock | Benefical | 2020-04-15 | 2020-06-03 |
- As you can see from the newest table of
Table_Source
there was a duplicated data('Clock', 'Benefical', '2020-04-15', '2020-06-03')
that has a different last_update. In my case, I want to excluded this duplicated data to insert into myTable_Destination
, so it will only insert the('Poster', 'Maniac Shop', '2020-04-27', '2020-06-03')
, and of course I used latestlast_transaction_product
andlast_update
fromTable_Log
as my parameter to inserting intoTable_Destination
. So myTable_Destination
and myTable_Log
will be like this.
product_name | shop_name | last_transaction_product | last_update |
---|---|---|---|
Clock | Benefical | 2020-04-15 | 2020-06-01 |
Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
Poster | Maniac Shop | 2020-04-27 | 2020-06-03 |
table_name | last_transaction_product | last_update | last_run |
---|---|---|---|
Table_Source | 2020-04-15 | 2020-06-01 | 2020-06-02 |
Table_Source | 2020-04-27 | 2020-06-03 | 2020-06-04 |
So here are my T-SQL now
BEGIN -- INSERT DESTINATION DATA -- SET Table_Destination ON; INSERT INTO Table_Destination SELECT * FROM Table_Source WHERE last_transaction_product > (SELECT MAX(last_transaction_product) FROM Table_Log WHERE table_name = 'Table_Source') OR last_update > (SELECT MAX(last_update) FROM Table_Log WHERE table_name = 'Table_Source') SET Table_Destination OFF; -- INSERT LOG UPDATE -- INSERT INTO Table_Log(table_name, last_transaction_product, last_update, last_run) VALUES('Table_Source', (SELECT MAX(last_transaction_product) FROM Table_Source WHERE table_name = 'Table_Source'), (SELECT MAX(last_update) FROM Table_Source WHERE table_name = 'Table_Source'), @Date) END;
Note:
- This is the flow for inserting of my Data Data Inserting Flow
- I don’t want this logic using the
NOT IN
orNOT EXISTS
query, because in my case I have 10 million rows from theTable_Source
andTable_Destination
. Because it will make my database work too hard.
Advertisement
Answer
You should use NOT EXISTS
, it is your best option because of the semi-join.
If you have problems with millions rows you should check your indexes.
Table_Destination
hahas no duplicates for product_name
so indexing it (in both tables) is a good start and a join on it should not be terrific.
Also, there is something strange in your WHERE
clause..
I will try this
EDIT
Added a condition in NOT EXISTS
to get a product inserted in destination once per shop instead of only once.
Added also protection for 1st run when Table_Log
is empty.
SET Table_Destination ON; INSERT INTO Table_Destination SELECT * FROM Table_Source ts, ( SELECT MAX(last_transaction_product) max_last_transaction_product, MAX(last_update) max_last_update FROM Table_Log WHERE table_name = 'Table_Source' ) agg WHERE ( last_transaction_product > ISNULL(max_last_transaction_product, 0) OR last_update > ISNULL(max_last_update, 0) ) AND NOT EXISTS ( SELECT NULL FROM Table_Destination td WHERE td.product_name = ts.product_name /* REMOVE BELOW LINE IF YOU NEED PRODUCTS IN DEST ONLY ONCE KEEP THE LINE IF YOU WANT PRODUCT REPEATED ONCE FOR EACH SHOP */ AND td.shop_name = ts.shop_name ) SET Table_Destination OFF;