I have a problem that I need to insert a selected row from the table to another table based on last_update data or last_transaction_product. last_update is the data updated from the table, and last_transaction_product is the latest transaction from my shop, and then my row has duplicate data but has a different last_update column here, this duplicate data must not be inserted to my destination table here, because of the data has already inserted on destination table.
For example, I have 3 tables here.
- Source Table AS
Table_Source - Destination Table AS
Table_Destination - Log Table AS
Table_Log
My problem is, my Source Table and Destination Table didn’t have any PK that can be used for the identity of the Row table.
- So here are my
Table_Source
| product_name | shop_name | last_transaction_product | last_update |
|---|---|---|---|
| Clock | Benefical | 2020-04-15 | 2020-06-01 |
| Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
| Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
- I have a query that inserts it to
Table_Destinationbased on> @last_transaction_productor> @last_update, here are myTable_Destinationafter inserted fromTable_Source, and I’m inserted it at 2020-06-02
| product_name | shop_name | last_transaction_product | last_update |
|---|---|---|---|
| Clock | Benefical | 2020-04-15 | 2020-06-01 |
| Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
| Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
- After it inserted into my
Table_Destination, it will save the record from the latest activity table toTable_Log, so here myTable_Logafter inserted.
| table_name | last_transaction_product | last_update | last_run |
|---|---|---|---|
| Table_Source | 2020-04-15 | 2020-06-01 | 2020-06-02 |
- And the next time, there was new data inserted at
Table_Source
| product_name | shop_name | last_transaction_product | last_update |
|---|---|---|---|
| Clock | Benefical | 2020-04-15 | 2020-06-01 |
| Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
| Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
| Poster | Maniac Shop | 2020-04-27 | 2020-06-03 |
| Clock | Benefical | 2020-04-15 | 2020-06-03 |
- As you can see from the newest table of
Table_Sourcethere was a duplicated data('Clock', 'Benefical', '2020-04-15', '2020-06-03')that has a different last_update. In my case, I want to excluded this duplicated data to insert into myTable_Destination, so it will only insert the('Poster', 'Maniac Shop', '2020-04-27', '2020-06-03'), and of course I used latestlast_transaction_productandlast_updatefromTable_Logas my parameter to inserting intoTable_Destination. So myTable_Destinationand myTable_Logwill be like this.
| product_name | shop_name | last_transaction_product | last_update |
|---|---|---|---|
| Clock | Benefical | 2020-04-15 | 2020-06-01 |
| Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
| Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
| Poster | Maniac Shop | 2020-04-27 | 2020-06-03 |
| table_name | last_transaction_product | last_update | last_run |
|---|---|---|---|
| Table_Source | 2020-04-15 | 2020-06-01 | 2020-06-02 |
| Table_Source | 2020-04-27 | 2020-06-03 | 2020-06-04 |
So here are my T-SQL now
BEGIN
-- INSERT DESTINATION DATA --
SET Table_Destination ON;
INSERT INTO
Table_Destination
SELECT
*
FROM
Table_Source
WHERE
last_transaction_product > (SELECT MAX(last_transaction_product) FROM Table_Log WHERE table_name = 'Table_Source')
OR
last_update > (SELECT MAX(last_update) FROM Table_Log WHERE table_name = 'Table_Source')
SET Table_Destination OFF;
-- INSERT LOG UPDATE --
INSERT INTO Table_Log(table_name, last_transaction_product, last_update, last_run)
VALUES('Table_Source', (SELECT MAX(last_transaction_product) FROM Table_Source WHERE table_name = 'Table_Source'), (SELECT MAX(last_update) FROM Table_Source WHERE table_name = 'Table_Source'), @Date)
END;
Note:
- This is the flow for inserting of my Data Data Inserting Flow
- I don’t want this logic using the
NOT INorNOT EXISTSquery, because in my case I have 10 million rows from theTable_SourceandTable_Destination. Because it will make my database work too hard.
Advertisement
Answer
You should use NOT EXISTS, it is your best option because of the semi-join.
If you have problems with millions rows you should check your indexes.
Table_Destination hahas no duplicates for product_name so indexing it (in both tables) is a good start and a join on it should not be terrific.
Also, there is something strange in your WHERE clause..
I will try this
EDIT
Added a condition in NOT EXISTS to get a product inserted in destination once per shop instead of only once.
Added also protection for 1st run when Table_Log is empty.
SET Table_Destination ON;
INSERT INTO
Table_Destination
SELECT
*
FROM
Table_Source ts,
(
SELECT
MAX(last_transaction_product) max_last_transaction_product,
MAX(last_update) max_last_update
FROM Table_Log
WHERE table_name = 'Table_Source'
) agg
WHERE
(
last_transaction_product > ISNULL(max_last_transaction_product, 0)
OR
last_update > ISNULL(max_last_update, 0)
)
AND
NOT EXISTS (
SELECT NULL
FROM Table_Destination td
WHERE td.product_name = ts.product_name
/* REMOVE BELOW LINE IF YOU NEED PRODUCTS IN DEST ONLY ONCE
KEEP THE LINE IF YOU WANT PRODUCT REPEATED ONCE FOR EACH SHOP */
AND td.shop_name = ts.shop_name
)
SET Table_Destination OFF;