I have a problem that I need to insert a selected row from the table to another table based on last_update
data or last_transaction_product
. last_update
is the data updated from the table, and last_transaction_product
is the latest transaction from my shop, and then my row has duplicate data but has a different last_update
column here, this duplicate data must not be inserted to my destination table here, because of the data has already inserted on destination table.
For example, I have 3 tables here.
- Source Table AS
Table_Source
- Destination Table AS
Table_Destination
- Log Table AS
Table_Log
My problem is, my Source Table and Destination Table didn’t have any PK
that can be used for the identity of the Row table.
- So here are my
Table_Source
product_name | shop_name | last_transaction_product | last_update |
---|---|---|---|
Clock | Benefical | 2020-04-15 | 2020-06-01 |
Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
- I have a query that inserts it to
Table_Destination
based on> @last_transaction_product
or> @last_update
, here are myTable_Destination
after inserted fromTable_Source
, and I’m inserted it at 2020-06-02
product_name | shop_name | last_transaction_product | last_update |
---|---|---|---|
Clock | Benefical | 2020-04-15 | 2020-06-01 |
Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
- After it inserted into my
Table_Destination
, it will save the record from the latest activity table toTable_Log
, so here myTable_Log
after inserted.
table_name | last_transaction_product | last_update | last_run |
---|---|---|---|
Table_Source | 2020-04-15 | 2020-06-01 | 2020-06-02 |
- And the next time, there was new data inserted at
Table_Source
product_name | shop_name | last_transaction_product | last_update |
---|---|---|---|
Clock | Benefical | 2020-04-15 | 2020-06-01 |
Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
Poster | Maniac Shop | 2020-04-27 | 2020-06-03 |
Clock | Benefical | 2020-04-15 | 2020-06-03 |
- As you can see from the newest table of
Table_Source
there was a duplicated data('Clock', 'Benefical', '2020-04-15', '2020-06-03')
that has a different last_update. In my case, I want to excluded this duplicated data to insert into myTable_Destination
, so it will only insert the('Poster', 'Maniac Shop', '2020-04-27', '2020-06-03')
, and of course I used latestlast_transaction_product
andlast_update
fromTable_Log
as my parameter to inserting intoTable_Destination
. So myTable_Destination
and myTable_Log
will be like this.
product_name | shop_name | last_transaction_product | last_update |
---|---|---|---|
Clock | Benefical | 2020-04-15 | 2020-06-01 |
Monitor | Hinezo Computer | 2020-01-17 | 2020-06-01 |
Notebook | LA Shop | 2020-03-25 | 2020-06-01 |
Poster | Maniac Shop | 2020-04-27 | 2020-06-03 |
table_name | last_transaction_product | last_update | last_run |
---|---|---|---|
Table_Source | 2020-04-15 | 2020-06-01 | 2020-06-02 |
Table_Source | 2020-04-27 | 2020-06-03 | 2020-06-04 |
So here are my T-SQL now
BEGIN
-- INSERT DESTINATION DATA --
SET Table_Destination ON;
INSERT INTO
Table_Destination
SELECT
*
FROM
Table_Source
WHERE
last_transaction_product > (SELECT MAX(last_transaction_product) FROM Table_Log WHERE table_name = 'Table_Source')
OR
last_update > (SELECT MAX(last_update) FROM Table_Log WHERE table_name = 'Table_Source')
SET Table_Destination OFF;
-- INSERT LOG UPDATE --
INSERT INTO Table_Log(table_name, last_transaction_product, last_update, last_run)
VALUES('Table_Source', (SELECT MAX(last_transaction_product) FROM Table_Source WHERE table_name = 'Table_Source'), (SELECT MAX(last_update) FROM Table_Source WHERE table_name = 'Table_Source'), @Date)
END;
Note:
- This is the flow for inserting of my Data Data Inserting Flow
- I don’t want this logic using the
NOT IN
orNOT EXISTS
query, because in my case I have 10 million rows from theTable_Source
andTable_Destination
. Because it will make my database work too hard.
Advertisement
Answer
You should use NOT EXISTS
, it is your best option because of the semi-join.
If you have problems with millions rows you should check your indexes.
Table_Destination
hahas no duplicates for product_name
so indexing it (in both tables) is a good start and a join on it should not be terrific.
Also, there is something strange in your WHERE
clause..
I will try this
EDIT
Added a condition in NOT EXISTS
to get a product inserted in destination once per shop instead of only once.
Added also protection for 1st run when Table_Log
is empty.
SET Table_Destination ON;
INSERT INTO
Table_Destination
SELECT
*
FROM
Table_Source ts,
(
SELECT
MAX(last_transaction_product) max_last_transaction_product,
MAX(last_update) max_last_update
FROM Table_Log
WHERE table_name = 'Table_Source'
) agg
WHERE
(
last_transaction_product > ISNULL(max_last_transaction_product, 0)
OR
last_update > ISNULL(max_last_update, 0)
)
AND
NOT EXISTS (
SELECT NULL
FROM Table_Destination td
WHERE td.product_name = ts.product_name
/* REMOVE BELOW LINE IF YOU NEED PRODUCTS IN DEST ONLY ONCE
KEEP THE LINE IF YOU WANT PRODUCT REPEATED ONCE FOR EACH SHOP */
AND td.shop_name = ts.shop_name
)
SET Table_Destination OFF;