Skip to content
Advertisement

ETL – Update Tables with potentially new records or updated records

I am trying to accomplish the following table update via an insert for an etl process:

  1. Add in a new records that did not exists
  2. Update any records that had updates
  3. Keep all the records in place that did not change

On the table I have a unique_id and an updated_date field when the record was updated ( or could serve as the date it was first created). Below is example code of two tables and a third I would want the output to be after an insert statement:

BEGIN TRANSACTION;

/* Create a table called NAMES */
CREATE TABLE NAMES(Id integer PRIMARY KEY, Name text, updated_date DATE);
CREATE TABLE NEW_NAMES(Id integer PRIMARY KEY, Name text, updated_date DATE);
CREATE TABLE FINAL_TABLE_DESIRED(Id integer PRIMARY KEY, Name text, updated_date DATE);
/* Create few records in this table */
INSERT INTO NAMES VALUES(1,'Tom', '2021-08-09');
INSERT INTO NAMES VALUES(2,'Lucy', '2021-08-07');
INSERT INTO NAMES VALUES(3,'Frank', '2021-08-07');
INSERT INTO NAMES VALUES(4,'Jane', '2021-08-07');
INSERT INTO NAMES VALUES(5,'Robert', '2021-08-07');


INSERT INTO NEW_NAMES VALUES(1,'Tom John', '2021-08-17');
INSERT INTO NEW_NAMES VALUES(70,'Lollah', '2021-08-07');


INSERT INTO FINAL_TABLE_DESIRED  VALUES(2,'Lucy', '2021-08-07');
INSERT INTO FINAL_TABLE_DESIRED  VALUES(3,'Frank', '2021-08-07');
INSERT INTO FINAL_TABLE_DESIRED  VALUES(4,'Jane', '2021-08-07');
INSERT INTO FINAL_TABLE_DESIRED  VALUES(5,'Robert', '2021-08-07');
INSERT INTO FINAL_TABLE_DESIRED  VALUES(1,'Tom John', '2021-08-17');
INSERT INTO FINAL_TABLE_DESIRED  VALUES(70,'Lollah', '2021-08-17');

COMMIT;

/* Display all the records from the table */
SELECT * FROM  final_table_desired;

is there a better approach than the above ie an scd2 – if so why?

Advertisement

Answer

You can use the MERGE statement to accomplish this

MERGE INTO NAMES
USING NEW_NAMES
ON NEW_NAMES.ID = NAMES.ID
WHEN MATCHED THEN UPDATE SET NAMES.NAME = NEW_NAMES.NAME, UPDATED_DATE = CURRENT_DATE()
WHEN NOT MATCHED THEN INSERT (ID, NAME, UPDATED_DATE) VALUES (NEW_NAMES.ID, NEW_NAMES.NAME, CURRENT_DATE());
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement