Skip to content
Advertisement

TSQL – Union records based on partial uniqueness

Sample Data

DECLARE @T1 TABLE (Id INT, Val1 VARCHAR(20), Val2 VARCHAR(20), [Group] INT);

INSERT INTO @T1
SELECT 1, 'One', '01', 1
UNION
SELECT 2, 'Two', '02', 1
UNION
SELECT 3, 'Three', '03', 1
UNION
SELECT 3, 'Threeeee', '03', 2
UNION
SELECT 4, 'Four', '04', 2
UNION
SELECT 5, 'Five', '05', 2
UNION
SELECT 4, 'Four', '04', 3
UNION
SELECT 5, 'Five', '050', 3
UNION
SELECT 6, 'Six', '06', 3;

Currently inserted data:

1   One         01  1
2   Two         02  1
3   Three       03  1
3   Threeeee    03  2
4   Four        04  2
5   Five        05  2
4   Four        04  3
5   Five        050 3
6   Six         06  3

Goal: To insert record into the table variable (or cte/temp table) based on uniqueness ONLY ON [Id] field. “UNION” would not do that because, at minimum, [Group] is going to make duplicate records based on just [Id]… unique. Oh, and [Group] value basically indicates the level of importance, 1 = these records should be inserted first and 3 = these records should be inserted last, after making sure there are no other records in the target table with same [Id] value.

Expected to insert data:

1   One         01  1
2   Two         02  1
3   Three       03  1
4   Four        04  2
5   Five        05  2
6   Six         06  3

I can do this by first inserting [Group] = 1 records into @T1. Then write another insert for [Group] = 2 where [Id] doesn’t exist, and so on. But am looking for an efficient way. I want to make the insert efficient.

Advertisement

Answer

First create a CTE with all the rows that you want to insert:

WITH cte(Id, Val1, Val2, [Group]) AS (
  SELECT 1, 'One', '01', 1
  UNION ALL
  SELECT 2, 'Two', '02', 1
  UNION ALL
  SELECT 3, 'Three', '03', 1
  UNION ALL
  SELECT 3, 'Threeeee', '03', 2
  UNION ALL
  SELECT 4, 'Four', '04', 2
  UNION ALL
  SELECT 5, 'Five', '05', 2
  UNION ALL
  SELECT 4, 'Four', '04', 3
  UNION ALL
  SELECT 5, 'Five', '050', 3
  UNION ALL
  SELECT 6, 'Six', '06', 3
)

and then in the INSERT statement you can use NOT EXISTS:

INSERT INTO @T1
SELECT c.* FROM cte c
WHERE NOT EXISTS (
  SELECT 1 FROM cte
  WHERE Id = c.ID AND [Group] < c.[Group]
);

See the demo.

Or ROW_NUMBER() window function:

INSERT INTO @T1
SELECT t.Id, t.Val1, t.Val2, t.[Group] 
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY [GROUP]) rn
  FROM cte
) t
WHERE t.rn = 1;

See the demo.
Results:

> Id | Val1  | Val2 | Group
> -: | :---- | :--- | ----:
>  1 | One   | 01   |     1
>  2 | Two   | 02   |     1
>  3 | Three | 03   |     1
>  4 | Four  | 04   |     2
>  5 | Five  | 05   |     2
>  6 | Six   | 06   |     3
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement