I’m trying to eliminate certain records from a dataset using SQL Server. The title of my post may be inaccurate, as a better solution may exist than what I have in mind.
In my query, I am selecting from Table A, and the rows that I want to end up with should meet the following criteria:
- All rows where A.ItemNumber = B.ItemNumber
- All rows where A.ItemNumber <> B.ItemNumber AND that row’s Task value does not have another row that meets criteria #1.
So for the below example:
Gives us ItemNumber 102, 104, 106 rows.
Gives us ItemNumber 105 row. 100, 101 are removed from dataset because their Task (1) is associated with Table B at ItemNumber 102. Same for 103 with Task (2) being associated at ItemNumber 104.
Table A Task ItemNumber 1 100 1 101 1 102 2 103 2 104 3 105 4 106 Table B ItemNumber Data 102 aaa 104 bbb 106 ccc
My initial thought was to load Table A into a temp table, LEFT JOIN with Table B, and DELETE FROM {temp table} WHERE (data IS NULL AND {insert some kind of grouping logic here}). But I have been completely unable to figure out a grouping logic that will work for the problem. I spent the weekend hoping a solution would come to me, but am now giving in and seeking advice.
Advertisement
Answer
With a CTE that meets the 1st condition and UNION ALL to return the rest of the rows:
with cte as ( select a.* from TableA a where exists (select 1 from TableB where ItemNumber = a.ItemNumber) ) select * from cte union all select a.* from TableA a where not exists (select 1 from cte where Task = a.Task) order by Task
See the demo.
Results:
Task ItemNumber 1 102 2 104 3 105 4 106