I’m trying to eliminate certain records from a dataset using SQL Server. The title of my post may be inaccurate, as a better solution may exist than what I have in mind.
In my query, I am selecting from Table A, and the rows that I want to end up with should meet the following criteria:
- All rows where A.ItemNumber = B.ItemNumber
- All rows where A.ItemNumber <> B.ItemNumber AND that row’s Task value does not have another row that meets criteria #1.
So for the below example:
Gives us ItemNumber 102, 104, 106 rows.
Gives us ItemNumber 105 row. 100, 101 are removed from dataset because their Task (1) is associated with Table B at ItemNumber 102. Same for 103 with Task (2) being associated at ItemNumber 104.
Table A
Task ItemNumber
1 100
1 101
1 102
2 103
2 104
3 105
4 106
Table B
ItemNumber Data
102 aaa
104 bbb
106 ccc
My initial thought was to load Table A into a temp table, LEFT JOIN with Table B, and DELETE FROM {temp table} WHERE (data IS NULL AND {insert some kind of grouping logic here}). But I have been completely unable to figure out a grouping logic that will work for the problem. I spent the weekend hoping a solution would come to me, but am now giving in and seeking advice.
Advertisement
Answer
With a CTE that meets the 1st condition and UNION ALL to return the rest of the rows:
with cte as (
select a.*
from TableA a
where exists (select 1 from TableB where ItemNumber = a.ItemNumber)
)
select * from cte
union all
select a.* from TableA a
where not exists (select 1 from cte where Task = a.Task)
order by Task
See the demo.
Results:
Task ItemNumber
1 102
2 104
3 105
4 106