Skip to content
Advertisement

Using Subqueries to remove duplicate IDs [closed]

I have 2 Tables. Table 1 holds ID1 and ID2. Table 2 holds ID2 and ID3.

Table 1 has unique cases for ID1 and multiple cases for ID2.

TABLE 1:

ID1 | ID2
1     1
2     2 
3     3
4     3
5     4
6     5
7     5
8     6 
9     7
10    6

Table 2 has unique cases for ID2 and multiple cases for ID3

TABLE 2:

ID2 | ID3
1     1
2     1 
3     2
4     3
5     2
6     4
7     5

I want 1 unique case of ID3.

I need remove duplicate ID2s from Table 1 picking to remove the duplicate ID2s based on the smaller ID1

So Table 1 now looks like:

TABLE 1:

ID1 | ID2
1     1
2     2 
4     3
5     4
7     5
9     7
10    6

Now I want to go to Table 2 and remove any duplicate ID3s based on the smaller ID2

TABLE 2:

ID2 | ID3
2     1 
4     3
5     2
6     4
7     5

So my end result should be (I am joining the tables because both of them have other relevant information I need to combine but these are the IDs I am sorting and filtering to get the correct row):

Final Table:
ID1 | ID2 | ID3
2     2     1
7     5     2
5     4     3
10    6     4
9     7     5

Where now I have a single case for each ID3 based on the largest ID1 and ID2 associated with that ID3.

I have tried creating subqueries in the WHERE function to remove the duplicates but my understanding of SQL is not good enough to really figure out what is happening.

Group By and DISTINCT does not work for this case.

Decision Tree

I added a Decision Tree to help visualize the problem. Essentially, each ID3 can potentially have multiple ID2s, which can potentially have multiple ID1s. I want to keep only the largest ID1, which gives me the correct ID2 associated with that ID3.

Advertisement

Answer

with t1 as  (
select  ID1, ID2
from
(
select  *
       ,row_number() over(partition by ID2 order by ID1 desc) as rn
from    t 
) t
where rn = 1
            ),
t3 as       (
select  ID2, ID3
from
(
select  *
       ,row_number() over(partition by ID3 order by ID2 desc) as rn
from    t2
) t
where   rn = 1
            )

select   t1.ID1
        ,t1.ID2
        ,t3.ID3
from     t1 join t3 on t3.ID2 = t1.ID2
order by ID3
ID1 ID2 ID3
2 2 1
7 5 2
5 4 3
10 6 4
9 7 5

Fiddle

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement