Skip to content
Advertisement

SQL Subquery to handle duplicates

So I am joining to tables, however the table I want to left join has duplicate values for the matching field. I am trying to create a subquery so that when there are duplicates the column value is assigned ‘Duplicate’ and when there isn’t then it takes the value of the join.

Here is simplified query of what I have tried so far:

SELECT
    l.Id,
    (
        SELECT
            CASE 
                WHEN count(d.db_id) > 1 THEN 'Duplicate'
                ELSE COALESCE(a.name, 'Unknown')
        END
        FROM
            Dynamics AS d
                LEFT JOIN account as a
                ON a.account_id = d.account_id
         WHERE d.db_id = l.Id
         GROUP BY  ld.db_id
    ) as account_name
FROM listings as l

However, obviously this doesn’t work because a.name is not in the group by clause. I don’t want to include it in the group by clause because often the duplicate records have different values for it.

Edit: also important to note that the duplicate rows have the same created date, but often have different account_id’s, which is why when I find a duplicate I just want to label it ‘Duplicate’ and not have to choose one

Sample Expected Result:

Listing Table: | Id | | — | | 1 | | 2 |

Dynamics Table: Id | db_id | account_id — | — | — 1 | 1 | 1 2 | 1 | Null 3 | 2 | 2

Account Table: account_id | name — | — 1 | account1 2 | account2

Expected resulting Table: Id | account_name — | — 1 | Duplicate 2 | account2

Advertisement

Answer

As you stated, this can be done with a subquery and joining on the subquery I’m using ROW_NUMBER(), OVER and PARTITION BY in the subquery to get the rows in order by the account_id and for every row that has the same account_id, add 1 to the count of that row’s column RowNum, then in the outer query, you can use the case to check if the subquery column RowNum is equal to 1, if so, use the value from table account else show ‘Duplicate’, to read more about the ROW_NUMBER() window function and the key words used in the subquery you can read this (https://www.sqlservertutorial.net/sql-server-window-functions/sql-server-row_number-function/). You can also use the ORDER BY in the subquery to order your dynamics by a date or any other column in the Dynamics table in case you want the 1st record of each partition to be the most recent or whatever your logic is there:

Edit: Since you stated you don’t want to Group By on the name for whatever reason, you can add a WHERE clause to this to only take the record if it’s RowNum is less than 3, this way you get at most 2 records for each account_id, one being the name and the other being duplicate, then you again need to use ROW_NUMBER(), OVER, and PARTITION BY to order the records from greatest to least and take only the top record of each partition ensuring you get the record with name duplicate in cases where there are more than one record with the given id, then join on the listings table. I really don’t understand your refusal to just Group By on the name as well, but this does what you asked for. I’d definitely be doing a performance check on this versus Group By including the name and see which is faster.

SELECT l.[Id], dx.[Name]
FROM listings AS l
INNER JOIN
(
    SELECT d.db_id, d.[Name], ROW_NUMBER() OVER (Partition BY d.db_id ORDER BY d.[RowNum] DESC) AS [RowNum]
    FROM
    (
        SELECT d.db_id, CASE WHEN d.[RowNum] > 1 THEN 'Duplicate' ELSE COALESCE(a.name, 'Unknown') END AS [Name], d.[RowNum]
        FROM
        (
            SELECT d.account_id, d.db_id, ROW_NUMBER() OVER (PARTITION BY d.db_id ORDER BY d.db_id) AS [RowNum]
            FROM Dynamics AS d
        ) AS d
        LEFT JOIN account as a
        ON a.account_id = d.account_id
        WHERE d.[RowNum] < 3
    ) AS d
) AS dx
ON dx.db_id = l.Id
WHERE dx.[RowNum] = 1

EDIT: Here’s a simpler version that I believe returns the results as you want:

SELECT a.[Id], a.[account_name]
FROM
(
    SELECT l.[Id], CASE WHEN dx.db_count > 1 THEN 'Duplicate' ELSE COALESCE(a.Name, 'Unknown') END AS [account_name]
    FROM @dynamic AS d
    INNER JOIN
    (
        SELECT d.db_id, COUNT(d.db_id) AS [db_count]
        FROM @dynamic AS d
        GROUP BY d.db_id
    ) AS dx
    ON dx.[db_id] = d.[db_id]
    LEFT JOIN @accounts AS a
    ON a.account_id = d.[account_id]
    INNER JOIN @listings AS l
    ON l.[Id] = d.[db_id]
) AS a
GROUP BY a.[Id], a.[account_name]
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement