So I am joining to tables, however the table I want to left join has duplicate values for the matching field. I am trying to create a subquery so that when there are duplicates the column value is assigned ‘Duplicate’ and when there isn’t then it takes the value of the join.
Here is simplified query of what I have tried so far:
SELECT l.Id, ( SELECT CASE WHEN count(d.db_id) > 1 THEN 'Duplicate' ELSE COALESCE(a.name, 'Unknown') END FROM Dynamics AS d LEFT JOIN account as a ON a.account_id = d.account_id WHERE d.db_id = l.Id GROUP BY ld.db_id ) as account_name FROM listings as l
However, obviously this doesn’t work because a.name is not in the group by clause. I don’t want to include it in the group by clause because often the duplicate records have different values for it.
Edit: also important to note that the duplicate rows have the same created date, but often have different account_id’s, which is why when I find a duplicate I just want to label it ‘Duplicate’ and not have to choose one
Sample Expected Result:
Listing Table: | Id | | — | | 1 | | 2 |
Dynamics Table: Id | db_id | account_id — | — | — 1 | 1 | 1 2 | 1 | Null 3 | 2 | 2
Account Table: account_id | name — | — 1 | account1 2 | account2
Expected resulting Table: Id | account_name — | — 1 | Duplicate 2 | account2
Advertisement
Answer
As you stated, this can be done with a subquery and joining on the subquery
I’m using ROW_NUMBER()
, OVER
and PARTITION BY
in the subquery to get the rows in order by the account_id
and for every row that has the same account_id
, add 1 to the count of that row’s column RowNum
, then in the outer query, you can use the case to check if the subquery column RowNum
is equal to 1, if so, use the value from table account
else show ‘Duplicate’, to read more about the ROW_NUMBER()
window function and the key words used in the subquery you can read this (https://www.sqlservertutorial.net/sql-server-window-functions/sql-server-row_number-function/). You can also use the ORDER BY
in the subquery to order your dynamics by a date or any other column in the Dynamics
table in case you want the 1st record of each partition to be the most recent or whatever your logic is there:
Edit: Since you stated you don’t want to Group By on the name for whatever reason, you can add a WHERE
clause to this to only take the record if it’s RowNum
is less than 3, this way you get at most 2 records for each account_id, one being the name and the other being duplicate, then you again need to use ROW_NUMBER()
, OVER
, and PARTITION BY
to order the records from greatest to least and take only the top record of each partition ensuring you get the record with name duplicate in cases where there are more than one record with the given id, then join on the listings table. I really don’t understand your refusal to just Group By on the name as well, but this does what you asked for. I’d definitely be doing a performance check on this versus Group By including the name and see which is faster.
SELECT l.[Id], dx.[Name] FROM listings AS l INNER JOIN ( SELECT d.db_id, d.[Name], ROW_NUMBER() OVER (Partition BY d.db_id ORDER BY d.[RowNum] DESC) AS [RowNum] FROM ( SELECT d.db_id, CASE WHEN d.[RowNum] > 1 THEN 'Duplicate' ELSE COALESCE(a.name, 'Unknown') END AS [Name], d.[RowNum] FROM ( SELECT d.account_id, d.db_id, ROW_NUMBER() OVER (PARTITION BY d.db_id ORDER BY d.db_id) AS [RowNum] FROM Dynamics AS d ) AS d LEFT JOIN account as a ON a.account_id = d.account_id WHERE d.[RowNum] < 3 ) AS d ) AS dx ON dx.db_id = l.Id WHERE dx.[RowNum] = 1
EDIT: Here’s a simpler version that I believe returns the results as you want:
SELECT a.[Id], a.[account_name] FROM ( SELECT l.[Id], CASE WHEN dx.db_count > 1 THEN 'Duplicate' ELSE COALESCE(a.Name, 'Unknown') END AS [account_name] FROM @dynamic AS d INNER JOIN ( SELECT d.db_id, COUNT(d.db_id) AS [db_count] FROM @dynamic AS d GROUP BY d.db_id ) AS dx ON dx.[db_id] = d.[db_id] LEFT JOIN @accounts AS a ON a.account_id = d.[account_id] INNER JOIN @listings AS l ON l.[Id] = d.[db_id] ) AS a GROUP BY a.[Id], a.[account_name]