Skip to content
Advertisement

How to select DISTINCT records based on multiple columns and without considering their order

I’m currently working on a SQL Server database and I would need a query that returns pairs of customers with the same city from a table that has this structure

Customer(ID, Name, Surname, City)

and this sample data

Name       | Surname   | City
-----------+-----------+-----------
Foo        | Foo       | New York
-----------+-----------+-----------
Bar        | Bar       | New York
-----------+-----------+-----------
Alice      | A         | London
-----------+-----------+-----------
Bob        | B         | London

I have tried defining a query that joins the Customer table itself

SELECT C1.Name + ' ' + C1.Surname CustomerA, C2.Name + ' ' + C2.Surname CustomerB, C1.City
FROM Customer C1 JOIN Customer C2 
     ON C1.City = C2.City
WHERE CustomerA <> CustomerB

but it gives me a table that looks like this

CustomerA  | CustomerB | City
-----------+-----------+-----------
Foo Foo    | Bar Bar   | New York
-----------+-----------+-----------
Bar Bar    | Foo Foo   | New York
-----------+-----------+-----------
Alice A    | Bob B     | London
-----------+-----------+-----------
Bob B      | Alice A   | London

with duplicated rows but with swapped customers.

My question is how would I have to do to select those rows once (e.g. for the first two results, it would be great only the first or the second row).

This would be an example of the expected result

CustomerA  | CustomerB | City
-----------+-----------+-----------
Foo Foo    | Bar Bar   | New York
-----------+-----------+-----------
Alice A    | Bob B     | London

Advertisement

Answer

I think I understand what you are looking for but it seems over simplified to your actual problem. Your query you posted was incredibly close to working. You can’t reference columns by their alias in the where predicates so you will need to use the string concatenation you had in your column. Then you can simply change the <> to either > or < so you only get one match. This example should work for your problem as I understand it.

declare @Customer table
(
    CustID int identity
    , Name varchar(10)
    , Surname varchar(10)
    , City varchar(10)
)

insert @Customer
select 'Foo', 'Foo', 'New York' union all
select 'Bar', 'Bar', 'New York' union all
select 'Smith', 'Smith', 'New York' union all
select 'Alice', 'A', 'London' union all
select 'Bob', 'B', 'London'

SELECT CustomerA = C1.Name + ' ' + C1.Surname
    , CustomerB = C2.Name + ' ' + C2.Surname
    , C1.City
FROM @Customer C1 
JOIN @Customer C2 ON C1.City = C2.City
where C1.Name + ' ' + C1.Surname > C2.Name + ' ' + C2.Surname
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement