I’m trying to find the query in order to match similar customers.
To simplify the situation consider this scenario:
I have a table which contains a customer name and product purchased. customer name can have multiple purchases of same and different products. So firstly I can take distinct customer name and product name, so I see all customers and all products they purchased at least once.
Now I want a query to show me a sort of matching customers, according to the product they both purchased, so I want to count the similar products they purchased.
So I want to see for each pair of customers (pairing all the table) the amount of similar product they purchased.
Lets say the raw data is:
CustomerName | ProductName
A | 1
A | 2
A | 1
A | 3
B | 1
B | 2
B | 4
C | 2
Then I want to see the result of:
CustomerName1 | CustomerName2 | CountSimilarity
A | B | 2
A | C | 1
B | C | 1
And so on for all pairs of customers that have at least 1 similar product purchasing
Any suggestions how to approach this query?
The environment is SQL Server.
Thanks
Advertisement
Answer
Here is a self join approach:
SELECT t1.CustomerName, t2.CustomerName, COUNT(*) AS CountSimilarity
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.ProductName = t2.ProductName
WHERE
t1.CustomerName < t2.CustomerName
GROUP BY
t1.CustomerName, t2.CustomerName;
Two records are joined together above if their products match. Note that the inequality in the WHERE
clause ensures that customer pairs do not appear in duplicate.