I’m trying to find the query in order to match similar customers.
To simplify the situation consider this scenario:
I have a table which contains a customer name and product purchased. customer name can have multiple purchases of same and different products. So firstly I can take distinct customer name and product name, so I see all customers and all products they purchased at least once.
Now I want a query to show me a sort of matching customers, according to the product they both purchased, so I want to count the similar products they purchased.
So I want to see for each pair of customers (pairing all the table) the amount of similar product they purchased.
Lets say the raw data is:
CustomerName | ProductName A | 1 A | 2 A | 1 A | 3 B | 1 B | 2 B | 4 C | 2
Then I want to see the result of:
CustomerName1 | CustomerName2 | CountSimilarity A | B | 2 A | C | 1 B | C | 1
And so on for all pairs of customers that have at least 1 similar product purchasing
Any suggestions how to approach this query?
The environment is SQL Server.
Thanks
Advertisement
Answer
Here is a self join approach:
SELECT t1.CustomerName, t2.CustomerName, COUNT(*) AS CountSimilarity FROM yourTable t1 INNER JOIN yourTable t2 ON t1.ProductName = t2.ProductName WHERE t1.CustomerName < t2.CustomerName GROUP BY t1.CustomerName, t2.CustomerName;
Two records are joined together above if their products match. Note that the inequality in the WHERE
clause ensures that customer pairs do not appear in duplicate.