I am trying to find pairs of users who liked the same pages and list the ones who have the most common page likes at the top.
For simplicity I am considering the following table schema
Likes (LikeID, UserID) LikeDetail (LikeDetailID, LikeID, PageID)
I am trying to find pairs of users with most number of common page likes ordered descending. E.g User1 and User2 have liked 3 pages in common.
I would to have the resulting set of the query to be
UserID1 UserID2 NoOfCommonLikes 2 3 10 4 3 8 1 5 4
I am guessing it would need aggregation, join and aliases however I needed to rename a table twice using AS which did not work for me.
Any tip would be appreciated in MySQL, or SQL Server.
Advertisement
Answer
In SQL Server and MySQL 8+, you can use a CTE which JOIN
s the Likes
and LikeDetail
table, and then self-JOIN that where PageID
is the same but UserID
is not, and then grouping on the two userID values:
WITH CTE AS (SELECT l.UserId, d.PageID FROM Likes l JOIN LikeDetail d ON d.LikeID = l.likeID) SELECT l1.UserId AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes FROM CTE l1 JOIN CTE l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID GROUP BY l1.UserID, l2.UserID ORDER BY COUNT(*) DESC
In versions of MySQL prior to 8.0, you need to repeat the CTE defintion twice in a JOIN
to achieve the same result:
SELECT l1.UserId AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes FROM (SELECT l.UserId, d.PageID FROM Likes l JOIN LikeDetail d ON d.LikeID = l.likeID) l1 JOIN (SELECT l.UserId, d.PageID FROM Likes l JOIN LikeDetail d ON d.LikeID = l.likeID) l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID GROUP BY l1.UserID, l2.UserID ORDER BY COUNT(*) DESC
Note that we use <
in the UserID comparison rather than !=
to avoid getting duplicate rows (e.g. for (UserID1, UserID2) = (1, 2)
and (UserID1, UserID2) = (2, 1)
.
I’ve made a small demo on dbfiddle which demonstrate the queries.