Skip to content
Advertisement

Query to list users who have answered similarly to questions as the specified user

I’m making a project in which users answer questions with yes/no choices and an option of must-match, e.g.:
Question 1) Stack Overflow is helpful? [Yes / No] [MustMatch]
Question 2) Hills are better than beaches? [Yes / No] [MustMatch]
etc.
Users can skip questions if they want.

I need MySQL query to calculate match percentage or match ratio between two specified users; the match ratio being number of same answers / number of common questions. (The MustMatch option should be ignored for this query.) Example:
User 3 has answered as: Q1) Yes, Q2) No, Q3) Skip, Q4) Yes
User 5 has answered as: Q1) Yes, Q2) Yes, Q3) No, Q4) Yes
User 9 has answered as: Q1) Yes, Q2) No, Q3) Yes, Q4) No
Suppose we want match ratio of user 3 and user 5. The match ratio will be same-answers / common-questions = 2/3 as they have answered same to 2 questions (Q1 and Q4) and because user 3 skipped a question so they have 3 questions in common.
If we want match ratio of user 5 and user 9, it’ll be 1/4.

I also need MySQL query to show top 20 matches in descending order of match ratio for a user. The MustMatch option should be considered for this query. This list of top 20 matches will contain userid, no. of common answers, no. of common questions and (match percentage or match ratio). For top 20 matches of example user 5: user 3 will come before user 9 in descending order of match ratio:

“MustMatch” explained:
Simply, the answer must match. If the answer to a question is important compatibility match to a user then they can mark their answer as MustMatch. If a user marks their answer as MustMatch and that particular answer doesn’t match with the other user then don’t show their match in top 20 list of either of the users.
For example, if user 5 marks their answer to the 4th question as MustMatch then in user 5’s top 20 list, user 9 won’t display because of mismatching answers to 4th question. Also then in user 9’s top 20 list, user 5 won’t display.
If user 5 marks their answer to the 3rd question as MustMatch then user 5 won’t match with user 3 as user 3 has skipped that question and user 5 won’t match with user 9 because of mismatching answers to that question.
Skipped question can’t be marked MustMatch, as you can see in below database design that it’s not possible.

This is my table for storing answers:

If a question is skipped then don’t have its record in this answers table.

I’ve a separate table for storing questions’ texts:

In calculating top 20 matches, apart from MustMatch, there’s another factor: “MinMatch”, in which users can set the minimum matching answers they want from their matches. If user 5 set their MinMatch as 3 then in top 20 list of any user, user 5 will match with only those users who have minimum 3 same answers as user 5. MinMatch is set by the user; it should be >=1 and <= total questions answered by the user.

Can you please tell how to make the two queries? (my two bolded sentences)

Here’s how the 2nd query may be: (you can skip this)

Here i’ve modified someone(Tom H)’s query i found on Stack Overflow for a similar question. But my this query doesn’t work (error: Unknown column 'common_answers' in 'field list') and doesn’t include the MustMatch and MinMatch factors.


More edit: (More explanation:)

How my first example is stored in the database:
answers table:

As discussed, match ratio of user 3 and user 5 = 2/3, match ratio of user 5 and user 9 = 1/4.
Let’s add a new user: User 4:
User 4 has answered as: Q1) Yes, Q2) No, Q3) No, Q4) Yes

Let’s start calculating top 20 but before that we need to know the users’ minMatch values:
users table:

Top 20 matches of user 5:
Result:

If user 5 marks their answer to 3rd question as must match, i.e. this record:

changes to this:

Then user 5’s top 20 list will be:

Because only user 4 matches with user 5’s answer to 3rd question.
Also then user 3’s top 20 list will be:

Now if user 9 increases their min match to 3, i.e. this record:

changes to this:

Then user 3’s top 20 list will be:

User 9 got removed from user 3’s top 20 list, because user 3 and user 9 have only 2 answers in common while user 9 requires minimum 3 answers to be in common.
Also then user 9’s top 20 list will be empty as no user has at least 3 matching answers with user 9.


Instead of storing answers as i’m storing now, i can store them as strings in the users table if that’s a better method. Adding a field answers to the users table:

Then answers can be stored as strings e.g. yYNns, where y=yes to 1st and 2nd questions in this string, n=no to 3rd and 4th questions, capitalizations to 2nd and 3rd questions (‘Y’ and ‘N’) means must match, s=skip the 5th question.

Or maybe some other better storage method. You can give me the 2 SQL queries according to whichever method you think is the best.

Advertisement

Answer

For the first part where you just want to get the ratio of equally answered questions to common answered questions, I’d use this query:

I am using a2.userid > a1.userid to get each user pair only once. (If I used a2.userid <> a1.userid, I’d get users 3 and 5 twice for instance, one time as 3/5 and one time as 5/3.)

If you want to look up two particular users, we add the appropriate WHERE clause and remove and a2.userid > a1.userid. We can also get rid of GROUP BY, because there is just one row to return.

For the top 20 query we have one user given and need a HAVING clause to make sure no mustmatch and minmatch is violated.

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=21edc084a0843987a14cb8b49f0f0213

Both queries use sum( <boolean expresssion> ). As true is 1 and false is 0 in MySQL this counts matches. The standard SQL way of writing this would be sum(case when <boolean expresssion> then 1 else 0 end) or count(*) filter (where <boolean expresssion>).

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement