I have data like below. I would like to get the top choice for each gender from the following data
subjectID <- c("1", "2", "1", "0", "1", "0", "0", "1", "0", "2",
"0", "0", "2", "2","2","1","2","1","0","2")
gender <- c("M", "M", "F", "M", "M", "F", "M", "M", "M", "F",
"M", "F", "M", "M", "F","M", "F", "M", "F", "F")
selection <- data.frame(subjectID, gender)
subjectID <- c("1", "2", "0")
subject <- c("Maths", "Music", "English")
subjects <- data.frame(subjectID, subject)
I have tried mapping the choices by descending order as below :
favourite <- sqldf("SELECT a.gender, b.subject, COUNT(a.subjectID) as `no of selections`
FROM selection a
JOIN subjects b
ON (a.subjectID = b.subjectID )
GROUP BY a.subjectID, a.gender
ORDER BY a.gender, `no of selections` DESC
")
BUT, I would like to get the following table where i get the top choice for eachgender :
gender <- c("F", "M")
subjects <- c("Music", "Maths")
mostfav <- data.frame(gender, subjects)
Advertisement
Answer
If you are runing MySQL 8.0, you can use RANK()
in a subquery to rank the records by subject count for each gender, and filter on the top record per group in the outer query (if there are top ties, RANK()
preserves them):
SELECT gender, subject, no_of_selections
FROM (
SELECT
se.gender,
su.subject,
COUNT(*) as no_of_selections,
RANK() OVER(PARTITION BY se.gender ORDER BY COUNT(*) DESC) rn
FROM selection se
JOIN subjects su ON se.subjectID = su.subjectID
GROUP BY se.subjectID, se.gender, su.subject
) t
WHERE rn = 1
ORDER BY gender DESC
In earlier versions, where window functions are not availabe, one option is to filter with a having
clause that returns to top count per gender:
SELECT
se.gender,
su.subject,
COUNT(*) as no_of_selections
FROM selection se
JOIN subjects su ON se.subjectID = su.subjectID
GROUP BY se.subjectID, se.gender, su.subject
HAVING COUNT(*) = (
SELECT COUNT(*)
FROM selection se1
WHERE se1.gender = se.gender
GROUP BY se1.subjectID, se1.gender
ORDER BY COUNT(*) DESC
LIMIT 1
)
Notes:
I changed the table aliases to make them more meaningful
You should the
subject
column to theGROUP BY
clause to make your query runnable under sql modeONLY_FULL_GROUP_MODE
, which is by default enabled starting MySQL 5.7