I have data like below. I would like to get the top choice for each gender from the following data
subjectID <- c("1", "2", "1", "0", "1", "0", "0", "1", "0", "2", "0", "0", "2", "2","2","1","2","1","0","2") gender <- c("M", "M", "F", "M", "M", "F", "M", "M", "M", "F", "M", "F", "M", "M", "F","M", "F", "M", "F", "F") selection <- data.frame(subjectID, gender) subjectID <- c("1", "2", "0") subject <- c("Maths", "Music", "English") subjects <- data.frame(subjectID, subject)
I have tried mapping the choices by descending order as below :
favourite <- sqldf("SELECT a.gender, b.subject, COUNT(a.subjectID) as `no of selections` FROM selection a JOIN subjects b ON (a.subjectID = b.subjectID ) GROUP BY a.subjectID, a.gender ORDER BY a.gender, `no of selections` DESC ")
BUT, I would like to get the following table where i get the top choice for eachgender :
gender <- c("F", "M") subjects <- c("Music", "Maths") mostfav <- data.frame(gender, subjects)
Advertisement
Answer
If you are runing MySQL 8.0, you can use RANK()
in a subquery to rank the records by subject count for each gender, and filter on the top record per group in the outer query (if there are top ties, RANK()
preserves them):
SELECT gender, subject, no_of_selections FROM ( SELECT se.gender, su.subject, COUNT(*) as no_of_selections, RANK() OVER(PARTITION BY se.gender ORDER BY COUNT(*) DESC) rn FROM selection se JOIN subjects su ON se.subjectID = su.subjectID GROUP BY se.subjectID, se.gender, su.subject ) t WHERE rn = 1 ORDER BY gender DESC
In earlier versions, where window functions are not availabe, one option is to filter with a having
clause that returns to top count per gender:
SELECT se.gender, su.subject, COUNT(*) as no_of_selections FROM selection se JOIN subjects su ON se.subjectID = su.subjectID GROUP BY se.subjectID, se.gender, su.subject HAVING COUNT(*) = ( SELECT COUNT(*) FROM selection se1 WHERE se1.gender = se.gender GROUP BY se1.subjectID, se1.gender ORDER BY COUNT(*) DESC LIMIT 1 )
Notes:
I changed the table aliases to make them more meaningful
You should the
subject
column to theGROUP BY
clause to make your query runnable under sql modeONLY_FULL_GROUP_MODE
, which is by default enabled starting MySQL 5.7