Skip to content
Advertisement

get first row fo each group SQL

I have data like below. I would like to get the top choice for each gender from the following data

subjectID <- c("1", "2", "1", "0", "1", "0", "0", "1", "0", "2", 
       "0", "0", "2", "2","2","1","2","1","0","2")
gender <- c("M", "M", "F", "M", "M", "F", "M", "M", "M", "F", 
    "M", "F", "M", "M", "F","M", "F", "M", "F",  "F")

 selection <- data.frame(subjectID, gender)
       subjectID <- c("1", "2", "0")
       subject <- c("Maths", "Music",  "English")

    subjects <- data.frame(subjectID, subject)

I have tried mapping the choices by descending order as below :

 favourite <- sqldf("SELECT  a.gender, b.subject, COUNT(a.subjectID) as `no of selections`
              FROM selection a
              JOIN subjects b
              ON (a.subjectID  = b.subjectID )
              GROUP BY a.subjectID, a.gender
              ORDER BY a.gender, `no of selections` DESC
              ")

BUT, I would like to get the following table where i get the top choice for eachgender :

gender <- c("F", "M")
subjects <- c("Music", "Maths")
mostfav <- data.frame(gender, subjects)

Advertisement

Answer

If you are runing MySQL 8.0, you can use RANK() in a subquery to rank the records by subject count for each gender, and filter on the top record per group in the outer query (if there are top ties, RANK() preserves them):

SELECT gender, subject, no_of_selections
FROM (
    SELECT  
        se.gender, 
        su.subject, 
        COUNT(*) as no_of_selections,
        RANK() OVER(PARTITION BY se.gender ORDER BY COUNT(*) DESC) rn
    FROM selection se
    JOIN subjects su ON se.subjectID  = su.subjectID
    GROUP BY se.subjectID, se.gender, su.subject
) t
WHERE rn = 1
ORDER BY gender DESC

In earlier versions, where window functions are not availabe, one option is to filter with a having clause that returns to top count per gender:

SELECT  
    se.gender, 
    su.subject, 
    COUNT(*) as no_of_selections
FROM selection se
JOIN subjects su ON se.subjectID  = su.subjectID
GROUP BY se.subjectID, se.gender, su.subject
HAVING COUNT(*) = (
    SELECT COUNT(*) 
    FROM selection se1
    WHERE se1.gender = se.gender
    GROUP BY se1.subjectID, se1.gender
    ORDER BY COUNT(*) DESC
    LIMIT 1
)

Notes:

  • I changed the table aliases to make them more meaningful

  • You should the subject column to the GROUP BY clause to make your query runnable under sql mode ONLY_FULL_GROUP_MODE, which is by default enabled starting MySQL 5.7

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement