I am trying to write a SQL Server CASE statement to retrieve a student’s category based on the following condition.
If the category
column from the student_category
table has both values ‘X’ and ‘Y’ for a student_id
value from the student
table, then only display the record where value is ‘Y’.
If the category
has either values ‘X’ or ‘Y’, then display the records with that value.
If the category
does not have values ‘X’ or ‘Y’, display a blank for the column i.e. if there are any other categories except ‘X’ or ‘Y’, display a blank.
Thus, I should only have one row of data for each student_id
even if they have multiple categories. However, I am still getting multiple records for each student_id
with more than one category. Any ideas on what I might be missing ?
SELECT DISTINCT s.student_id, CASE WHEN sc.category = 'X' AND sc.category = 'Y' THEN 'Y' WHEN sc.category = 'X' OR sc.category = 'Y' THEN sc.category ELSE '' END AS student_cat FROM student s LEFT JOIN student_category sc ON s.student_id = sc.student_id
Advertisement
Answer
You are having two rows returned because, regardless of what conditions you put in that CASE statement of yours, for a certain student ID, there are two records from the join between student and student_category. Without some sort of aggregation, you will have two rows.
There are a few ways to solve this. This is how I would do it:
SELECT s.student_id, --Check how many of each category there are for the student CASE WHEN sc.Number_of_x = 1 AND sc.Number_of_y = 1 THEN 'Y' WHEN sc.Number_of_x = 1 AND sc.Number_of_y = 0 THEN 'X' WHEN sc.Number_of_x = 0 AND sc.Number_of_y = 1 THEN 'Y' ELSE '' END AS student_cat FROM student AS s LEFT JOIN ( --Aggregate the categories per student so that we only have 1 line --per student_id SELECT student_id, SUM( IIF( sc.category = 'X', 1, 0 ) ) AS Number_of_x, SUM( IIF( sc.category = 'Y', 1, 0 ) ) AS Number_of_y, FROM student_category AS sc GROUP BY student_id ) AS sc ON s.student_id = sc.student_id
This way you first aggregate the student_category table in a sub-query to return 1 row per student, then check those counts to determine the category.
This is assuming the only categories are X and Y, and there can only ever be 0 or 1 of either category.
I find it helps to just write SELECT * with a sample of the data before writing queries using DISTINCT or CASE statement to see what it looks like raw, and how many rows there are.