BASED on the multi level MANY TO MANY dependency on student & subject cross reference, how to form a ORACLE sql/plsql query to GROUP the Student_Subject table so that groups are independent & exclusive as following? I have 519 rows in actual table. I am looking for a performant solution
Example Source table :
Expected Target output:
Actual Source Table Data creation query
CREATE TABLE student_subject ( student_name, subject_name ) AS SELECT 'SMITH', 'CHEMISTRY' FROM DUAL UNION ALL SELECT 'ROBIN', 'PHYSICS' FROM DUAL UNION ALL SELECT 'SAM', 'PHYSICS' FROM DUAL UNION ALL SELECT 'SAM', 'MATH' FROM DUAL UNION ALL SELECT 'JENNY', 'MATH' FROM DUAL UNION ALL SELECT 'JENNY', 'ACCOUNTS' FROM DUAL UNION ALL SELECT 'DON', 'ENGLISH' FROM DUAL UNION ALL SELECT 'DON', 'SPANISH' FROM DUAL UNION ALL SELECT 'RON', 'HISTORY' FROM DUAL UNION ALL SELECT 'JOVAN', 'HISTORY' FROM DUAL UNION ALL -- etc. for an additional 509 more rows
See this dbfiddle for a sample set of 475 rows.
Advertisement
Answer
I am looking for a performant solution
If you want performant, then forget about SQL. With your (highly connected) data, there’s just no way to pull a CONNECT BY
trick that isn’t going to look at a LOT of extra, unnecessary data.
Here’s a simple PL/SQL block that will do it very quickly. More quickly, by far, I believe, than is possible with pure SQL. (As usual, I’ll happily eat crow on that if it means learning something new…)
This works by finding the records with the lowest subject_name
from among the rows that have not yet been assigned to a group. These get the next group number assigned. Then, we loop through indefinitely looking for rows directly related to rows in the new group and assigning those to the new group. When we find no more directly related rows, that group is finished and we start the next group. When we fail to create any new groups, we’re finished.
-- Add a column to student_subject to hold the group number results ALTER TABLE student_subject ADD ( group_number NUMBER ); -- Let's time it... SET TIMING ON -- Go through and assign group numbers to each record. -- Doing it iteratively in PL/SQL means we can easily look at each row only once. BEGIN UPDATE student_subject SET group_number = null; LOOP UPDATE student_subject ss SET group_number = ( SELECT nvl(max(group_number),0)+1 FROM student_subject ss2 ) -- Assign next group number WHERE subject_name = ( SELECT min(subject_name) FROM student_subject ss2 WHERE group_number IS NULL ); EXIT WHEN SQL%ROWCOUNT = 0; -- done.. all rows have groups now. LOOP UPDATE student_subject ss SET group_number = ( SELECT max(group_number) FROM student_subject ss2 ) WHERE group_number is null AND EXISTS ( SELECT 'direct relation' FROM student_subject ss2 WHERE (ss2.subject_name = ss.subject_name OR ss2.student_name = ss.student_name ) AND ss2.group_number = ( SELECT max(group_number) FROM student_subject ss3 ) ); EXIT WHEN SQL%ROWCOUNT = 0; -- finished group. move to next group END LOOP; END LOOP; END;
PL/SQL procedure successfully completed. Elapsed: 00:00:00.128
You can optimize it further by replacing the (SELECT...
expressions in the update with PL/SQL variables. E.g., to keep track of the current group number instead of selecting MAX()
every time. I was just lazy.