I have been asked to build a query that runs quality checks for data on a certain state test.
I have the following query that creates a variable named ‘Flag’ to identify cases where there are issues. The main query is really in a subquery, and I filter out all records where there isn’t a flag by only selecting cases where Flag has a value.
I would like to be able to have a CASE WHEN for instances where there are duplicate tests. We do have some duplicates where, for the same student ID (test.studentID, studenttestscore.studentID, students.ID), a student will have the same test date for the same exam.
Test date, as you can see, is STUDENTTEST.TEST_DATE. Exams can be identified by either STUDENTTEST.ID or TEST.NAME. It might be easier to look for records where STUDENTTEST.test_date and STUDENTTEST.ID are the same for a STUDENTTEST.studentID. But once you get that count, it would have to be appended to the subquery here so that we can have a student name associated with it.
with query as (select STUDENTS.STUDENT_NUMBER as "Student ID", Students.Lastfirst as "Student Name",
STUDENTS.GRADE_LEVEL as "Current Grade Level",
TEST.NAME as "Test Name",
STUDENTTEST.GRADE_LEVEL as "Grade level at time of exam",
STUDENTTEST.TEST_DATE as TEST_DATE,
STUDENTTESTSCORE.NUMSCORE as Score, STUDENTTESTSCORE.PERCENTSCORE as PCTScore,
S_NY_STU_X.DATEOFENTRYGRade9 as Cohort,
CASE
WHEN STUDENTTEST.GRADE_LEVEL <9 or STUDENTTEST.GRADE_LEVEL>12 then 'Invalid Grade Level'
WHEN STUDENTTESTSCORE.NUMSCORE != STUDENTTESTSCORE.PERCENTSCORE then 'Mismatch between number and percent score'
WHEN TEST.NAME like '% - T%' then 'Transitional Global no longer administered'
WHEN STUDENTTEST.TEST_DATE>CURRENT_DATE THEN 'Test date invalid'
WHEN STUDENTTESTSCORE.NUMSCORE <0 THEN 'Test score out of range'
WHEN STUDENTTESTSCORE.NUMSCORE <0 THEN 'Test score out of range'
WHEN STUDENTTESTSCORE.NUMSCORE >100 THEN 'Test score out of range'
WHEN STUDENTTESTSCORE.PERCENTSCORE > 100 THEN 'Test score out of range'
END AS Flag
from STUDENTS STUDENTS,
STUDENTTEST STUDENTTEST,
STUDENTTESTSCORE STUDENTTESTSCORE,
TEST TEST,
TESTSCORE TESTSCORE,
S_NY_STU_X S_NY_STU_X
where
Test.Name like 'Regents%' and
STUDENTTESTSCORE.TESTSCOREID=TESTSCORE.ID
and STUDENTTESTSCORE.STUDENTTESTID=STUDENTTEST.ID
and STUDENTTESTSCORE.STUDENTID=STUDENTTEST.STUDENTID
and STUDENTS.ID=STUDENTTEST.STUDENTID
and STUDENTS.ID=STUDENTTESTSCORE.STUDENTID
and STUDENTTEST.TESTID=TEST.ID
and S_NY_STU_X.STUDENTSDCID=STUDENTS.DCID)
select * from query where flag is not null
Thank you so much!!
Advertisement
Answer
From what I understood you can add this analytical count()
to your case
:
when count(1) over (partition by studenttest.test_date,
studenttest.id,
studenttest.studentid) > 1
then 'duplicate'
Here is the example:
with t (test_date, test_id, student_id) as (
select date '2013-05-18', 707, 20214 from dual union all
select date '2013-05-18', 707, 20214 from dual union all
select date '2013-05-12', 707, 20214 from dual union all
select date '2023-05-18', 707, 1003 from dual union all
select date '2013-05-02', 705, 1003 from dual )
select t.*,
case
when test_date > current_date
then 'Test date invalid'
when count(1) over (partition by test_date, test_id, student_id) > 1
then 'duplicated test'
end as flag
from t
Result:
TEST_DATE TEST_ID STUDENT_ID FLAG
----------- ---------- ---------- -----------------
2013-05-02 705 1003
2013-05-12 707 20214
2013-05-18 707 20214 duplicated test
2013-05-18 707 20214 duplicated test
2023-05-18 707 1003 Test date invalid