I have three tables SpringCount
, SpringObservation
, and MonitoringPoint
.
SpringCount
and SpringObservation
particpate in a one to many relationship. Some counts have no bird observations, some counts do. I want to return a table with all the counts represented and nulls in the appropriate fields when they don’t have a matching observation. I understand when doing a left outer join this will produce duplicate rows and that is what I want, however, I am ending up with both the duplicate rows I want and duplicate rows full of nulls that I don’t think should be there. I’m not sure where in my query I’m going wrong.
MonitoringPoint Table MonitoringPoint | StateID 1A DE 2A DE 3A DE 4A DE 5B GA 67C IL SpringCount Table MonitoringPoint | CountID | Temp | Clouds | 1A 1 70 50 2A 2 60 30 3A 3 40 20 4A 4 80 10 Spring Observation Table SpringObsvID | CountID | Species A 1 NOBO B 3 FISP C 3 FISP D 3 NOBO E 4 PRAW F 4 FISP
This is the code I am executing (just trust that all the GDB_TO_DATE field exists and is not null for all records even though it’s not in my example tables above)
SELECT sc.CountID ,[Temperature] ,[CloudCover] ,so.Species FROM [CollectorSpatialData].[dbo].[SPRINGCOUNT] sc JOIN MONITORINGPOINT mp ON sc.MonitoringPointID = mp.MonitoringPointID and StateID = 'DE' LEFT JOIN SPRINGOBSERVATION so ON sc.CountID = so.CountID and year(sc.GDB_TO_DATE) = 9999 and year(so.GDB_TO_DATE) = 9999 order by CountDate
This is the table I expect to get
Output Table CountID | Temperature | Clouds | Species 1 70 50 NOBO 2 60 30 NULL 3 40 20 FISP 3 40 20 FISP 3 40 20 NOBO 4 80 10 PRAW 4 80 10 FISP
This is the table I actually end up with. I’m not sure why a CountID
list all it’s matches, but also produce a (seemingly) random amount of non-matches for that same CountID
.
Output Table CountID | Temperature | Clouds | Species 1 70 50 NULL 1 70 50 NULL 1 70 50 NULL 1 70 50 NOBO 2 60 30 NULL 2 60 30 NULL 2 60 30 NULL 2 60 30 NULL 3 40 20 NULL 3 40 20 NULL 3 40 20 NULL 3 40 20 FISP 3 40 20 FISP 3 40 20 NOBO 4 80 10 NULL 4 80 10 NULL 4 80 10 NULL 4 80 10 PRAW 4 80 10 FISP
Is there something in my query that needs to be rearranged or am I mistaken in my thinking of how left outer joins work? I have used SELECT DISTINCT
and that at least groups all those erroneous null value rows together which helps, but they still exist and I’m not sure why.
Advertisement
Answer
Thanks to @underscore_d as he helped me get the answer. I needed to look at the fields that weren’t part of my query to figure out what was happening. Turns out I needed to do a subquery because the GDB_TO_DATE
fields in each table were acting in unexpected ways. What I needed was all the counts that end in a 9999 year to match with any of the observations that ended in a 9999 year. Anything that doesn’t end in a 9999 year is a historical/archived record, but they’re kept in the same table as current records (it’s a spatial database and the software that creates these make them this way, definitely not my choice). Due to the fact that archived and current records are located in the same table, just using the Count IDs to match the records isn’t enough because you’ll end up matching a current count to an archived observation that still has that ID. I needed to select all the current observations before attempting to join to the current counts, that way the archived observations wouldn’t be included in the join.
Here’s the final code:
SELECT sc.CountID ,[Temperature] ,[CloudCover] ,sub.Species FROM [CollectorSpatialData].[dbo].[SPRINGCOUNT] sc JOIN MONITORINGPOINT mp ON sc.MonitoringPointID = mp.MonitoringPointID and StateID = 'DE' LEFT JOIN (SELECT so.CountID, so.Species FROM SPRINGOBSERVATION so WHERE year(so.GDB_TO_DATE) = 9999) AS sub ON sc.GlobalID = sub.CountID where year(sc.GDB_TO_DATE) = 9999 order by CountDate