Skip to content
Advertisement

Extra null rows when using LEFT OUTER JOIN

I have three tables SpringCount, SpringObservation, and MonitoringPoint. SpringCount and SpringObservation particpate in a one to many relationship. Some counts have no bird observations, some counts do. I want to return a table with all the counts represented and nulls in the appropriate fields when they don’t have a matching observation. I understand when doing a left outer join this will produce duplicate rows and that is what I want, however, I am ending up with both the duplicate rows I want and duplicate rows full of nulls that I don’t think should be there. I’m not sure where in my query I’m going wrong.

MonitoringPoint Table
MonitoringPoint  |  StateID
       1A              DE
       2A              DE
       3A              DE
       4A              DE
       5B              GA
       67C             IL

SpringCount Table
MonitoringPoint | CountID   |  Temp  |  Clouds  |
       1A            1          70        50
       2A            2          60        30
       3A            3          40        20
       4A            4          80        10

Spring Observation Table
SpringObsvID  |   CountID   |   Species
     A               1           NOBO
     B               3           FISP
     C               3           FISP
     D               3           NOBO
     E               4           PRAW
     F               4           FISP

This is the code I am executing (just trust that all the GDB_TO_DATE field exists and is not null for all records even though it’s not in my example tables above)

SELECT 
      sc.CountID
      ,[Temperature]
      ,[CloudCover]
      ,so.Species
  FROM [CollectorSpatialData].[dbo].[SPRINGCOUNT] sc
  JOIN MONITORINGPOINT mp
  ON sc.MonitoringPointID = mp.MonitoringPointID
    and StateID = 'DE'
  LEFT JOIN SPRINGOBSERVATION so
  ON sc.CountID = so.CountID
    and year(sc.GDB_TO_DATE) = 9999
    and year(so.GDB_TO_DATE) = 9999
order by CountDate

This is the table I expect to get

Output Table
    CountID  |   Temperature  |  Clouds  |  Species 
       1            70             50        NOBO
       2            60             30        NULL
       3            40             20        FISP
       3            40             20        FISP
       3            40             20        NOBO
       4            80             10        PRAW
       4            80             10        FISP

This is the table I actually end up with. I’m not sure why a CountID list all it’s matches, but also produce a (seemingly) random amount of non-matches for that same CountID.

Output Table
CountID  |   Temperature  |  Clouds  |  Species 
   1            70             50        NULL 
   1            70             50        NULL 
   1            70             50        NULL 
   1            70             50        NOBO
   2            60             30        NULL
   2            60             30        NULL
   2            60             30        NULL
   2            60             30        NULL
   3            40             20        NULL
   3            40             20        NULL
   3            40             20        NULL
   3            40             20        FISP
   3            40             20        FISP
   3            40             20        NOBO
   4            80             10        NULL
   4            80             10        NULL
   4            80             10        NULL
   4            80             10        PRAW
   4            80             10        FISP

Is there something in my query that needs to be rearranged or am I mistaken in my thinking of how left outer joins work? I have used SELECT DISTINCT and that at least groups all those erroneous null value rows together which helps, but they still exist and I’m not sure why.

Advertisement

Answer

Thanks to @underscore_d as he helped me get the answer. I needed to look at the fields that weren’t part of my query to figure out what was happening. Turns out I needed to do a subquery because the GDB_TO_DATE fields in each table were acting in unexpected ways. What I needed was all the counts that end in a 9999 year to match with any of the observations that ended in a 9999 year. Anything that doesn’t end in a 9999 year is a historical/archived record, but they’re kept in the same table as current records (it’s a spatial database and the software that creates these make them this way, definitely not my choice). Due to the fact that archived and current records are located in the same table, just using the Count IDs to match the records isn’t enough because you’ll end up matching a current count to an archived observation that still has that ID. I needed to select all the current observations before attempting to join to the current counts, that way the archived observations wouldn’t be included in the join.

Here’s the final code:

SELECT 
      sc.CountID
      ,[Temperature]
      ,[CloudCover]
      ,sub.Species
  FROM [CollectorSpatialData].[dbo].[SPRINGCOUNT] sc
  JOIN MONITORINGPOINT mp
  ON sc.MonitoringPointID = mp.MonitoringPointID
    and StateID = 'DE'
  LEFT JOIN 
    (SELECT so.CountID, 
            so.Species
     FROM SPRINGOBSERVATION so
     WHERE year(so.GDB_TO_DATE) = 9999) AS sub
  ON sc.GlobalID = sub.CountID
  where year(sc.GDB_TO_DATE) = 9999
order by CountDate
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement