Skip to content
Advertisement

Joining two tables but pulling in values only if multiple columns match

I have the query below that assigns a station volume based on the volume processed that week. Station sizes are based by looking at 4 data points throughout the year. This is because station fluctuate in volume throughout the year for example, they may have small volume in one quarter, but large in another. The goal is to assign a size to a station that will best suite it the entire year. The sql below gets together the four data points that will be needed to evaluate what size a station_size will be applied. There are multiple combinations of volume possibilities that can take place over the year which is why they are stored in a separate table. I have a separate table called “station_sizes” that provides the logic for what the station size will be based on the volume taken from the four point in time.

For example if a station processes small volume in wk 14 and small volume in wk27 and standard volume in wk 40 and small volume in wk 47 then they would be a ds small station size. There are 1000s of scenarios in the station_size table.

I want the final output to be the excerpt from the query output + station_size. The station size will be pulled from the station_size table.

I need to know how to have the query below reference the station size table so that it can identify for each year group is wk_16= xyz.. and wk_26 =xyz.. and wk_39 = xyz.. and wk_43 =xyz then it will pull in the size logic form the station size table.

For the ar section in the query. I want the logic to be if a station has “Yes” for AR at a minimum it needs to be a standard station size. I am not sure if I did that peice right. I am fine if its bigger than that but no smaller.

SELECT station,   
  del_date,
  sum(volume)/7 as volume_ref,DATE_PART("week",del_date) as week_num,DATE_PART("year",del_date) as year, 
  CASE 
    when sum(volume)/7 between 0 and 20000 then 'ds x-small' 
    when sum(volume)/7 between 20000 and 36000 and max(ar) <> 'yes' then 'ds small' 
    when sum(volume)/7 between 36000 and 42000 then 'ds standard' 
    when sum(volume)/7 between 42000 and 72000 then 'ds large' 
    when sum(volume)/7 > 72000 then 'ds x-large' 
    when max(ar) = 'YES' then 'ds standard' 
    else 'ds small' 
  end as station_ref
FROM prophecy_na.na_topology_lrp 
  Right JOIN wbr_global.raw_station_extended_attribute 
    ON prophecy_na.na_topology_lrp.station = wbr_global.raw_station_extended_attribute.ds 
where week_num in(16,26,39,43)
GROUP BY station, del_date;

[![enter image description here][1]][1] [![enter image description here][2]][2]

Advertisement

Answer

It appears you are short on some clarification and I’ll get into that in a moment. You may want to consider creating an additional lookup table for your query such as StationSize.

StationSizeID
Description
MinVolume
ButLessThan

Then you could adjust your ranges without adjusting queries (in case there are others that use the same threshold values. You could have values such as

StationSizeID   Description   MinVolume  ButLessThan   
1               ds x-small    0          20,000 
2               ds small      20,000     36,000        
3               ds standard   36,000     42,000      
4               ds large      42,000     72,000
5               ds x-large    72,000     9,999,999

We’ll get to a join later.

You also have a test for “max(ar)” in your case being applied LAST, but also under your ds-small test condition. We have no idea where this “ar” field comes from, nor its context of possible values. If string based and you have one record with value of “yes”, then it would supersede any value of “no”, even if the “no” was the latest value for some given origin. Also, by considering the volume ranges of the case/when, and the sum of volume for any given week is 18,000, but it has a “yes” for the max(ar), it would never get to your case test pushing it to the “ds standard” level. Similarly if you had a 75k volume and a max(ar) = “yes”, it too would remain ds x-large and not the ds standard. Is this the intent? Should the “yes” condition be tested up front, so if EVER a “yes”, it is always considered a standard station? If so, that condition needs to change.

Now, my next level needing clarification. You mention 4 weeks over the year but user has no context of what / why the given weeks in question being checked for. Are you looking for volume activity if a given station is growing / declining based on volume activity? If so, are you trying to provide some weight factor to upgrade / downgrade the station size? per your example: originally small, standard, small, small is considered small overall. What if the scenario was: small, small, standard, standard? Would it be upgraded to a standard size station due to its growth?

So, instead of just people throwing queries at you, understanding what you have, what you are looking for, only the proper context can really get you the answer you are probably looking for. I would suggest you edit your existing post, provide clarification within the details of that question instead of direct comments and save. THEN, you can comment me back to review the changes you posted and I can proceed with more.

REVISION / FEEDBACK

Ok, I THINK I am following you from your comments. Your second table that has 1000s of combinations is the basis to compare the volume per the given weeks, and your table is EXPLICITLY referencing the 4 weeks of 14, 27, 40 and 47. But the query you are performing is actually volume of activity per station on different weeks as per your query looking for the date weeks within 16,26,39,43. So, the column headers really dont mean anything of Wk14, Wk27, Wk40, Wk47. They just represent the 4 weeks you WANT to query based on, such as you could query for weeks 2, 11, 24, 31? In which case the status of whatever the first week is, correlates to the matched description of “wk14” in your station size table and similarly for 11 => Wk27, week 24 => Wk40 and lastly week 31 => Wk47. This Station Size table has 1000’s of combinations to take into consideration all (or most) permutations of what values COULD be in the respective weekly positions.

But, you do not have a station size table as I simulated in the first example showing the low / high range, correct? Its ok if not, just trying to look at all the pieces on the playing field.

Am I getting CLOSER to what you are trying to accomplish?

POSSIBLE SOLUTION PLEASE NOTE… I am not familiar with “dbeaver” database, but bet it is MOSTLY common sql compliant. If anything does NOT work, let me know and we can adjust.

Ok, I will try to do this in pieces so you can see how all tie together. Lets start by doing a query per week grouped

select
      lrp.station,
      DATE_PART("week", lrp.del_date) as WkNum,
      sum( lrp.volume ) / 7 as WkAvg,
      -- since the date part of the group by will always be the same
      -- correlating to the 1st, 2nd, 3rd or 4th week, I now have this
      -- extra column and just apply MAX(), otherwise SQL will NAG if 
      -- it is not part of group by and its not an aggregate function
      max( case when DATE_PART("week", lrp.del_date) = 16 then 1
           when DATE_PART("week", lrp.del_date) = 26 then 2
           when DATE_PART("week", lrp.del_date) = 39 then 3
           else 4 end ) as WeekOrder
   from
       prophecy_na.na_topology_lrp lrp
   where
          -- EXAMPLE, if you KNOW you only care about within a current year
          -- this will help optimize by pre-qualifying the single year you
          -- expect the week activity within
          lrp.del_Date >= '2021-01-01'
      AND lrp.del_date <  '2022-01-01'
      -- and NOW, only filter out for the 4 specific weeks you are about
      AND DATE_PART("week", lrp.del_date) in (16,26,39,43)
   group by
      lrp.station,
      DATE_PART("week", lrp.del_date) 

Now, if you wanted weeks that scroll over a year time, such as from October of 2020 to August of 2021, just put the starting / ending dates to filter that out so your week “43” would be representative of Year 2020 and week 3 would be Jan 2021, and a mental map on the weeks you want might be though of as (week 43 = October of 2020, week 2 = Jan of 2021, week 15 = April 2021, etc.), so your weeks might look like (43, 2, 15, 21). So do you see the rolling weeks I THINK you are trying to get covered?

If so, the above query might come up with results like below.

Station  WkNum  WkAvg     WeekOrder
x        16     34142     1
x        26     36879     2
x        39     35387     3
x        43     37114     4
y        16     43872     1
y        26     41983     2
y        39     42218     3
y        43     39461     4
z        16  ...

So now you have one master list of records, per station and the proper corresponding week you intend them per the query, and if you did the year roll-over, just change the case condition for proper week order context for week 1-4.

Now, we can run THIS against your case for the ranges to pull the descriptors. So I will use the entire query above as a “Prequery” (pq alias) and build out more. But NOW, I will apply a cross-tab to pull in the 4 possible weeks into a single row that will more directly match your Station_Size table.

select
      pq.Station,
      max( case when pq.WeekOrder = 1 
                then VolRanges.StationSize else '' end ) as Wk1Size,
      max( case when pq.WeekOrder = 2 
                then VolRanges.StationSize else '' end ) as Wk2Size,
      max( case when pq.WeekOrder = 3 
                then VolRanges.StationSize else '' end ) as Wk3Size,
      max( case when pq.WeekOrder = 4
                then VolRanges.StationSize else '' end ) as Wk4Size,
      -- just to tack on the "AR" status to extended properties
      max( case when xa.station_id is null 
                then 'no' else 'yes' end ) as ARStatus
   from
      ( entire first  pre query above ) pq

         JOIN
         ( select 'ds x-small' stationSize, 
                  0 AtLeast,
                  20000 ButLessThan
           UNION select 'ds small', 20000, 36000
           UNION select 'ds standard', 36000, 42000
           UNION select 'ds large', 42000, 72000
           UNION select 'ds x-large', 72000, 9999999 ) VolRanges
            on pq.WkAvg / 7 >= VolRanges.AtLeast
           AND pq.WkAvg / 7 < VolRanges.ButLessThan

        -- and also now getting the "ar" value from the extended properties
        -- doing a LEFT-JOIN in case no such extended properties so the given
        -- station is not left out of the query.  So here, I am EXPLICITLY
        -- pulling ONLY IF the ar = 'yes'
        LEFT JOIN wbr_global.raw_station_extended_attribute xa
           on pq.station_id = xa.ds
           AND xa.ar = 'yes'
   group by
      pq.Station

So now, we still have one row per station AND week, but now tacked-on the per-week basis description

Station Wk1Size Wk2Size Wk3Size Wk4Size ARStatus x ds small ds standard ds small ds small no y ds large ds standard ds large ds large no z … DII1 (sample from properties, assume aggregates) YES

Now, we need to get each individual week to its corresponding 4-part week table, So this is the next nesting and I will tie them to the station size table that has the 4 parts to match up with.

select
      perStation.*,
      ss.station_size
   from
      ( entire SECOND query above ) perStation
         JOIN Station_Size ss
            on perStation.Wk1Size = ss.wk_14  
           AND perStation.Wk2Size = ss.wk_27  
           AND perStation.Wk3Size = ss.wk_40
           AND perStation.Wk4Size = ss.wk_47
         

So, the above will get all EXCEPT the final “AR” consideration. So lets look at the “AR” = yes consideration. You have 5 station size types. If anything is HIGHER than standard, that wins. So, the ONLY time you want to UPGRADE someone is if the 4wk combination final station size = “ds small” AND the AR = “YES”. So lets just add that, all else, stay as you are. Ie: If the station is NO for the ar status and they are a size “ds small”, that is the ONLY time it stays small.

I would just change the above to add the extra FinalStationSize consideration.

select
      perStation.*,
      ss.station_size,
      case when     perStation.ARStatus = 'yes'
                AND ss.Station_Size = 'ds small'
           then 'ds standard'
           else ss.Station_Size end FinalStationSize
     

You may get SQL errors that may not be compatible with dbeaver database, that is why I tried to both show you the individual steps building things out, and then nesting together to get to each part. If any errors, let me know and we can try to work them out.

select
      perStation.*,
      ss.station_size,
      case when     perStation.ARStatus = 'yes'
                AND ss.Station_Size = 'ds small'
           then 'ds standard'
           else ss.Station_Size end FinalStationSize
   from
      ( select
              pq.Station,
              max( case when pq.WeekOrder = 1 
                        then VolRanges.StationSize else '' end ) as Wk1Size,
              max( case when pq.WeekOrder = 2 
                        then VolRanges.StationSize else '' end ) as Wk2Size,
              max( case when pq.WeekOrder = 3 
                        then VolRanges.StationSize else '' end ) as Wk3Size,
              max( case when pq.WeekOrder = 4
                        then VolRanges.StationSize else '' end ) as Wk4Size,
              -- just to tack on the "AR" status to extended properties
              max( case when xa.station_id is null 
                        then 'no' else 'yes' end ) as ARStatus
           from
              ( select
                      lrp.station,
                      DATE_PART("week", lrp.del_date) as WkNum,
                      sum( lrp.volume ) / 7 as WkAvg,
                      -- since the date part of the group by will always be the same
                      -- correlating to the 1st, 2nd, 3rd or 4th week, I now have this
                      -- extra column and just apply MAX(), otherwise SQL will NAG if 
                      -- it is not part of group by and its not an aggregate function
                      max( case when DATE_PART("week", lrp.del_date) = 16 then 1
                           when DATE_PART("week", lrp.del_date) = 26 then 2
                           when DATE_PART("week", lrp.del_date) = 39 then 3
                           else 4 end ) as WeekOrder
                   from
                       prophecy_na.na_topology_lrp lrp
                   where
                          -- EXAMPLE, if you KNOW you only care about within a current year
                          -- this will help optimize by pre-qualifying the single year you
                          -- expect the week activity within
                          lrp.del_Date >= '2021-01-01'
                      AND lrp.del_date <  '2022-01-01'
                      -- and NOW, only filter out for the 4 specific weeks you are about
                      AND DATE_PART("week", lrp.del_date) in (16,26,39,43)
                   group by
                      lrp.station,
                      DATE_PART("week", lrp.del_date)    ) pq

                         JOIN
                         ( select 'ds x-small' stationSize, 
                                  0 AtLeast,
                                  20000 ButLessThan
                           UNION select 'ds small', 20000, 36000
                           UNION select 'ds standard', 36000, 42000
                           UNION select 'ds large', 42000, 72000
                           UNION select 'ds x-large', 72000, 9999999 ) VolRanges
                            on pq.WkAvg / 7 >= VolRanges.AtLeast
                           AND pq.WkAvg / 7 < VolRanges.ButLessThan

                        -- and also now getting the "ar" value from the extended properties
                        -- doing a LEFT-JOIN in case no such extended properties so the given
                        -- station is not left out of the query.  So here, I am EXPLICITLY
                        -- pulling ONLY IF the ar = 'yes'
                        LEFT JOIN wbr_global.raw_station_extended_attribute xa
                           on pq.station_id = xa.ds
                           AND xa.ar = 'yes'
                   group by
                      pq.Station ) perStation
        JOIN Station_Size ss
           on perStation.Wk1Size = ss.wk_14  
          AND perStation.Wk2Size = ss.wk_27  
          AND perStation.Wk3Size = ss.wk_40
          AND perStation.Wk4Size = ss.wk_47
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement