Skip to content
Advertisement

how i can make left join work only when the number of records in the first table more or equal the number of records in the second table?

I have two tables and I want do left join between them, but I want the left join to happen only if the count of records that contain the same value of join column in the first table is more or equal the count of records that contain the same value of join column in the second table

what I try:

first I do the count of records per the each group

and then I put condition to filter (a.cnt >= b.cnt)

this is the code:

insert into work.discount_tmp 
select SubsID, MSISDN, EppAcc, User_Name, Bill_Cycle, Tariff, Pack, Discount_Qual_ID, 
Discount_ID, Qualification_Type, Discount_Desc, Sequence, a.GroupID, RuleID, 
dib_band_id, dib_band_end, dib_charge_ref, DIB_DIS0, dib_disc_type, dib_limit_disc, 
DIB_MAX_, cd_class_type, ClassID, Class, dgh_group_id, dgh_inclusion_from, 
dgh_inclusion_to, 20191003 
from (
  (
    select *, 
      row_number() over (partition by GroupID order by Discount_ID) as seqnum,
      COUNT(*) over (partition by GroupID order by GroupID) as cnt 
    from work.disc_band 
    where tbl_dt = 20191003 
    order by Discount_ID
  ) a
  left join (
    select *, 
      row_number() over (
        partition by GroupID 
        order by cd_class_type, try(cast(coalesce(classid,'0') as bigint))
      ) as seqnum,
      count(*) over (partition by GroupID order by GroupID) as cnt 
    from work.alltable1 
    where tbl_dt = 20191003 
  ) b on a.GroupID = b.GroupID and a.cnt >= b.cnt and a.seqnum = b.seqnum
); 

But my try not work because the join done first and then the condition (so the value of cnt in the second table will not stay the same after join done)

Any idea how to make that work?

Advertisement

Answer

You want to write your query referencing a neutral “seed table” in your FROM clause. Then you can calculate the row counts from the two tables and compare them to each other. Something like this:

insert into work.discount_tmp 
select SubsID, MSISDN, EppAcc, User_Name, Bill_Cycle, Tariff, Pack, Discount_Qual_ID, 
Discount_ID, Qualification_Type, Discount_Desc, Sequence, a.GroupID, RuleID, 
dib_band_id, dib_band_end, dib_charge_ref, DIB_DIS0, dib_disc_type, dib_limit_disc, 
DIB_MAX_, cd_class_type, ClassID, Class, dgh_group_id, dgh_inclusion_from, 
dgh_inclusion_to, 20191003 
from (
  select a.*, b.*, 
  FROM (SELECT 1 AS DummyCol) AS dt -- "seed" table; not sure the Presto equivalent
  CROSS JOIN ( -- Get all qualifying rows from "disc_band"
    SELECT *, count(*) OVER(PARTITION BY groupid) AS RowCount
    FROM work.disc_band
    WHERE tbl_dt = 20191003
  ) a
  LEFT JOIN ( -- Get all qualifying rows from "alltable1"
    SELECT *, count(*) OVER(PARTITION BY groupid) AS RowCount
    FROM work.alltable1
    WHERE tbl_dt = 20191003
  ) b ON a.groupid = b.groupid AND a.RowCount >= b.RowCount
) src

I’m not sure how about the seqnum logic, so you can add it back in if needed.

I haven’t tested it and you may need to tinker with the syntax to get it to work with Presto. Give it a shot and let me know.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement