I have two tables and I want do left join between them, but I want the left join to happen only if the count of records that contain the same value of join column in the first table is more or equal the count of records that contain the same value of join column in the second table
what I try:
first I do the count of records per the each group
and then I put condition to filter (a.cnt >= b.cnt)
this is the code:
insert into work.discount_tmp select SubsID, MSISDN, EppAcc, User_Name, Bill_Cycle, Tariff, Pack, Discount_Qual_ID, Discount_ID, Qualification_Type, Discount_Desc, Sequence, a.GroupID, RuleID, dib_band_id, dib_band_end, dib_charge_ref, DIB_DIS0, dib_disc_type, dib_limit_disc, DIB_MAX_, cd_class_type, ClassID, Class, dgh_group_id, dgh_inclusion_from, dgh_inclusion_to, 20191003 from ( ( select *, row_number() over (partition by GroupID order by Discount_ID) as seqnum, COUNT(*) over (partition by GroupID order by GroupID) as cnt from work.disc_band where tbl_dt = 20191003 order by Discount_ID ) a left join ( select *, row_number() over ( partition by GroupID order by cd_class_type, try(cast(coalesce(classid,'0') as bigint)) ) as seqnum, count(*) over (partition by GroupID order by GroupID) as cnt from work.alltable1 where tbl_dt = 20191003 ) b on a.GroupID = b.GroupID and a.cnt >= b.cnt and a.seqnum = b.seqnum );
But my try not work because the join done first and then the condition (so the value of cnt
in the second table will not stay the same after join done)
Any idea how to make that work?
Advertisement
Answer
You want to write your query referencing a neutral “seed table” in your FROM
clause. Then you can calculate the row counts from the two tables and compare them to each other. Something like this:
insert into work.discount_tmp select SubsID, MSISDN, EppAcc, User_Name, Bill_Cycle, Tariff, Pack, Discount_Qual_ID, Discount_ID, Qualification_Type, Discount_Desc, Sequence, a.GroupID, RuleID, dib_band_id, dib_band_end, dib_charge_ref, DIB_DIS0, dib_disc_type, dib_limit_disc, DIB_MAX_, cd_class_type, ClassID, Class, dgh_group_id, dgh_inclusion_from, dgh_inclusion_to, 20191003 from ( select a.*, b.*, FROM (SELECT 1 AS DummyCol) AS dt -- "seed" table; not sure the Presto equivalent CROSS JOIN ( -- Get all qualifying rows from "disc_band" SELECT *, count(*) OVER(PARTITION BY groupid) AS RowCount FROM work.disc_band WHERE tbl_dt = 20191003 ) a LEFT JOIN ( -- Get all qualifying rows from "alltable1" SELECT *, count(*) OVER(PARTITION BY groupid) AS RowCount FROM work.alltable1 WHERE tbl_dt = 20191003 ) b ON a.groupid = b.groupid AND a.RowCount >= b.RowCount ) src
I’m not sure how about the seqnum
logic, so you can add it back in if needed.
I haven’t tested it and you may need to tinker with the syntax to get it to work with Presto
. Give it a shot and let me know.