Skip to content
Advertisement

Generate Result based on max count in secondary column after a join

I have two tables which have a common key between them, and quite a lot of other important infos ; for the sake of simplicity i will be using Combination A and Combination B. When a combination is met, whichever table has the maximum number of records should be the source where i collect the information ; in this case say IDs. The priority when counts are same is Table1.

COMMONKEY column is the combination/join condition in my tables.

    (Table 1)

  SELECT '123' table1_id,'Comb A' commonkey from dual UNION
  SELECT '124' table1_id,'Comb A' commonkey from dual UNION
  SELECT '125' table1_id,'Comb A' commonkey from dual UNION
  SELECT '126' table1_id,'Comb A' commonkey from dual UNION
  SELECT '215' table1_id,'Comb B' commonkey from dual UNION
  SELECT '216' table1_id,'Comb B' commonkey from dual UNION
  SELECT '559' table1_id,'Random Combination 1' commonkey from dual UNION
  SELECT '560' table1_id,'Random Combination 2' commonkey from dual ;   
                                 
    ( Table 2 )     
        
  SELECT 'abc1' table2_id,'Comb A' commonkey from dual  UNION
  SELECT 'abc2' table2_id,'Comb A' commonkey from dual  UNION
  SELECT 'abc3' table2_id,'Comb A' commonkey from dual  UNION
  SELECT 'abc4' table2_id,'Comb A' commonkey from dual  UNION
  SELECT 'xyz1' table2_id,'Comb B' commonkey from dual  UNION
  SELECT 'xyz2' table2_id,'Comb B' commonkey from dual  UNION
  SELECT 'xyz3' table2_id,'Comb B' commonkey from dual  UNION
  SELECT 'xyz2' table2_id,'Comb B' commonkey from dual  UNION 
  SELECT '416abc1' table2_id,'Random Combination 91' commonkey from dual UNION
  SELECT '416abc2' table2_id,'Random Combination 92' commonkey from dual;
  
    
    

Result Set Expected :

ID        COMMONKEY         
123       Comb A            
124       Comb A            
125       Comb A            
126       Comb A            
xyz1      Comb B            
xyz2      Comb B            
xyz3      Comb B            
559       Random Combination 1          
560       Random Combination 1          
416abc1   Random Combination 91         
416abc2   Random Combination 92 

Updated Image :

( the image shows a screenshot of the trail data in an excel; The Requirement and Strategy are color mapped to make it quickly understandable )

I need to generate the result set using SQL as follows :

When table1.commonkey = table2.commonkey hits, I need to-

  • If table1 has 10 IDs, table2 has 5 IDs -> Pick 10 IDs from table1.
  • If table1 has 15 IDs, table2 has 30 IDs -> Pick 30 IDs from table2.
  • If table1 has 4 IDs, table2 has 4 IDs -> Pick 4 IDs from table1. ( when equal, choose table1 IDs )
  • When no matches occur with the common key, prevent a cross join and add in the rowsets linearly to the result table.

Edit : I’ve initially gone on routes with

  a left join b where b.key IS null ;
  a full outer join b where b.key IS NULL or a.key is NULL ;

to achieve workarounds with A-B or B-A result sets but both these approaches were quite wrong. Gathering Delta sets or Exclusion sets didnt go well.

Advertisement

Answer

Here’s one option; see comments within code

SQL> with
  2  -- sample data
  3  a (id, ckey) as
  4    (select '123', 'ca' from dual union all
  5     select '124', 'ca' from dual union all
  6     select '125', 'ca' from dual union all
  7     select '126', 'ca' from dual union all
  8     select '215', 'cb' from dual union all
  9     select '216', 'cb' from dual union all
 10     select '551', 'r1' from dual union all
 11     select '552', 'r2' from dual
 12    ),
 13  b (id, ckey) as
 14    (select 'abc1', 'ca' from dual union all
 15     select 'abc2', 'ca' from dual union all
 16     select 'abc3', 'ca' from dual union all
 17     select 'abc4', 'ca' from dual union all
 18     select 'xyz1', 'cb' from dual union all
 19     select 'xyz2', 'cb' from dual union all
 20     select 'xyz3', 'cb' from dual union all
 21     select '9991', 'r3' from dual union all
 22     select '9992', 'r4' from dual
 23    ),

 24  -- count rows per each CKEY (common key)
 25  tempa as
 26    (select id, ckey, count(*) over (partition by ckey) cnt
 27     from a
 28    ),
 29  tempb as
 30    (select id, ckey, count(*) over (partition by ckey) cnt
 31     from b
 32    )
 33  -- final query
 34  select distinct
 35    case when a.cnt >= b.cnt then a.id
 36         else b.id
 37    end id,
 38    a.ckey
 39  from tempa a join tempb b on b.ckey = a.ckey
 40  union all
 41  select ckey, id from a
 42    where not exists (select null from b where a.ckey = b.ckey)
 43  union all
 44  select ckey, id from b
 45    where not exists (select null from a where a.ckey = b.ckey)
 46  order by 1, 2;

which results in

ID   CKEY
---- -----
r1   551
r2   552
r3   9991
r4   9992
xyz1 cb
xyz2 cb
xyz3 cb
123  ca
124  ca
125  ca
126  ca

11 rows selected.

SQL>
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement