Create an associated matching ID based on multiple columns in the same table

Question

I have an interesting issue where I need to create a unique identifier based on match groups for a set of data. This is based on multiple criteria, but generally what I need to happen is to take this input: SOURCE_ID MATCH_ID PHONE 1 1 (999)9999999 1 2 (999)9999999 2 1 (999)9999999 213710 707187 (001)25489…

Accepted Answer

That's some rather obtuse logic.

    SELECT
         column1
        ,column2
        ,column3
        ,dense_rank() over (order by rankable)
    FROM (
        SELECT *
            ,count(column1) over (partition by column1) c_c1
            ,count(column3) over (partition by column3) c_c3
            -- rank on whichever key covers more rows: the id or the phone
            ,iff(c_c1 > c_c3, column1::text, column3) as rankable
        FROM VALUES
            (1, 1, '(999)9999999'),
            (1, 2, '(999)9999999'),
            (2, 1, '(999)9999999'),
            (213710, 707187, '(001)2548987'),
            (213710, 759263, '(100)8348243'),
            (213705, 2416730, '(156)6676200'),
            (213705, 12116102, '(132)3453523')
    )

gives:

COLUMN1	COLUMN2	COLUMN3	DENSE_RANK() OVER (ORDER BY RANKABLE)
1	1	(999)9999999	1
1	2	(999)9999999	1
2	1	(999)9999999	1
213,705	2,416,730	(156)6676200	2
213,705	12,116,102	(132)3453523	2
213,710	707,187	(001)2548987	3
213,710	759,263	(100)8348243	3

A More Complex answer:

So your extended problem shows you are actually clustering on sets: for any SOURCE_ID, all of its PHONEs are part of the same set, and thus all SOURCE_IDs that are part of that PHONE's set are also in the group. This really should be solved with a recursive CTE, to allow for relationships more than 2 steps deep.

Here is a solution that handles 2 layers of chaining:

    WITH data AS (
        SELECT * FROM VALUES
            (2, '(999)9999999'),
            (1, '(999)9999999'),
            (1, '(999)9999999'),
            (2, '(999)9999999'),

            (213705, 'AAA'),
            (213705, 'AAB'),
            (213705, 'AAC'),
            (9624765, 'AAA'),
            (9624765, 'AAB'),
            (9624765, 'AAC'),
            (2175594867, 'AAA'),
            (2175594867, 'AAB'),

            (213710, 'BAA'),
            (213710, 'BAB'),
            (9213710, 'BAA'),
            (9213710, 'BAB'),
            (89213710, 'BAA'),
            (89213710, 'BAB')
    ), col1 as (
        -- layer 1: the set of column2 values seen for each column1
        select column1
            ,array_agg(DISTINCT column2) as col2_array
        from data
        group by 1
    ), col2 as (
        -- layer 2: the set of column1 values sharing an identical column2 set
        select
            *,
            row_number() over (order by true) as rn
        FROM (
            select col2_array
                ,array_agg(DISTINCT column1) as col1_array
            from col1
            group by 1
        )
    )
    SELECT d.column1, d.column2, r.rn
    FROM data as d
    JOIN col2 as r
        on array_contains(d.column1::variant, r.col1_array)
        and array_contains(d.column2::variant, r.col2_array)
    ORDER BY 3;

COLUMN1	COLUMN2	RN
89,213,710	BAB	1
89,213,710	BAA	1
9,213,710	BAB	1
9,213,710	BAA	1
213,710	BAB	1
213,710	BAA	1
1	(999)9999999	2
2	(999)9999999	2
1	(999)9999999	2
2	(999)9999999	2
2,175,594,867	AAA	3
2,175,594,867	AAB	3
9,624,765	AAA	4
9,624,765	AAB	4
9,624,765	AAC	4
213,705	AAC	4
213,705	AAB	4
213,705	AAA	4
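The answer says a recursive CTE is what you would really want for chains longer than 2 steps, but does not show one. Below is a minimal sketch of that idea, not the answerer's code. To keep it runnable without a Snowflake account it uses Python's built-in sqlite3, so the SQL is SQLite dialect and would need adapting for Snowflake. The table name `data`, the column names `source_id` and `phone`, the CTE names `edges` and `reach`, and the choice of the smallest reachable `source_id` as the group key are all assumptions made for illustration.

```python
import sqlite3

# Sample rows from the answer's second query, as (source_id, phone).
ROWS = [
    (1, "(999)9999999"), (2, "(999)9999999"),
    (213705, "AAA"), (213705, "AAB"), (213705, "AAC"),
    (9624765, "AAA"), (9624765, "AAB"), (9624765, "AAC"),
    (2175594867, "AAA"), (2175594867, "AAB"),
    (213710, "BAA"), (213710, "BAB"),
    (9213710, "BAA"), (9213710, "BAB"),
    (89213710, "BAA"), (89213710, "BAB"),
]

QUERY = """
WITH RECURSIVE
edges(a, b) AS (
    -- two source_ids are linked when they share at least one phone
    SELECT DISTINCT d1.source_id, d2.source_id
    FROM data d1 JOIN data d2 ON d1.phone = d2.phone
),
reach(origin, node) AS (
    -- every source_id can reach itself
    SELECT DISTINCT source_id, source_id FROM data
    UNION  -- UNION (not UNION ALL) drops repeats, so the recursion stops
    SELECT r.origin, e.b FROM reach r JOIN edges e ON e.a = r.node
)
-- label each source_id with the smallest source_id it can reach
SELECT origin, MIN(node) FROM reach GROUP BY origin ORDER BY origin
"""


def group_roots(rows):
    """Return {source_id: group key} for ids linked through shared phones."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (source_id INTEGER, phone TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


groups = group_roots(ROWS)
for source_id, root in groups.items():
    print(source_id, root)
```

Note that this follows links transitively, so it puts 2175594867 in the same group as 213705 and 9624765 because they share AAA and AAB, whereas the 2-layer query above keeps it separate because its phone set is not identical. Which behaviour is wanted depends on how the match groups are defined. If a small sequential ID is needed, a dense_rank() over the group key would give one.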

SOURCE_ID	MATCH_ID	PHONE
1	1	(999)9999999
1	2	(999)9999999
2	1	(999)9999999
213710	707187	(001)2548987
213710	759263	(100)8348243
213705	2416730	(156)6676200
213705	12116102	(132)3453523

