Skip to content
Advertisement

Create an associated matching ID based on multiple columns in the same table

I have an interesting issue where I need to create a unique identifier based on match groups for a set of data. This is is based on multiple criteria, but generally what I need to happen is to take this input:

SOURCE_ID MATCH_ID PHONE
1 1 (999)9999999
1 2 (999)9999999
2 1 (999)9999999
213710 707187 (001)2548987
213710 759263 (100)8348243
213705 2416730 (156)6676200
213705 12116102 (132)3453523

And it needs to look like this as the output:

SOURCE_ID MATCH_ID PHONE GENERATED_ID
1 1 (999)9999999 1
1 2 (999)9999999 1
2 1 (999)9999999 1
213710 707187 (001)2548987 2
213710 759263 (100)8348243 2
213705 2416730 (156)6676200 3
213705 12116102 (132)3453523 3

I’ve utilized the DENSE_RANK() function to create two separate IDs, one sorting on PHONE, the other SOURCEID column. The PHONE sort gives me the correct output for lines 1-3, but incorrect for 4-7, while the SOURCE_ID sort works on lines 4-6, but not 1-3.

How can I combine these in a way that gives the desired output above? I’ve tried combining the columns in every format possible, but no luck there either.

Output from testing, highlighted correct results. Each TEST## column is noted below

The SQL for reference:

SELECT SOURCE_ID, 
    MATCH_ID, 
    PHONE, 
    DENSE_RANK() OVER(ORDER BY PHONE) AS PHONE_SORT
    DENSE_RANK() OVER(ORDER BY SOURCE_ID) AS SOURCE_ID_SORT
    DENSE_RANK() OVER(ORDER BY MATCH_ID, INTERNAL_ROW_ID) AS TEST1,
    DENSE_RANK() OVER(ORDER BY SOURCE_ID, MATCH_ID) AS TEST2,
    DENSE_RANK() OVER(ORDER BY MATCH_ID) AS TEST3,
    DENSE_RANK() OVER(ORDER BY MATCH_ID, SOURCE_ID, PHONE) AS TEST4,
    DENSE_RANK() OVER(ORDER BY MATCH_ID, PHONE, SOURCE_ID) AS TEST5,
    DENSE_RANK() OVER(ORDER BY SOURCE_ID, MATCH_ID, PHONE) AS TEST6,
    DENSE_RANK() OVER(ORDER BY SOURCE_ID, PHONE, MATCH_ID) AS TEST7,
    DENSE_RANK() OVER(ORDER BY PHONE, SOURCE_ID, MATCH_ID) AS TEST8,
    DENSE_RANK() OVER(ORDER BY PHONE, MATCH_ID, SOURCE_ID) AS TEST9,
    DENSE_RANK() OVER(ORDER BY PHONE, SOURCE_ID) AS TEST10,
    DENSE_RANK() OVER(ORDER BY PHONE, MATCH_ID) AS TEST11,     
    DENSE_RANK() OVER(ORDER BY SOURCE_ID, PHONE) AS TEST12,
    DENSE_RANK() OVER(ORDER BY MATCH_ID, PHONE) AS TEST13      
FROM MY_TABLE; 

TIA!

Update — this is working somewhat as expected, but when bringing in additional records (this was a very small subset of the full dataset), starting to run into additional scenarios. Now, trying to make a larger dataset look closer to this:

Correct result set

I used the following code to get almost there but struggling to reassociate those last few records appropriately:

SELECT SOURCE_ID, 
      MATCH_ID, 
      PHONE, 
      DENSE_RANK() OVER(ORDER BY RANKABLE_MATCH_ID) AS GENERATED_ID
    FROM (
          SELECT SOURCE_ID,
                MATCH_ID,
                PHONE,
                COUNT(MATCH_ID) OVER (PARTITION BY MATCH_ID) C_MATCH_ID,
                IFF(C_PHONE >= C_SOURCE_BY_MATCH AND C_MATCH_ID = C_PHONE, SOURCE_ID::TEXT, RANKABLE_INTERNAL_PHONE) AS RANKABLE_MATCH_ID
          FROM ( 
                SELECT SOURCE_ID,
                  MATCH_ID,
                  PHONE,
                      COUNT(SOURCE_ID) OVER (PARTITION BY SOURCE_ID) C_SOURCE_ID,
                      COUNT(PHONE) OVER (PARTITION BY PHONE) C_PHONE,
                      COUNT(DISTINCT SOURCE_ID) OVER(PARTITION BY MATCH_ID) C_SOURCE_BY_MATCH,  
                      IFF(C_SOURCE_ID > C_PHONE, SOURCE_ID::TEXT, PHONE) AS RANKABLE_INTERNAL_PHONE
                  FROM MY_TABLE
                )
)

My output based on above code My output based on above code

Advertisement

Answer

that’s some rather obtuse logic.

SELECT 
    column1
    ,column2
    ,column3
    ,dense_rank() over (order by rankable)
FROM (    
    SELECT *
        ,count(column1) over (partition by column1) c_c1
        ,count(column3) over (partition by column3) c_c3
        ,iff(c_c1> c_c3, column1::text, column3) as rankable
    FROM VALUES
    (1,1,'(999)9999999'),
    (1, 2,'(999)9999999'),
    (2, 1,'(999)9999999'),
    (213710,    707187,'(001)2548987'),
    (213710,    759263,'(100)8348243'),
    (213705,    2416730,'(156)6676200'),
    (213705,    12116102,'(132)3453523')
)

gives:

COLUMN1 COLUMN2 COLUMN3 DENSE_RANK() OVER (ORDER BY RANKABLE)
1 1 (999)9999999 1
1 2 (999)9999999 1
2 1 (999)9999999 1
213,705 2,416,730 (156)6676200 2
213,705 12,116,102 (132)3453523 2
213,710 707,187 (001)2548987 3
213,710 759,263 (100)8348243 3

A More Complex answer:

So your extended problem, shows you are actually clustering on SETS thus for any SOURCE_ID all PHONE‘s are part of the same set, and thus all SOURCE_ID's that are part of the PHONE`’s set are also in the group. This really should be solved with a recursive CTE to allow for more that 2 steps relationship. Here is a solution that handles 2 layers of chaining..

WITH data AS (
    SELECT * FROM VALUES
    (2, '(999)9999999'),
    (1, '(999)9999999'),
    (1, '(999)9999999'),
    (2, '(999)9999999'),
    
    (213705, 'AAA'),
    (213705, 'AAB'),
    (213705, 'AAC'),
    (9624765, 'AAA'),
    (9624765, 'AAB'),
    (9624765, 'AAC'),
    (2175594867, 'AAA'),
    (2175594867, 'AAB'),
    
    (213710, 'BAA'),
    (213710, 'BAB'),
    (9213710, 'BAA'),
    (9213710, 'BAB'),
    (89213710, 'BAA'),
    (89213710, 'BAB')
), col1 as (
    select column1
        ,array_agg(DISTINCT column2) as col2_array
    from data
    group by 1
), col2 as (
    select 
    *, 
        row_number() over (order by true) as rn 
    FROM (
        select col2_array
            ,array_agg(DISTINCT column1) as col1_array
        from col1
        group by 1
    )
)
SELECT d.column1, d.column2, r.rn
FROM data as d
JOIN col2 as r 
    on array_contains(d.column1::variant, r.col1_array) 
    and array_contains(d.column2::variant, r.col2_array)
ORDER BY 3;
COLUMN1 COLUMN2 RN
89,213,710 BAB 1
89,213,710 BAA 1
9,213,710 BAB 1
9,213,710 BAA 1
213,710 BAB 1
213,710 BAA 1
1 (999)9999999 2
2 (999)9999999 2
1 (999)9999999 2
2 (999)9999999 2
2,175,594,867 AAA 3
2,175,594,867 AAB 3
9,624,765 AAA 4
9,624,765 AAB 4
9,624,765 AAC 4
213,705 AAC 4
213,705 AAB 4
213,705 AAA 4
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement