Skip to content
Advertisement

Create an associated matching ID based on multiple columns in the same table

I have an interesting issue where I need to create a unique identifier based on match groups for a set of data. This is is based on multiple criteria, but generally what I need to happen is to take this input:

SOURCE_ID MATCH_ID PHONE
1 1 (999)9999999
1 2 (999)9999999
2 1 (999)9999999
213710 707187 (001)2548987
213710 759263 (100)8348243
213705 2416730 (156)6676200
213705 12116102 (132)3453523

And it needs to look like this as the output:

SOURCE_ID MATCH_ID PHONE GENERATED_ID
1 1 (999)9999999 1
1 2 (999)9999999 1
2 1 (999)9999999 1
213710 707187 (001)2548987 2
213710 759263 (100)8348243 2
213705 2416730 (156)6676200 3
213705 12116102 (132)3453523 3

I’ve utilized the DENSE_RANK() function to create two separate IDs, one sorting on PHONE, the other SOURCEID column. The PHONE sort gives me the correct output for lines 1-3, but incorrect for 4-7, while the SOURCE_ID sort works on lines 4-6, but not 1-3.

How can I combine these in a way that gives the desired output above? I’ve tried combining the columns in every format possible, but no luck there either.

Output from testing, highlighted correct results. Each TEST## column is noted below

The SQL for reference:

TIA!

Update — this is working somewhat as expected, but when bringing in additional records (this was a very small subset of the full dataset), starting to run into additional scenarios. Now, trying to make a larger dataset look closer to this:

Correct result set

I used the following code to get almost there but struggling to reassociate those last few records appropriately:

My output based on above code My output based on above code

Advertisement

Answer

that’s some rather obtuse logic.

gives:

COLUMN1 COLUMN2 COLUMN3 DENSE_RANK() OVER (ORDER BY RANKABLE)
1 1 (999)9999999 1
1 2 (999)9999999 1
2 1 (999)9999999 1
213,705 2,416,730 (156)6676200 2
213,705 12,116,102 (132)3453523 2
213,710 707,187 (001)2548987 3
213,710 759,263 (100)8348243 3

A More Complex answer:

So your extended problem, shows you are actually clustering on SETS thus for any SOURCE_ID all PHONE‘s are part of the same set, and thus all SOURCE_ID's that are part of the PHONE`’s set are also in the group. This really should be solved with a recursive CTE to allow for more that 2 steps relationship. Here is a solution that handles 2 layers of chaining..

COLUMN1 COLUMN2 RN
89,213,710 BAB 1
89,213,710 BAA 1
9,213,710 BAB 1
9,213,710 BAA 1
213,710 BAB 1
213,710 BAA 1
1 (999)9999999 2
2 (999)9999999 2
1 (999)9999999 2
2 (999)9999999 2
2,175,594,867 AAA 3
2,175,594,867 AAB 3
9,624,765 AAA 4
9,624,765 AAB 4
9,624,765 AAC 4
213,705 AAC 4
213,705 AAB 4
213,705 AAA 4
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement