Skip to content
Advertisement

How to to create a new identifier in SQL for related addresses that have multiple relationships?

I have a data set like below that includes address and customer_id. In this example multiple customers can ship to the same address and a customer can ship to multiple addresses. I would like to use the relationships between these to create a new ID that joins all of the related addresses and customer_id’s with a new identifier.

Original Table

Desired Output Table

The challenge is the number of related associations could be infinite and the SQL I have below will only work across two joins.

Here is the SQL I have which works for this small data set but when scaled to even more relationships it would require more joins. Any thoughts on the proper way to structure this would be appreciated!

Advertisement

Answer

This is a graph-walking problem. I would start by rephrasing the data a bit. It helps for each row to have a unique identifier, which is appropriately called id. Hence, I renamed id as name.

Then, construct the edges and use a recursive CTE to walk through the graph. In SQL Server, this looks like:

Here is a db<>fiddle.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement