Skip to content
Advertisement

Grouping the rows by similarities

I am working on SQL server.

I have the following table:

For each BIGroup I have a multiple VarianceName. For each VarianceName I have multiple PartNumbers. I am comparing every partnumber with the other partnumbers within the same BIGroup and VarianceName, and writting the number of differences between PartNumber1 and PartNumber2 in the column Difference:

ex: For D934, for VarianceName A, PartNumbers 11426777, 11426781 and 12564258 are identical because there is 0 differences between : 11426777 and 11426781, 11426781 and 12564258, and 12564258 and 11426777.

ex: For D934, for VarianceName A, PartNumbers 12542804 and 12554759 are identical because there is 0 differences between: 12542804 and 12554759.

My Goal is to identify all the group of identical PartNumbers within the same BIGroup and VarianceName. To flag those groups, I will use the column called Cluster.

So 11426777, 11426781 and 12564258 would belong to Cluster D934-A-C1.

So 12542804 and 12554759 would belong to cluster D934-A-C2.

What should be the query/stored procedure to update the Cluster column, to obtain the following result:

And so on for the other VarianceName

And so on for the other BiGroup

The column should be left to NULL if Difference > 0

Here is the script to have the data as a cte:

Edit: To better understand the problem, I drew the 5 partnumbers of D934 A, their links, and the two clusters.

The links we are interested in are the black ones (because it means there is 0 differences between the partNumbers).

The orange links are representing difference>0 between the partNumbers.

After drawing the links we can identify 2 clusters, which I drew with red circles.

enter image description here

Advertisement

Answer

I managed to solve this problem with a stored procedure:

The algorithm of the stored procedure works like this:

  • For every PartNumber with Difference=0
    • If the BiGroup or VarianceName has changed
      • I reset the @clusterIncrement to 1
    • If he isn’t already part of a Cluster
      • I set his Cluster to @clusterIncrement
      • @clusterIncrement = @clusterIncrement +1
    • If he is already part of a Cluster
      • I set his Cluster to the existing Cluster
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement