I am working on SQL server.
I have the following table:
For each BIGroup I have a multiple VarianceName. For each VarianceName I have multiple PartNumbers. I am comparing every partnumber with the other partnumbers within the same BIGroup and VarianceName, and writting the number of differences between PartNumber1 and PartNumber2 in the column Difference:
+---------+--------------+-------------+-------------+------------+-----------+ | BIGroup | VarianceName | PartNumber1 | PartNumber2 | Difference | Cluster | +---------+--------------+-------------+-------------+------------+-----------+ | D934 | A | 11426777 | 11426777 | 0 | | | D934 | A | 11426777 | 11426781 | 0 | | | D934 | A | 11426777 | 12542804 | 2 | | | D934 | A | 11426777 | 12554759 | 4 | | | D934 | A | 11426777 | 12564258 | 0 | | | D934 | A | 11426781 | 11426777 | 0 | | | D934 | A | 11426781 | 11426781 | 0 | | | D934 | A | 11426781 | 12542804 | 5 | | | D934 | A | 11426781 | 12554759 | 1 | | | D934 | A | 11426781 | 12564258 | 0 | | | D934 | A | 12542804 | 11426777 | 2 | | | D934 | A | 12542804 | 11426781 | 5 | | | D934 | A | 12542804 | 12542804 | 0 | | | D934 | A | 12542804 | 12554759 | 0 | | | D934 | A | 12542804 | 12564258 | 8 | | | D934 | A | 12554759 | 11426777 | 4 | | | D934 | A | 12554759 | 11426781 | 1 | | | D934 | A | 12554759 | 12542804 | 0 | | | D934 | A | 12554759 | 12554759 | 0 | | | D934 | A | 12554759 | 12564258 | 9 | | | D934 | A | 12564258 | 11426777 | 0 | | | D934 | A | 12564258 | 11426781 | 0 | | | D934 | A | 12564258 | 12542804 | 8 | | | D934 | A | 12564258 | 12554759 | 9 | | | D934 | A | 12564258 | 12564258 | 0 | | | D934 | AA | 11438878 | 11438878 | 0 | | | D934 | AB | 11438924 | 11438924 | 0 | | | D934 | AC | 12556213 | 12556213 | 0 | | | D934 | AC | 12556213 | 12556214 | 5 | | | D934 | AC | 12556214 | 12556213 | 5 | | | D934 | AC | 12556214 | 12556214 | 0 | | | D955 | A | 75346846 | 75346846 | 0 | | | ... | ... | ... | ... | 0 | | +---------+--------------+-------------+-------------+------------+-----------+
ex: For D934, for VarianceName A, PartNumbers 11426777, 11426781 and 12564258 are identical because there is 0 differences between : 11426777 and 11426781, 11426781 and 12564258, and 12564258 and 11426777.
ex: For D934, for VarianceName A, PartNumbers 12542804 and 12554759 are identical because there is 0 differences between: 12542804 and 12554759.
My Goal is to identify all the group of identical PartNumbers within the same BIGroup and VarianceName. To flag those groups, I will use the column called Cluster.
So 11426777, 11426781 and 12564258 would belong to Cluster D934-A-C1.
So 12542804 and 12554759 would belong to cluster D934-A-C2.
What should be the query/stored procedure to update the Cluster column, to obtain the following result:
+---------+--------------+-------------+-------------+------------+-----------+ | BIGroup | VarianceName | PartNumber1 | PartNumber2 | Difference | Cluster | +---------+--------------+-------------+-------------+------------+-----------+ | D934 | A | 11426777 | 11426777 | 0 | D934-A-C1 | | D934 | A | 11426777 | 11426781 | 0 | D934-A-C1 | | D934 | A | 11426777 | 12542804 | 2 | | | D934 | A | 11426777 | 12554759 | 4 | | | D934 | A | 11426777 | 12564258 | 0 | D934-A-C1 | | D934 | A | 11426781 | 11426777 | 0 | D934-A-C1 | | D934 | A | 11426781 | 11426781 | 0 | D934-A-C1 | | D934 | A | 11426781 | 12542804 | 5 | | | D934 | A | 11426781 | 12554759 | 1 | | | D934 | A | 11426781 | 12564258 | 0 | D934-A-C1 | | D934 | A | 12542804 | 11426777 | 2 | | | D934 | A | 12542804 | 11426781 | 5 | | | D934 | A | 12542804 | 12542804 | 0 | D934-A-C2 | | D934 | A | 12542804 | 12554759 | 0 | D934-A-C2 | | D934 | A | 12542804 | 12564258 | 8 | | | D934 | A | 12554759 | 11426777 | 4 | | | D934 | A | 12554759 | 11426781 | 1 | | | D934 | A | 12554759 | 12542804 | 0 | D934-A-C2 | | D934 | A | 12554759 | 12554759 | 0 | D934-A-C2 | | D934 | A | 12554759 | 12564258 | 9 | | | D934 | A | 12564258 | 11426777 | 0 | D934-A-C1 | | D934 | A | 12564258 | 11426781 | 0 | D934-A-C1 | | D934 | A | 12564258 | 12542804 | 8 | | | D934 | A | 12564258 | 12554759 | 9 | | | D934 | A | 12564258 | 12564258 | 0 | D934-A-C1 |
And so on for the other VarianceName
| D934 | AA | 11438878 | 11438878 | 0 | D934-AA-C1 | D934 | AB | 11438924 | 11438924 | 0 | D934-AB-C1 | D934 | AC | 12556213 | 12556213 | 0 | D934-AC-C1 | D934 | AC | 12556213 | 12556214 | 5 | | D934 | AC | 12556214 | 12556213 | 5 | | D934 | AC | 12556214 | 12556214 | 0 | D934-AC-C1
And so on for the other BiGroup
| D955 | A | 75346846 | 75346846 | 0 | D955-A-C1 | ... | ... | ... | ... | ... | +---------+--------------+-------------+-------------+------------+-----------+
The column should be left to NULL if Difference > 0
Here is the script to have the data as a cte:
with t1 as ( select 'D934' as BIGroup ,'A' as VarianceName , 11426777 as PartNumber1, 11426777 as PartNumber2, 0 as Difference, null as Cluster union select 'D934' ,'A' , 11426777 , 11426781 , 0 , null union select 'D934' ,'A' , 11426777 , 12542804 , 2 , null union select 'D934' ,'A' , 11426777 , 12554759 , 4 , null union select 'D934' ,'A' , 11426777 , 12564258 , 0 , null union select 'D934' ,'A' , 11426781 , 11426777 , 0 , null union select 'D934' ,'A' , 11426781 , 11426781 , 0 , null union select 'D934' ,'A' , 11426781 , 12542804 , 5 , null union select 'D934' ,'A' , 11426781 , 12554759 , 1 , null union select 'D934' ,'A' , 11426781 , 12564258 , 0 , null union select 'D934' ,'A' , 12542804 , 11426777 , 2 , null union select 'D934' ,'A' , 12542804 , 11426781 , 5 , null union select 'D934' ,'A' , 12542804 , 12542804 , 0 , null union select 'D934' ,'A' , 12542804 , 12554759 , 0 , null union select 'D934' ,'A' , 12542804 , 12564258 , 8 , null union select 'D934' ,'A' , 12554759 , 11426777 , 4 , null union select 'D934' ,'A' , 12554759 , 11426781 , 1 , null union select 'D934' ,'A' , 12554759 , 12542804 , 0 , null union select 'D934' ,'A' , 12554759 , 12554759 , 0 , null union select 'D934' ,'A' , 12554759 , 12564258 , 9 , null union select 'D934' ,'A' , 12564258 , 11426777 , 0 , null union select 'D934' ,'A' , 12564258 , 11426781 , 0 , null union select 'D934' ,'A' , 12564258 , 12542804 , 8 , null union select 'D934' ,'A' , 12564258 , 12554759 , 9 , null union select 'D934' ,'A' , 12564258 , 12564258 , 0 , null union select 'D934' ,'AA' , 11438878 , 11438878 , 0 , null union select 'D934' ,'AB' , 11438924 , 11438924 , 0 , null union select 'D934' ,'AC' , 12556213 , 12556213 , 0 , null union select 'D934' ,'AC' , 12556213 , 12556214 , 5 , null union select 'D934' ,'AC' , 12556214 , 12556213 , 5 , null union select 'D934' ,'AC' , 12556214 , 12556214 , 0 , null union select 'D955' ,'A' , 75346846 , 75346846 , 0 , null )
Edit:
To better understand the problem, I drew the 5 partnumbers of D934
A
, their links, and the two clusters.
The links we are interested in are the black ones (because it means there is 0 differences between the partNumbers).
The orange links are representing difference>0 between the partNumbers.
After drawing the links we can identify 2 clusters, which I drew with red circles.
Advertisement
Answer
I managed to solve this problem with a stored procedure:
DECLARE @BiGroup [nvarchar](30); DECLARE @VarianceName [nvarchar](30); DECLARE @NewBiGroup [nvarchar](30); DECLARE @NewVarianceName [nvarchar](30); DECLARE @PartNumber [nvarchar](30); DECLARE @ClusterName [nvarchar](30); DECLARE @IncrementClusterName [nvarchar](30); set @BiGroup = 'first_BiGroup'; set @VarianceName = 'first_VarianceName'; set @IncrementClusterName = 1; set @ClusterName = null; -- Declare cursor DECLARE cur CURSOR READ_ONLY FOR Select [PartNumber1] FROM t1 order by [BIGroup] ,[VarianceName] ,[PartNumber1]; --clean cluster column update t1 set [Cluster]=null; OPEN cur FETCH NEXT FROM cur INTO @PartNumber -- Loop on every PartNumber WHILE @@FETCH_STATUS = 0 BEGIN --set NewBiGroup and NewPartNumber set @NewBiGroup = (select Top(1) [BIGroup] from t1 where partnumber1 = @PartNumber); set @NewVarianceName = (select Top(1) [VarianceName] from t1 where partnumber1 = @PartNumber); --check if we are still in the same BIGroup and Variance, otherwise, reset the cluster increment if @NewBiGroup <> @BiGroup or @NewVarianceName <> @VarianceName BEGIN set @IncrementClusterName = 1; END --get the clusterName of this partNumber, if it exists set @ClusterName = (select Top(1) [Cluster] from t1 where partnumber2 = @PartNumber and [Cluster] is not null); --if ClusterName is NULL, put a clustername and then increment the @IncrementClusterName, --otherwise set the cluster to @ClusterName if @ClusterName is null BEGIN update t1 set [Cluster] = @NewBiGroup+'-'+@NewVarianceName+'-'+@IncrementClusterName where partnumber1 = @PartNumber and Difference= 0 ; set @IncrementClusterName = @IncrementClusterName +1; END else BEGIN update t1 set [Cluster] = @NewBiGroup+'-'+@NewVarianceName+'-'+@ClusterName where partnumber1 = @PartNumber and Difference= 0 ; END -- setting the BiGroup and VarianceName set @BiGroup = @NewBiGroup; set @VarianceName = @NewVarianceName; FETCH NEXT FROM cur INTO @PartNumber END CLOSE cur DEALLOCATE cur
The algorithm of the stored procedure works like this:
- For every
PartNumber
with Difference=0- If the
BiGroup
orVarianceName
has changed- I reset the
@clusterIncrement
to 1
- I reset the
- If he isn’t already part of a
Cluster
- I set his
Cluster
to@clusterIncrement
@clusterIncrement = @clusterIncrement +1
- I set his
- If he is already part of a
Cluster
- I set his
Cluster
to the existingCluster
- I set his
- If the