Skip to content
Advertisement

How to compare two strings based on percent match in SQL

I want to post a solution to a interesting problem I was facing in T-SQL.

The problem: Compare two string fields based on a percent match. In addition, the two strings may have the words in them translocated.

For example: “Joni Bravo” and “Bravo Joni”. These two strings should return a match of 100%, which means that position is not relevant. Few more things worth noting are that this code is made to compare strings that have space as delimiter in them. If the first string doesnt have space the match is set to 100% without actual check. This was not developed, because the strings this function is ment to compare always contain two or more words. Also, it is written on MS SQL Server 2017 if that mathers.

Advertisement

Answer

So here is the solution, hope this helps anyone 🙂 gl

    /****** Object:  UserDefinedFunction [dbo].[STRCOMP]    Script Date: 29/03/2018 15:31:45 ******/
    SET ANSI_NULLS ON
    GO
    
    SET QUOTED_IDENTIFIER ON
    GO
    
    CREATE FUNCTION [dbo].[STRCOMP] (
        -- Add the parameters for the function here
        @name_1 varchar(255),@name_2 varchar(255)
    )
    RETURNS float
    AS
    BEGIN
        

-- Declare the return variable and any needed variable here
    declare @p int = 0;
    declare @c int = 0;
    declare @br int = 0;
    declare @p_temp int = 0;
    declare @emergency_stop int = 0;
    declare @fixer int = 0;
    declare @table1_temp table (
    row_id int identity(1,1),
    str1 varchar (255));
    declare @table2_temp table (
    row_Id int identity(1,1),
    str2 varchar (255));
    declare @n int = 1;
    declare @count int = 1;
    declare @result int = 0;
    declare @total_result float = 0;
    declare @result_temp int = 0;
    declare @variable float = 0.0;
    
--clean the two strings from unwanted symbols and numbers

    set @name_1 = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@name_1,'!',''),'  ',' '),'1',''),'2',''),'3',''),'4',''),'5',''),'0',''),'6',''),'7',''),'8',''),'9','');
    set @name_2 = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@name_2,'!',''),'  ',' '),'1',''),'2',''),'3',''),'4',''),'5',''),'0',''),'6',''),'7',''),'8',''),'9','');

--check if the first string has more than one words inside. If the string does 
--not have more than one words, return 100%
set @c = charindex(' ',substring(@name_1,@p,len(@name_1)));


IF(@c = 0)
BEGIN
RETURN 100.00
END;

--main logic of the operation. This is based on sound indexing and comparing the 
--outcome. This loops through the string whole words and determines their soundex
--code and then compares it one against the other to produce a definitive number --showing the raw match between the two strings @name_1 and @name_2.
WHILE (@br != 2 or @emergency_stop = 20)
BEGIN

insert into @table1_temp(str1)
select substring (@name_1,@p,@c);
set @p = len(substring (@name_1,@p,@c))+2;
set @p = @p + @p_temp - @fixer;
set @p_temp = @p;
set @c = CASE WHEN charindex(' ',substring(@name_1,@p,len(@name_1))) = 0 THEN len(@name_1) ELSE charindex(' ',substring(@name_1,@p,len(@name_1))) END;
set @fixer = 1;
set @br = CASE WHEN charindex(' ',substring(@name_1,@p,len(@name_1))) = 0 THEN @br + 1 ELSE 0 END;
set @emergency_stop = @emergency_stop +1;
END;

set @p = 0;
set @br = 0;
set @emergency_stop = 0;
set @fixer = 0;
set @p_temp = 0;
set @c = charindex(' ',substring(@name_2,@p,len(@name_2)));

WHILE (@br != 2 or @emergency_stop = 20)
BEGIN

insert into @table2_temp(str2)
select substring (@name_2,@p,@c);
set @p = len(substring (@name_2,@p,@c))+2;
set @p = @p + @p_temp - @fixer;
set @p_temp = @p;
set @c = CASE WHEN charindex(' ',substring(@name_2,@p,len(@name_2))) = 0 THEN len(@name_2) ELSE charindex(' ',substring(@name_2,@p,len(@name_2))) END;
set @fixer = 1;
set @br = CASE WHEN charindex(' ',substring(@name_2,@p,len(@name_2))) = 0 THEN @br + 1 ELSE 0 END;
set @emergency_stop = @emergency_stop +1;
END;

WHILE((select str1 from @table1_temp where row_id = @n) is not null)
BEGIN
    set @count = 1;
    set @result = 0;
    WHILE((select str2 from @table2_temp where row_id = @count) is not null)
    BEGIN
        set @result_temp = DIFFERENCE((select str1 from @table1_temp where row_id = @n),(select str2 from @table2_temp where row_id = @count));
        IF(@result_temp > @result)
            BEGIN
                set @result = @result_temp;
                
            END;
            
        set @count = @count + 1;         
    END;
    
    set @total_result = @total_result + @result;
    set @n = @n + 1;
END;

--gather the results and transform them in a percent match.
set @variable = (select @total_result / (select max(row_count) from (
select max(row_id) as row_count from @table1_temp
union
select max(row_id) as row_count from @table2_temp) a));
RETURN @variable/4 * 100;

END
GO

PS: I decided to write it in a user-defined function just for the needs of my project.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement