Skip to content
Advertisement

Find closest match in SQL

Given a table called “Example” as follows;

ID|A|B|C
--------
01|X|Y|A
02|Z|Z|A
03|Q|P|A
  • If searching for X,Y,A = returns row 1 (Exact match)
  • If searching for Q,X,A = returns row 3 (Closest match)

I can do this as multiple seperate SQL statements

select ID from EXAMPLE where A=@A and B=@B and C=@C

… if this returns zero rows, then :

Select ID from EXAMPLE where A=@A and B=@B

… if this returns zero rows, then :

Select ID from EXAMPLE where A=@A and C=@C

… if this returns zero rows, then :

Select ID from EXAMPLE where B=@B and C=@C

… etc.

But I would imagine this is going to be very bad for performance. Is there a better way?

Advertisement

Answer

From a performance perspective, you probably want:

select t.*
from t
where t.a = @a or t.b = @b or t.c = @c
order by (case when t.a = @a then 1 else 0 end +
          case when t.b = @b then 1 else 0 end +
          case when t.c = @c then 1 else 0 end
         ) desc
fetch first 1 row only;

The where clause is important if your table has any size to it. It guarantees that at least one column matches — and that should reduces the amount of data needed for sorting.

If you have multiple indexes on the table (see further down), then an exhaustive approach using union all might provide to have better performance. Looking for full matches and matches on 2 out of 3, this looks like:

with match_full as (
      select t.*
      from t
      where a = @a and b = @b and c = @c
      fetch first 1 row only
     ),
     match_ab as (
      select *
      from t
      where t.a = @a and t.b = @b and
            not exists (select 1 from match_full)
      fetch first 1 row only
     ),
     match_ac as (
      select *
      from t
      where t.a = @a and t.c = @c and
            not exists (select 1 from match_full) and
            not exists (select 1 from match_ab) 
      fetch first 1 row only
     ),
     match_bc as (
      select *
      from t
      where t.b  = @b and t.c = @c and
            not exists (select 1 from match_full) and
            not exists (select 1 from match_ab) 
      fetch first 1 row only
     )
select *
from match_full
union all
select *
from match_ab
union all
select *
from match_ac
union all
select *
from match bc;

In particular, this can take advantage of three indexes: (a, b, c), (a, c), and (b, c). Each CTE should be a simple index lookup and it is hard to see how the query could be faster.

It can, of course, be extended to handle singleton matches as well — using the same indexes.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement