Skip to content
Advertisement

How to count distance between two lines in a SQL database?

Here is a sample of my database with 3 colums : class, year and student_name.

enter image description here

I search the “distance” in terms of common student between two given student. (Like in the famous game “6 degrees of Bacon”)

The goal is to calculate the distance between Momo Thary (2018) and Paul Biloux (2020). And the answer is 2 students : Momo Thary was in the same class than Jack Spiral. Jack Spiral was in the same class than Lucien Lake. And Lucien Lake was in the same class than Paul Biloux.

So we can “link” Momo Thary and Paul Biloux with 2 students.

I am using SQLite online. With the following code, it is possible to know the common player of Jack Spiral and Paul Biloux :

WITH
  names(name) AS (SELECT 'Jack Spiral' UNION ALL SELECT 'Paul Biloux'),
  common AS (
    SELECT class, year
    FROM Lignes
    WHERE student_name IN names
    GROUP BY class, year
    HAVING COUNT(DISTINCT student_name) = (SELECT COUNT(*) FROM names)
  )
SELECT l1.student_name student
FROM Lignes l1 INNER JOIN Lignes l2
ON l2.year = l1.year AND l2.class = l1.class AND l2.student_name <> l1.student_name
WHERE l2.student_name IN names
GROUP BY student
HAVING COUNT(DISTINCT l2.student_name) = (SELECT COUNT(*) FROM names)
   AND SUM((l1.class, l1.year) IN common) = 0

The output is Lucien Lake.

But if I use this script for Momo Thary and Paul Biloux, there is no output. In fact, is it required to use Jack Spiral and then Lucien Lake to make the link.

Is it possible to adapt the script, to first know the quickest link (if there are many possible link), and then to know what is the path in term of student ? May SQL be suited to do this type of code or another language would be more appropriate ?

Advertisement

Answer

You can use a recursive query :

WITH RECURSIVE rec(distance, student_name, path) AS (
  -- start with Paul Biloux
  SELECT 0, 'Paul Biloux', '|Paul Biloux|'
  UNION
  -- increment distance
  SELECT r.distance + 1, l1.student_name, r.path || l1.student_name || '|'
  FROM rec r
  JOIN lignes l1
  -- avoid loops
  ON r.path NOT LIKE '%|' || l1.student_name || '|%'
  -- there is a common class
  WHERE exists (SELECT * FROM lignes l2 WHERE l2.year = l1.year AND l2.grade = l1.grade AND l2.student_name = r.student_name )
)
-- keep only the shortest path
-- if there are more than one, choose one randomely
-- (this is specific to SQLite)
SELECT student_name, MIN(distance) AS dist, path
FROM rec
GROUP BY student_name
ORDER BY distance

The result is :

student_name    dist  path
Paul Biloux     0     |Paul Biloux|
Lucien Lake     1     |Paul Biloux|Lucien Lake|
Thome Defoe     1     |Paul Biloux|Thome Defoe|
Chane Way       2     |Paul Biloux|Lucien Lake|Chane Way|
Jack Spiral     2     |Paul Biloux|Lucien Lake|Jack Spiral|
Mathilde Seine  3     |Paul Biloux|Lucien Lake|Jack Spiral|Mathilde Seine|
Momo Thary      3     |Paul Biloux|Lucien Lake|Jack Spiral|Momo Thary|

It is quite inefficient, because all the possible path are explored.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement