Here is a sample of my database with 3 colums : class, year and student_name.
I search the “distance” in terms of common student between two given student. (Like in the famous game “6 degrees of Bacon”)
The goal is to calculate the distance between Momo Thary (2018) and Paul Biloux (2020). And the answer is 2 students : Momo Thary was in the same class than Jack Spiral. Jack Spiral was in the same class than Lucien Lake. And Lucien Lake was in the same class than Paul Biloux.
So we can “link” Momo Thary and Paul Biloux with 2 students.
I am using SQLite online. With the following code, it is possible to know the common player of Jack Spiral and Paul Biloux :
WITH names(name) AS (SELECT 'Jack Spiral' UNION ALL SELECT 'Paul Biloux'), common AS ( SELECT class, year FROM Lignes WHERE student_name IN names GROUP BY class, year HAVING COUNT(DISTINCT student_name) = (SELECT COUNT(*) FROM names) ) SELECT l1.student_name student FROM Lignes l1 INNER JOIN Lignes l2 ON l2.year = l1.year AND l2.class = l1.class AND l2.student_name <> l1.student_name WHERE l2.student_name IN names GROUP BY student HAVING COUNT(DISTINCT l2.student_name) = (SELECT COUNT(*) FROM names) AND SUM((l1.class, l1.year) IN common) = 0
The output is Lucien Lake.
But if I use this script for Momo Thary and Paul Biloux, there is no output. In fact, is it required to use Jack Spiral and then Lucien Lake to make the link.
Is it possible to adapt the script, to first know the quickest link (if there are many possible link), and then to know what is the path in term of student ? May SQL be suited to do this type of code or another language would be more appropriate ?
Advertisement
Answer
You can use a recursive query :
WITH RECURSIVE rec(distance, student_name, path) AS ( -- start with Paul Biloux SELECT 0, 'Paul Biloux', '|Paul Biloux|' UNION -- increment distance SELECT r.distance + 1, l1.student_name, r.path || l1.student_name || '|' FROM rec r JOIN lignes l1 -- avoid loops ON r.path NOT LIKE '%|' || l1.student_name || '|%' -- there is a common class WHERE exists (SELECT * FROM lignes l2 WHERE l2.year = l1.year AND l2.grade = l1.grade AND l2.student_name = r.student_name ) ) -- keep only the shortest path -- if there are more than one, choose one randomely -- (this is specific to SQLite) SELECT student_name, MIN(distance) AS dist, path FROM rec GROUP BY student_name ORDER BY distance
The result is :
student_name dist path Paul Biloux 0 |Paul Biloux| Lucien Lake 1 |Paul Biloux|Lucien Lake| Thome Defoe 1 |Paul Biloux|Thome Defoe| Chane Way 2 |Paul Biloux|Lucien Lake|Chane Way| Jack Spiral 2 |Paul Biloux|Lucien Lake|Jack Spiral| Mathilde Seine 3 |Paul Biloux|Lucien Lake|Jack Spiral|Mathilde Seine| Momo Thary 3 |Paul Biloux|Lucien Lake|Jack Spiral|Momo Thary|
It is quite inefficient, because all the possible path are explored.