I have a ecosystems
table, which is self-referencing because each ecosystem can have sub-ecosystems. Each ecosystem can also have a score (which represents how healthy the ecosystem is). The columns are (with example data):
| slug | parent_slug | full_slug | score | | ---- | ----------- | ----------- | ----- | | aaa | null | aaa | 1 | | bbb | aaa | aaa/bbb | 2 | | ccc | bbb | aaa/bbb/ccc | 4 | | ddd | null | ddd | 8 | | eee | ddd | ddd/eee | 16 | | fff | null | fff | 32 |
The full_slug
column represents the full path from top-level ecosystem down. It is redundant as it can be deduced from the slug
and parent_slug
columns, but it’s there.
What I’m trying to achieve is to create a query with the same number of lines, but with a column total_score
which calculate each ecosystem’s score PLUS all its children ecosystems’ score, recursively. Namely, the output should be:
| slug | total_score | | ---- | ----------- | | aaa | 7 | | bbb | 6 | | ccc | 4 | | ddd | 24 | | eee | 16 | | fff | 32 |
I started the following query:
WITH top AS ( SELECT SUM(e.score) as total_score, CASE instr(e.full_slug, '/') WHEN 0 THEN e.full_slug ELSE substr(e.full_slug, 0, instr(e.full_slug, '/')) END AS top_level_eco FROM ecosystems e GROUP BY top_level_eco ) SELECT e.slug, top.total_score FROM ecosystems e INNER JOIN top on top.top_level_eco = e.slug;
But unfortunately it only shows top-level ecosystems and their total score.
Advertisement
Answer
I can think of several answers…
The general answer is to use recursion…
WITH tree AS ( SELECT slug AS base_slug, slug AS current_slug, score AS score FROM ecosystems UNION ALL SELECT t.base_slug, e.slug, e.score FROM tree t INNER JOIN ecosystems e ON e.parent_slug = t.current_slug ) SELECT base_slug AS slug, SUM(score) AS total_score FROM tree GROUP BY base_slug ORDER BY base_slug
Another option is to make use of the full_slug
in a JOIN
, though that will prohibit use of indexes and can often be significantly slower than the general solution above.
SELECT e.slug, SUM(m.score) AS total_score FROM ecosystems e INNER JOIN ecosystems m -- members ON '/' || m.full_slug || '/' LIKE '%/' || e.slug || '/%' GROUP BY e.slug ORDER BY e.slug
A third way would be to unnest
/explode
the full_slug
(that is to create one row for each component of the full_slug
), and then group by the components. SQLite doesn’t have that functionality natively, so that too probably gets solved with recursion.
WITH tree AS ( SELECT SUBSTR(full_slug || '/', 1, INSTR(full_slug || '/', '/')-1) AS slug, SUBSTR(full_slug || '/', INSTR(full_slug || '/', '/')+1) AS path, score FROM ecosystems UNION ALL SELECT SUBSTR(path, 1, INSTR(path, '/')-1) AS slug, SUBSTR(path, INSTR(path, '/')+1) AS path, score FROM tree WHERE tree.path <> '' ) SELECT slug, SUM(score) AS total_score FROM tree GROUP BY slug ORDER BY slug
Demo of all three approaches: