Skip to content
Advertisement

Optimize SQLite Query using GROUP BY

I am trying to optimize my query which uses GROUP BY. But can’t it takes too long time.

I have a virtual table which contains 3 columns:

CREATE VIRTUAL TABLE GoiWordsWithSeq
USING FTS5(idseq,sid,body);

I want the group to be just as the query below, but it just takes a bit too long to run. For example I tried this first which works pretty good but still I wonder if I can make it faster:

select 
GROUP_CONCAT(body), sid
from GoiWordsWithSeq
GROUP BY sid
HAVING sid IN (
select sid from GoiWordsWithSeq
where GoiWordsWithSeq.body match 'awesome'
)

Then:

  select 
    GROUP_CONCAT(body), GoiWordsWithSeq.sid
    from GoiWordsWithSeq
    inner join  (
    select sid from GoiWordsWithSeq
    where GoiWordsWithSeq.body match 'awesome'
    ) GoiMatch ON GoiMatch.sid = GoiWordsWithSeq.sid 

    GROUP BY GoiWordsWithSeq.sid

But it still takes long time to query. I’m trying to use JOIN because I heard it is usually faster but can’t really succed. Any advice would help!

Advertisement

Answer

Why did you choose to set the condition in the HAVING clause?
With your code the condition is checked after all the aggregations are finished.
Your condition does not involve any aggregated column so there is no reason to use it in the HAVING clause.
It would make more sense to set it in a WHERE clause so the dataset is filtered and then GROUP BY is applied for aggregation:

SELECT GROUP_CONCAT(body), sid
FROM GoiWordsWithSeq
WHERE sid IN (
  SELECT sid FROM GoiWordsWithSeq
  WHERE body MATCH 'awesome'
)
GROUP BY sid

You could also try EXISTS which sometimes performs better:

SELECT GROUP_CONCAT(g.body), g.sid
FROM GoiWordsWithSeq g
WHERE EXISTS (
  SELECT 1 FROM GoiWordsWithSeq
  WHERE sid = g.sid AND body MATCH 'awesome'
)
GROUP BY g.sid
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement