Skip to content
Advertisement

How can I efficiently query for all rows in a database matching a filter based on a joined table?

Apologies if the question title is unclear; I’m not sure how to phrase this without more detail. I have a table foos like:

id (bigint) | details (json)
1           | {someKey: 'someValue' ...}
2           | {someKey: 'otherValue' ...}
...

I also have a table foo_labels like:

foo_id (bigint) | label_id (bigint)
1               | 10
2               | 13
...

I’m interested in retrieving all the foo’s matching a certain label filter. For example, if I want to fetch all the foo’s with the label 5 OR 6, I can do:

select f.* from foos f 
join foo_labels fl on f.id = fl.foo_id
where fl.label_id in (5, 6)
group by f.id

Similarly, I can fetch all foos with labels 5 AND 6 with something like:

select f.* from foos f
join foo_labels fl on f.id = fl.foo_id 
where fl.label_id in (5, 6)
group by f.id
having count(fl.label_id) = 2;

But I’m getting stuck with more complex queries. For example, how would I do a query to get all the foos that have (label 1 OR label 2) AND (label 3 OR label 4). More abstractly, I’d like to perform a query to fetch all the foo’s having labels matching a group of OR clauses, AND’ed together, like:

(l_{11} OR l_{12} OR …) AND (l_{21} OR l_{22} OR …) AND …

I’ve tried playing around with it but can’t find a solution that involves just one join of the foo_labels table; right now the only way I can get it to work is to perform one join per OR clause. Is there a way to make a query like this while only joining the foo_labels table once? I’m using MySql but if you know how to do this in a similar version of SQL that might also be helpful?

Advertisement

Answer

how would I do a query to get all the foos that have (label 1 OR label 2) AND (label 3 OR label 4)?

You can have conditional expressions in the having clause, like:

select f.* 
from foos f
join foo_labels fl on f.id = fl.foo_id 
where fl.label_id in (1, 2, 3, 4)
group by f.id
having 
    max(fl.label_id in (1, 2)) = 1
    and max(fl.label_id in (3, 4)) = 1
;

The where clause is not strictly necessary, but it makes the query more efficient, by restricting the number of rows before aggregation.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement