My SQL table has multiple columns, among them A and B. In A there’s some sort of identifier of some arbitrary object (say, a user or an item). Each object instance identified by A can have multiple entries in the database (A is not unique). Each object furthermore has a property B that is also stored in the database in the corresponding column B. That property can be NULL.
I now want to find all objects that have exclusively rows in the database with non-null values for B. So if there’s an object A=12345 which has at least one row with B=NULL, it should be excluded. I also want to group/squash/combine the results, so that I only get a list of A’s with the given condition.
For a simple and unoptimized afoot Python solution I could go on and solve it like this:
from typing import Optional, Any
from dataclasses import dataclass
@dataclass
class Obj:
A: int = 0
B: Optional[Any] = None
Obj(0, "Test")
Obj(A=0, B='Test')
objects = [
Obj(0, "Test1"),
Obj(0, "Test2"),
Obj(0, "Test3"),
Obj(0, "Test4"),
Obj(1, "Test1"),
Obj(1, "Test2"),
Obj(1, "Test3"),
Obj(1, None),
Obj(2, "Test1"),
Obj(2, "Test2"),
Obj(2, "Test3"),
Obj(2, "Test4"),
]
object_As = set([o.A for o in objects])
result = []
for obj_A in object_As:
objects_Bs_by_A = [bool(o.B is not None) for o in objects if o.A == obj_A]
if all(objects_Bs_by_A):
result.append(obj_A)
result
[0, 2]
So, the question is: How would a proper SQL query look like for this purpose?
Advertisement
Answer
You can use aggregation:
select a
from t
group by a
having count(*) = count(b);
count(b)
counts the number of non-NULL values in b
. This simply says that this is equal to all the rows for each a
.
If you want the original rows, you can use not exists
:
select t.*
from t
where not exists (select 1
from t t2
where t2.a = t.a and t2.b is null
);
For performance, you want an index on t(a, b)
.