My SQL table has multiple columns, among them A and B. In A there’s some sort of identifier of some arbitrary object (say, a user or an item). Each object instance identified by A can have multiple entries in the database (A is not unique). Each object furthermore has a property B that is also stored in the database in the corresponding column B. That property can be NULL.
I now want to find all objects that have exclusively rows in the database with non-null values for B. So if there’s an object A=12345 which has at least one row with B=NULL, it should be excluded. I also want to group/squash/combine the results, so that I only get a list of A’s with the given condition.
For a simple and unoptimized afoot Python solution I could go on and solve it like this:
from typing import Optional, Any from dataclasses import dataclass @dataclass class Obj: A: int = 0 B: Optional[Any] = None Obj(0, "Test")
Obj(A=0, B='Test')
objects = [ Obj(0, "Test1"), Obj(0, "Test2"), Obj(0, "Test3"), Obj(0, "Test4"), Obj(1, "Test1"), Obj(1, "Test2"), Obj(1, "Test3"), Obj(1, None), Obj(2, "Test1"), Obj(2, "Test2"), Obj(2, "Test3"), Obj(2, "Test4"), ]
object_As = set([o.A for o in objects]) result = [] for obj_A in object_As: objects_Bs_by_A = [bool(o.B is not None) for o in objects if o.A == obj_A] if all(objects_Bs_by_A): result.append(obj_A)
result
[0, 2]
So, the question is: How would a proper SQL query look like for this purpose?
Advertisement
Answer
You can use aggregation:
select a from t group by a having count(*) = count(b);
count(b)
counts the number of non-NULL values in b
. This simply says that this is equal to all the rows for each a
.
If you want the original rows, you can use not exists
:
select t.* from t where not exists (select 1 from t t2 where t2.a = t.a and t2.b is null );
For performance, you want an index on t(a, b)
.