Skip to content
Advertisement

In SQL get all rows with same value in column A that have only non-null values in column B

My SQL table has multiple columns, among them A and B. In A there’s some sort of identifier of some arbitrary object (say, a user or an item). Each object instance identified by A can have multiple entries in the database (A is not unique). Each object furthermore has a property B that is also stored in the database in the corresponding column B. That property can be NULL.

I now want to find all objects that have exclusively rows in the database with non-null values for B. So if there’s an object A=12345 which has at least one row with B=NULL, it should be excluded. I also want to group/squash/combine the results, so that I only get a list of A’s with the given condition.

For a simple and unoptimized afoot Python solution I could go on and solve it like this:

from typing import Optional, Any
from dataclasses import dataclass

@dataclass
class Obj:
    A: int = 0
    B: Optional[Any] = None

Obj(0, "Test")
Obj(A=0, B='Test')
objects = [
    Obj(0, "Test1"),
    Obj(0, "Test2"),
    Obj(0, "Test3"),
    Obj(0, "Test4"),
    Obj(1, "Test1"),
    Obj(1, "Test2"),
    Obj(1, "Test3"),
    Obj(1, None),
    Obj(2, "Test1"),
    Obj(2, "Test2"),
    Obj(2, "Test3"),
    Obj(2, "Test4"),
]
object_As = set([o.A for o in objects])
result = []
for obj_A in object_As:
    objects_Bs_by_A = [bool(o.B is not None) for o in objects if o.A == obj_A]
    if all(objects_Bs_by_A):
        result.append(obj_A)
result
[0, 2]

So, the question is: How would a proper SQL query look like for this purpose?

Advertisement

Answer

You can use aggregation:

select a
from t
group by a
having count(*) = count(b);

count(b) counts the number of non-NULL values in b. This simply says that this is equal to all the rows for each a.

If you want the original rows, you can use not exists:

select t.*
from t
where not exists (select 1
                  from t t2
                  where t2.a = t.a and t2.b is null
                 );

For performance, you want an index on t(a, b).

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement