I am developing a data validation framework where I have this requirement of checking that the table fields should have at least one non-null value i.e they shouldn’t be completely empty having all values as null.
For a particular column, I can easily check using
select count(distinct column_name) from table_name;
If it’s greater than 0 I can tell that the column is not empty. I already have a list of columns. So, I can execute this query in the loop for every column but this would mean a lot of requests and it is not the ideal way.
What is the better way of doing this? I am using Microsoft SQL Server.
Advertisement
Answer
I would not recommend using count(distinct)
because it incurs overhead for removing duplicate values. You can just use count()
.
You can construct the query for counts using a query like this:
select count(col1) as col1_cnt, count(col2) as col2_cnt, . . . from t;
If you have a list of columns you can do this as dynamic SQL. Something like this:
declare @sql nvarchar(max); select @sql = concat('select ', string_agg(concat('count(', quotename(s.value), ') as cnt_', s.value), ' from t' ) from string_split(@list) s; exec sp_executesql(@sql);
This might not quite work if your columns have special characters in them, but it illustrates the idea.