I have a bunch of tables, several of which have hundreds of columns. I need a count of non-null values for each column, and I've been doing it manually. I'd like a way to get the counts for all the columns in a table at once. I searched Stack Overflow and Google, but was unable to find an answer.
I tried this, but it just returns a value of 1 for each column. I know it's counting the number of columns, not the values in each column. Any suggestions?
SELECT COUNT(column_name)
FROM information_schema.columns
WHERE table_schema = 'schema_name'
  AND table_name = 'table_name'
GROUP BY column_name
Answer
COUNT(column_name) always gives you the count of NON NULL values.
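As a quick illustration of that rule (the table name demo below is just a throwaway name for this sketch), COUNT(*) counts all rows while COUNT(x) skips NULLs:

```sql
-- Throwaway demo table; any scratch table works.
CREATE TEMP TABLE demo (x INT);
INSERT INTO demo VALUES (1), (NULL), (3);

-- COUNT(*) counts every row; COUNT(x) counts only rows where x IS NOT NULL.
SELECT COUNT(*) AS total_rows,   -- 3
       COUNT(x) AS non_null_x    -- 2
FROM demo;
```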
Create a generic function like this, which takes the schema name and table name as arguments. It constructs one SELECT statement per column, joined together by UNION ALL; each returns the column name and its non-null count when the combined statement is executed dynamically.
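To make the construction concrete: for a hypothetical two-column table public.t with columns a and b, the generated statement would look like this:

```sql
-- What the function builds (sketch for a made-up table public.t):
SELECT 'a', count(a) FROM public.t
UNION ALL
SELECT 'b', count(b) FROM public.t;
```

Each branch contributes one row, so the result has one (column_name, count) row per column of the table.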
CREATE OR REPLACE FUNCTION public.get_count( TEXT, TEXT )
  RETURNS TABLE ( t_column_name TEXT, t_count BIGINT )
  LANGUAGE plpgsql
AS $BODY$
DECLARE
    p_schema        TEXT := $1;
    p_tabname       TEXT := $2;
    v_sql_statement TEXT;
BEGIN
    -- Build one "SELECT 'col', count(col) FROM schema.table" per column,
    -- glued together with UNION ALL. quote_ident() protects against
    -- mixed-case or reserved-word identifiers.
    SELECT STRING_AGG(
               'SELECT ''' || column_name || ''', '
               || 'count(' || quote_ident(column_name) || ') FROM '
               || quote_ident(table_schema) || '.' || quote_ident(table_name),
               ' UNION ALL '
           )
      INTO v_sql_statement
      FROM information_schema.columns
     WHERE table_schema = p_schema
       AND table_name   = p_tabname;

    -- NULL means no matching table/columns were found; return nothing.
    IF v_sql_statement IS NOT NULL THEN
        RETURN QUERY EXECUTE v_sql_statement;
    END IF;
END
$BODY$;
Execution
knayak=# select c.col, c.count from public.get_count( 'public', 'employees' ) as c(col,count);
      col       | count
----------------+-------
 employee_id    |   107
 first_name     |   107
 last_name      |   107
 email          |   107
 phone_number   |   107
 hire_date      |   107
 job_id         |   107
 salary         |   107
 commission_pct |    35
 manager_id     |   106
 department_id  |   106
(11 rows)