In my database are many text columns where values are empty strings (''
). The empty strings need to be set to NULL
. I do not know the exact schemas, tables and columns in this database or rather I want to write a general solution which can be reused.
How would I write a query / function to find all text columns in all tables in all schemas and update all columns with empty strings (''
) to NULL
?
Advertisement
Answer
The most efficient way to achieve this:
- Run a single
UPDATE
per table. - Only update nullable columns (not defined
NOT NULL
) with any actual empty string. - Only update rows with any actual empty string.
- Leave other values unchanged.
This related answer has a plpgsql function that builds and runs the UPDATE
command using system catalog pg_attribute
automatically and safely for any given table:
Using the function f_empty2null()
from this answer, you can loop through selected tables like this:
DO $do$ DECLARE _tbl regclass; BEGIN FOR _tbl IN SELECT c.oid::regclass FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind = 'r' -- only regular tables AND n.nspname NOT LIKE 'pg_%' -- exclude system schemas LOOP RAISE NOTICE $$PERFORM f_empty2null('%');$$, _tbl; -- PERFORM f_empty2null(_tbl); -- uncomment to prime the bomb END LOOP; END $do$;
Careful! This updates all empty strings in all columns of all user tables in the DB. Be sure that’s what you want or it might nuke your database.
You need UPDATE
privileges on all selected tables, of course.
As a child safety device I commented the payload.
You may have noted that I use the system catalogs directly, not the information schema (which would work, too). About this:
- How to check if a table exists in a given schema
- Query to return output column names and data types of a query, table or view
For repeated use
Here is an integrated solution for repeated use. Without safety devices:
CREATE OR REPLACE FUNCTION f_all_empty2null(OUT _tables int, OUT _rows int) AS $func$ DECLARE _typ CONSTANT regtype[] := '{text, bpchar, varchar, "char"}'; _sql text; _row_ct int; BEGIN _tables := 0; _rows := 0; FOR _sql IN SELECT format('UPDATE %s SET %s WHERE %s' , t.tbl , string_agg(format($$%1$s = NULLIF(%1$s, '')$$, t.col), ', ') , string_agg(t.col || $$ = ''$$, ' OR ')) FROM ( SELECT c.oid::regclass AS tbl, quote_ident(attname) AS col FROM pg_namespace n JOIN pg_class c ON c.relnamespace = n.oid JOIN pg_attribute a ON a.attrelid = c.oid WHERE n.nspname NOT LIKE 'pg_%' -- exclude system schemas AND c.relkind = 'r' -- only regular tables AND a.attnum >= 1 -- exclude tableoid & friends AND NOT a.attisdropped -- exclude dropped columns AND NOT a.attnotnull -- exclude columns defined NOT NULL! AND a.atttypid = ANY(_typ) -- only character types ORDER BY a.attnum ) t GROUP BY t.tbl LOOP EXECUTE _sql; GET DIAGNOSTICS _row_ct = ROW_COUNT; -- report nr. of affected rows _tables := _tables + 1; _rows := _rows + _row_ct; END LOOP; END $func$ LANGUAGE plpgsql;
Call:
SELECT * FROM pg_temp.f_all_empty2null();
Returns:
_tables | _rows ---------+--------- 23 | 123456
Note how I escaped both table and columns names properly!
c.oid::regclass AS tbl, quote_ident(attname) AS col
Consider:
Careful! Same warning as above.
Also consider the basic explanation in the answer I linked above: