Skip to content
Advertisement

Casting NULL type when updating multiple rows

I have a problem when I try to update many rows at the same time.

Here is the table and query I use (simplified for better reading):

table

query

This query works perfectly, but when I try to execute a query where all x or y values are null, I get an error:

query with nulls

error

The only way to fix that is to change at least one of the values (null, 20, 1) to (null:int, 50, 2) but I can’t do that, since I have a function which generates these “update multiple rows” query and it doesn’t know anything about the column types.

What’s the best solution here? Is there any better update query for multiple rows? Is there any function or syntax like AS t(x:gettype(foo.x), y:gettype(foo.y), pkid:gettype(foo.pkid))?

Advertisement

Answer

With a standalone VALUES expression PostgreSQL has no idea what the data types should be. With simple numeric literals the system is happy to assume matching types. But with other input (like NULL) you would need to cast explicitly – as you already have found out.

You can query pg_catalog (fast, but PostgreSQL-specific) or the information_schema (slow, but standard SQL) to find out and prepare your statement with appropriate types.

Or you can use one of these simple “tricks” (I saved the best for last):

0. Select row with LIMIT 0, append rows with UNION ALL VALUES

The first sub-select of the subquery:

gets names and types for the columns, but LIMIT 0 prevents it from adding an actual row. Subsequent rows are coerced to the now well-defined row type – and checked immediately whether they match the type. Should be a subtle additional improvement over your original form.

While providing values for all columns of the table this short syntax can be used for the first row:

Major limitation: Postgres casts the input literals of the free-standing VALUES expression to a “best-effort” type immediately. When it later tries to cast to the given types of the first SELECT, it may already be too late for some types if there is no registered assignment cast between the assumed type and the target type. Examples: text -> timestamp or text -> json.

Pro:

  • Minimum overhead.
  • Readable, simple and fast.
  • You only need to know relevant column names of the table.

Con:

  • Type resolution can fail for some types.

1. Select row with LIMIT 0, append rows with UNION ALL SELECT

Pro:

  • Like 0., but avoids failing type resolution.

Con:

  • UNION ALL SELECT is slower than VALUES expression for long lists of rows, as you found in your test.
  • Verbose syntax per row.

2. VALUES expression with per-column type

Contrary to 0. this avoids premature type resolution.

The first row in the VALUES expression is a row of NULL values which defines the type for all subsequent rows. This leading noise row is filtered by WHERE f.pkid = t.pkid later, so it never sees the light of day. For other purposes you can eliminate the added first row with OFFSET 1 in a subquery.

Pro:

  • Typically faster than 1. (or even 0.)
  • Short syntax for tables with many columns and only few are relevant.
  • You only need to know relevant column names of the table.

Con:

  • Verbose syntax for only few rows
  • Less readable (IMO).

3. VALUES expression with row type

You obviously know the table name. If you also know the number of columns and their order you can work with this.

For every table in PostgreSQL a row type is registered automatically. If you match the number of columns in your expression, you can cast to the row type of the table ('(1,50,)'::foo) thereby assigning column types implicitly. Put nothing behind a comma to enter a NULL value. Add a comma for every irrelevant trailing column.
In the next step you can access individual columns with the demonstrated syntax. More about Field Selection in the manual.

Or you could add a row of NULL values and use uniform syntax for actual data:

Pro:

  • Fastest (at least in my tests with few rows and columns).
  • Shortest syntax for few rows or tables where you need all columns.
  • You don’t have to spell out columns of the table – all columns automatically have the matching name.

Con:

  • Not so well known syntax for field selection from record / row / composite type.
  • You need to know number and position of relevant columns in default order.

4. VALUES expression with decomposed row type

Like 3., but with decomposed rows in standard syntax:

Or, with a leading row of NULL values again:

Pros and cons like 3., but with more commonly known syntax.
And you need to spell out column names (if you need them).

5. VALUES expression with types fetched from row type

Like Unril commented, we can combine the virtues of 2. and 4. to provide only a subset of columns:

Pros and cons like 4., but we can work with any subset of columns and don’t have to know the full list.

Also displaying short syntax for the UPDATE itself that’s convenient for cases with many columns. Related:

4. and 5. are my favorites.

db<>fiddle here – demonstrating all

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement