Skip to content
Advertisement

How to avoid multiple function evals with the (func()).* syntax in a query?

Context

When a function returns a TABLE or a SETOF composite-type, like this one:

CREATE FUNCTION func(n int) returns table(i int, j bigint) as $$
BEGIN
  RETURN QUERY select 1,n::bigint 
      union all select 2,n*n::bigint
      union all select 3,n*n*n::bigint;
END
$$ language plpgsql;

the results can be accessed by various methods:

  1. select * from func(3) will produce these output columns :
 i | j 
---+---
 1 |  3
 2 |  9
 3 | 27
  1. select func(3) will produce only one output column of ROW type.
 func  
-------
 (1,3)
 (2,9)
 (3,27)
  1. select (func(3)).* will produce like #1:
 i | j 
---+---
 1 |  3
 2 |  9
 3 | 27

When the function argument comes from a table or a subquery, the syntax #3 is the only possible one, as in:

select N, (func(N)).*
from (select 2 as N union select 3 as N) s;

or as in this related answer. If we had LATERAL JOIN we could use that, but until PostgreSQL 9.3 is out, it’s not supported, and the previous versions will still be used for years anyway.

Problem

The problem with syntax #3 is that the function is called as many times as there are columns in the result. There’s no apparent reason for that, but it happens.
We can see it in version 9.2 by adding a RAISE NOTICE 'called for %', n in the function. With the query above, it outputs:

NOTICE:  called for 2
NOTICE:  called for 2
NOTICE:  called for 3
NOTICE:  called for 3

Now, if the function is changed to return 4 columns, like this:

CREATE FUNCTION func(n int) returns table(i int, j bigint,k int, l int) as $$
BEGIN
  raise notice 'called for %', n;
  RETURN QUERY select 1,n::bigint,1,1 
      union all select 2,n*n::bigint,1,1
      union all select 3,n*n*n::bigint,1,1;
END                                        
$$ language plpgsql stable;

then the same query outputs:

NOTICE:  called for 2
NOTICE:  called for 2
NOTICE:  called for 2
NOTICE:  called for 2
NOTICE:  called for 3
NOTICE:  called for 3
NOTICE:  called for 3
NOTICE:  called for 3

2 function calls were needed, 8 were actually made. The ratio is the number of output columns.

With syntax #2 that produces the same result except for the output columns layout, these multiple calls don’t happen:

select N, func(N)
from (select 2 as N union select 3 as N) s;

gives:

NOTICE:  called for 2
NOTICE:  called for 3

followed by the 6 resulting rows:

 n |    func    
---+------------
 2 | (1,2,1,1)
 2 | (2,4,1,1)
 2 | (3,8,1,1)
 3 | (1,3,1,1)
 3 | (2,9,1,1)
 3 | (3,27,1,1)

Questions

Is there a syntax or a construct with 9.2 that would achieve the expected result by doing only the minimum required function calls?

Bonus question: why do the multiple evaluations happen at all?

Advertisement

Answer

You can wrap it up in a subquery but that’s not guaranteed safe without the OFFSET 0 hack. In 9.3, use LATERAL. The problem is caused by the parser effectively macro-expanding * into a column list.

Workaround

Where:

SELECT (my_func(x)).* FROM some_table;

will evaluate my_func n times for n result columns from the function, this formulation:

SELECT (mf).* FROM (
    SELECT my_func(x) AS mf FROM some_table
) sub;

generally will not, and tends not to add an additional scan at runtime. To guarantee that multiple evaluation won’t be performed you can use the OFFSET 0 hack or abuse PostgreSQL’s failure to optimise across CTE boundaries:

SELECT (mf).* FROM (
    SELECT my_func(x) AS mf FROM some_table OFFSET 0
) sub;

or:

WITH tmp(mf) AS (
    SELECT my_func(x) FROM some_table
)
SELECT (mf).* FROM tmp;

In PostgreSQL 9.3 you can use LATERAL to get a saner behaviour:

SELECT mf.*
FROM some_table
LEFT JOIN LATERAL my_func(some_table.x) AS mf ON true;

LEFT JOIN LATERAL ... ON true retains all rows like the original query, even if the function call returns no row.

Demo

Create a function that isn’t inlineable as a demonstration:

CREATE OR REPLACE FUNCTION my_func(integer)
RETURNS TABLE(a integer, b integer, c integer) AS $$
BEGIN
    RAISE NOTICE 'my_func(%)',$1;
    RETURN QUERY SELECT $1, $1, $1;
END;
$$ LANGUAGE plpgsql;

and a table of dummy data:

CREATE TABLE some_table AS SELECT x FROM generate_series(1,10) x;

then try the above versions. You’ll see that the first raises three notices per invocation; the latter only raise one.

Why?

Good question. It’s horrible.

It looks like:

(func(x)).*

is expanded as:

(my_func(x)).i, (func(x)).j, (func(x)).k, (func(x)).l

in parsing, according to a look at debug_print_parse, debug_print_rewritten and debug_print_plan. The (trimmed) parse tree looks like this:

   :targetList (
      {TARGETENTRY 
      :expr 
         {FIELDSELECT 
         :arg 
            {FUNCEXPR 
            :funcid 57168 
                 ...
            }
         :fieldnum 1 
         :resulttype 23 
         :resulttypmod -1 
         :resultcollid 0
         }
      :resno 1 
      :resname i 
       ...
      }
      {TARGETENTRY 
      :expr 
         {FIELDSELECT 
         :arg 
            {FUNCEXPR 
            :funcid 57168 
                 ...
            }
         :fieldnum 2 
         :resulttype 20 
         :resulttypmod -1 
         :resultcollid 0
         }
      :resno 2 
      :resname j 
       ...
      }
      {TARGETENTRY 
      :expr 
         {FIELDSELECT 
         :arg 
            {FUNCEXPR 
            :funcid 57168 
             ...
            }
         :fieldnum 3 
         :...
         }
      :resno 3 
      :resname k 
       ...
      }
      {TARGETENTRY 
      :expr 
         {FIELDSELECT 
         :arg 
            {FUNCEXPR 
            :funcid 57168 
             ...
            }
         :fieldnum 4 
          ...
         }
      :resno 4 
      :resname l 
       ...
      }
   )

So basically, we’re using a dumb parser hack to expand wildcards by cloning nodes.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement