Skip to content
Advertisement

how to rewrite query to put data-modifying CTE at top level

I’m trying to write a query for Postgres to insert a row into one table which as a side-effect may insert a row into a second table (due to the first table having a foreign key constraint on the second).

I suppose I could do this as two separate queries, possibly inside a transaction, but now that I’ve started down the path of a single query, I’d like to know how it might be done, if it’s possible at all.

I’ve been trying to use the general structure laid out in How to use RETURNING with ON CONFLICT in PostgreSQL? and Insert if names do not exist and return ids if exist, which seemed like a reasonable starting point, but I’m getting the error WITH clause containing a data-modifying statement must be at the top level, and I’m not sure how to follow that advice.

Table a is something like this:

CREATE TABLE a (
    a_uuid uuid PRIMARY KEY,
    b_uuid uuid REFERENCES b(b_uuid) 
)

and table b is:

CREATE TABLE b (
    b_uuid uuid PRIMARY KEY,
    b_name text UNIQUE
)

I want to insert a row into a where b_uuid is computed based on input that’ll match b_name. If b already has the row, then use the corresponding b_uuid; otherwise, create a new row in b with that text and a newly-generated UUID, returning the latter to use as part of the insert into a. This was what I had written before I realized I was in over my head:

WITH new_b AS (
        WITH input (b_uuid, b_name) AS (
                VALUES (
                        gen_random_uuid(), $1
                )
        ), ins AS (
                INSERT INTO b (
                        b_uuid, b_name
                )
                TABLE input
                ON CONFLICT DO NOTHING
                RETURNING b_uuid
        )
        TABLE ins
        UNION ALL
        SELECT b_uuid FROM input
)
INSERT INTO a (a_uuid, b_uuid)
VALUES (
        gen_random_uuid(), (SELECT b_uuid FROM new_b)
)

Am I anywhere close? What would be the best way of doing this?

Advertisement

Answer

demo:db<>fiddle (because of random things, you may reload several times if the random uuid equals 1; instead of type uuid I used int because the fiddle engine currently does not support the pgcrypto extension. I simulated the function with an own one.)

WITH input (b_uuid, b_name) AS (
    VALUES (
        gen_random_uuid(), $1
    )
), ins_b AS (
    INSERT INTO b (
        b_uuid, b_name
    )
    TABLE input
    ON CONFLICT DO NOTHING
    RETURNING b_uuid
), new_b AS (
    TABLE ins_b
    UNION ALL
    SELECT b.b_uuid FROM input
    JOIN b USING (b_uuid)
)
INSERT INTO a (a_uuid, b_uuid)
VALUES (
    gen_random_uuid(), (SELECT b_uuid FROM new_b)
);

Your solution is not far away:

  1. Manipulating statements (like INSERT) cannot be inside nested WITH clauses (At this point: Thank you, I didn’t even know about this nested CTE feature :D)
  2. The main point of @ErwinBrandstetter’s incredible fantastic solution (How to use RETURNING with ON CONFLICT in PostgreSQL?) is the JOIN that your solution is missing: The new_b part works as follows: If there is a conflict, ins_b returns nothing. So, TABLE ins_b is empty. In that case the already existing b_uuid needs to be called directly from TABLE b. Taking the generated UUID, joining it against b gives out the existing b_uuid (and if you wish, every other column of this record). But if there is no conflict – the b_uuid does not exist yet -, then ins_b returns the new data set, TABLE ins_b is not empty, but the join on the original table fails because there is still no record persisted which can be used to join. This, of course, works for more than one record to be inserted.
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement