Joins of data-modifying CTE with the target table

Started by Alex Bolenokalmost 3 years ago6 messagesgeneral
Jump to latest
#1Alex Bolenok
quassnoi@gmail.com

Hi list,

This popped up yesterday during a discussion at the Boston PostgreSQL group
meetup, and Jesper Pedersen had advised that I post it here.

Imagine this setup:

CREATE TABLE IF NOT EXISTS mytable (id BIGSERIAL PRIMARY KEY, value TEXT
NOT NULL);

WITH insert_cte AS
(
INSERT
INTO mytable (value)
VALUES ('test')
RETURNING
*
)
SELECT mytable.*
FROM insert_cte
JOIN mytable
USING (id);

This query will return nothing, even though people would expect it to
return the newly inserted record.

This is just a minimally reproducible example, in which you can easily work
around the problem just by getting rid of the join to mytable. But during
my consulting career, I've seen people try putting together more complex
queries using the same pattern, and this always comes as a surprise.

I get why it's not working (because the statement is not allowed to see the
tuples with its own cmin), but I was wondering if it was worth it at least
to spell it out explicitly in the documentation.

Right now the documentation says:

https://www.postgresql.org/docs/15/queries-with.html#QUERIES-WITH-MODIFYING

RETURNING data is the only way to communicate changes between different

WITH sub-statements and the main query

which I don't think is covering the JOIN issue (after all, I am using the
RETURNING clause to communicate with the main query).

Can we please add this example to the documentation? I can do the wording
if that's something worth adding.

Thank you!

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alex Bolenok (#1)
Re: Joins of data-modifying CTE with the target table

Alex Bolenok <quassnoi@gmail.com> writes:

I get why it's not working (because the statement is not allowed to see the
tuples with its own cmin), but I was wondering if it was worth it at least
to spell it out explicitly in the documentation.

What's not explicit about this?

The sub-statements in WITH are executed concurrently with each other
and with the main query. Therefore, when using data-modifying
statements in WITH, the order in which the specified updates actually
happen is unpredictable. All the statements are executed with the same
snapshot (see Chapter 13), so they cannot “see” one another's effects
on the target tables.

regards, tom lane

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Alex Bolenok (#1)
Re: Joins of data-modifying CTE with the target table

On 4/19/23 08:37, Alex Bolenok wrote:

Hi list,

This popped up yesterday during a discussion at the Boston PostgreSQL
group meetup, and Jesper Pedersen had advised that I post it here.

Imagine this setup:

CREATE TABLE IF NOT EXISTS mytable (id BIGSERIAL PRIMARY KEY, value TEXT
NOT NULL);

WITH    insert_cte AS
        (
        INSERT
        INTO    mytable (value)
        VALUES  ('test')
        RETURNING
                *
        )
SELECT  mytable.*
FROM    insert_cte
JOIN    mytable
USING   (id);

This query will return nothing, even though people would expect it to
return the newly inserted record.

This is just a minimally reproducible example, in which you can easily
work around the problem just by getting rid of the join to mytable. But
during my consulting career, I've seen people try putting together more
complex queries using the same pattern, and this always comes as a surprise.

I get why it's not working (because the statement is not allowed to see
the tuples with its own cmin), but I was wondering if it was worth it at
least to spell it out explicitly in the documentation.

Right now the documentation says:

https://www.postgresql.org/docs/15/queries-with.html#QUERIES-WITH-MODIFYING <https://www.postgresql.org/docs/15/queries-with.html#QUERIES-WITH-MODIFYING&gt;

RETURNING data is the only way to communicate changes between
different WITH sub-statements and the main query

which I don't think is covering the JOIN issue (after all, I am using
the RETURNING clause to communicate with the main query).

Can we please add this example to the documentation? I can do the
wording if that's something worth adding.

To add to Tom's post.

"
Data-modifying statements in WITH usually have RETURNING clauses (see
Section 6.4), as shown in the example above. It is the output of the
RETURNING clause, not the target table of the data-modifying statement,
that forms the temporary table that can be referred to by the rest of
the query. If a data-modifying statement in WITH lacks a RETURNING
clause, then it forms no temporary table and cannot be referred to in
the rest of the query. Such a statement will be executed nonetheless. A
not-particularly-useful example is:

...

The sub-statements in WITH are executed concurrently with each other and
with the main query. Therefore, when using data-modifying statements in
WITH, the order in which the specified updates actually happen is
unpredictable. All the statements are executed with the same snapshot
(see Chapter 13), so they cannot “see” one another's effects on the
target tables. This alleviates the effects of the unpredictability of
the actual order of row updates, and means that RETURNING data is the
only way to communicate changes between different WITH sub-statements
and the main query. An example of this is that in

WITH t AS (
UPDATE products SET price = price * 1.05
RETURNING *
)
SELECT * FROM products;

the outer SELECT would return the original prices before the action of
the UPDATE, while in

WITH t AS (
UPDATE products SET price = price * 1.05
RETURNING *
)
SELECT * FROM t;

the outer SELECT would return the updated data.
"

So the RETURNING temp table is the only thing you have to work on.

Thank you!

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Alex Bolenok
quassnoi@gmail.com
In reply to: Tom Lane (#2)
Re: Joins of data-modifying CTE with the target table

To reiterate, I (I personally) get why it's not working, but the
documentation is something that is intended for people who don't.

Technically, anyone can deduce it from reading the documentation thoroughly
enough, but do we need the documentation to be as terse as possible?

To answer your question, by "explicitly" I mean mentioning this very
pattern, where you join the returned id with the target table. People often
try to use this pattern for queries like "add an item to the order in a
CTE, select the order total in the main query", and often don't notice that
the order total doesn't include the new item until it hits production.

ср, 19 апр. 2023 г. в 11:46, Tom Lane <tgl@sss.pgh.pa.us>:

Show quoted text

Alex Bolenok <quassnoi@gmail.com> writes:

I get why it's not working (because the statement is not allowed to see

the

tuples with its own cmin), but I was wondering if it was worth it at

least

to spell it out explicitly in the documentation.

What's not explicit about this?

The sub-statements in WITH are executed concurrently with each other
and with the main query. Therefore, when using data-modifying
statements in WITH, the order in which the specified updates actually
happen is unpredictable. All the statements are executed with the same
snapshot (see Chapter 13), so they cannot “see” one another's effects
on the target tables.

regards, tom lane

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Alex Bolenok (#4)
Re: Joins of data-modifying CTE with the target table

On 4/19/23 09:09, Alex Bolenok wrote:

To reiterate, I (I personally) get why it's not working, but the
documentation is something that is intended for people who don't.

Technically, anyone can deduce it from reading the documentation
thoroughly enough, but do we need the documentation to be as terse
as possible?

Not so much terse as possible as not as verbose as possible. The
difference being that documenting every possible usage would turn the
page into an eye glazing experience. At that point folks would really
just move on. The documentation expectations(as I see it) are that: 1)
People can work from basic principles to specific usage. 2) They also
actually test/verify what they run.

What you are after is more on the lines of a separate enhanced tutorial.
That has been discussed on this list and elsewhere. So far it has not
progressed beyond discussion.

To answer your question, by "explicitly" I mean mentioning this very
pattern, where you join the returned id with the target table. People
often try to use this pattern for queries like "add an item to the order
in a CTE, select the order total in the main query", and often don't
notice that the order total doesn't include the new item until it hits
production.

--
Adrian Klaver
adrian.klaver@aklaver.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alex Bolenok (#4)
Re: Joins of data-modifying CTE with the target table

Alex Bolenok <quassnoi@gmail.com> writes:

To answer your question, by "explicitly" I mean mentioning this very
pattern, where you join the returned id with the target table.

I do not see why this existing example isn't sufficient:

... An example of this is that in

WITH t AS (
UPDATE products SET price = price * 1.05
RETURNING *
)
SELECT * FROM products;

the outer SELECT would return the original prices before the action of
the UPDATE, ...

Yeah, it lacks a JOIN and it's an UPDATE not an INSERT. But we can't
provide an example to precisely match every possible mistake. As
Adrian said, brevity in the docs does have value.

regards, tom lane