data modifying WITH seems to drop rows in cascading updates -- bug?
Hackers,
Trying to figure out if this is undefined behavior of a bug. It's
confusing, and I'm aware of certain oddities in the fringes of the
data modifying with queries where the query dependencies are not
really clear. Why does the query only return one row?
postgres=# create table foo(id int);
CREATE TABLE
postgres=# insert into foo values(1);
INSERT 0 1
postgres=# with a as (update foo set id = id + 1 returning *), b
as(update foo set id = id + 1 returning * ) select * from a union all
select id from b;
id
────
2
(1 row)
postgres=# select version();
version
─────────────────────────────────────────────────────────────────────────────────────────────────────────
PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)
merlin
Merlin Moncure <mmoncure@gmail.com> writes:
Trying to figure out if this is undefined behavior of a bug. It's
confusing, and I'm aware of certain oddities in the fringes of the
data modifying with queries where the query dependencies are not
really clear. Why does the query only return one row?
postgres=# create table foo(id int);
CREATE TABLE
postgres=# insert into foo values(1);
INSERT 0 1
postgres=# with a as (update foo set id = id + 1 returning *), b
as(update foo set id = id + 1 returning * ) select * from a union all
select id from b;
id
────
2
(1 row)
FWIW, I think it's intentional. The two UPDATEs execute against the
same snapshot, so only one of them can update the row --- the other
one is going to see it as already-updated-by-self. It's undefined
only to the extent that it's not completely clear which one gets
there first. In this formulation of the outer query, I think it's
pretty safe to assume that "a" will get there first, but if you'd
joined "a" and "b" in some other fashion, conceivably "b" would.
Note that the fine manual (sec. 7.8.2) says
Trying to update the same row twice in a single statement is not
supported. Only one of the modifications takes place, but it is not
easy (and sometimes not possible) to reliably predict which one. This
also applies to deleting a row that was already updated in the same
statement: only the update is performed. Therefore you should
generally avoid trying to modify a single row twice in a single
statement. In particular avoid writing WITH sub-statements that could
affect the same rows changed by the main statement or a sibling
sub-statement. The effects of such a statement will not be
predictable.
regards, tom lane
On Friday, August 23, 2019, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
Trying to figure out if this is undefined behavior of a bug. It's
confusing, and I'm aware of certain oddities in the fringes of the
data modifying with queries where the query dependencies are not
really clear. Why does the query only return one row?postgres=# create table foo(id int);
CREATE TABLE
postgres=# insert into foo values(1);
INSERT 0 1
postgres=# with a as (update foo set id = id + 1 returning *), b
as(update foo set id = id + 1 returning * ) select * from a union all
select id from b;
id
────
2
(1 row)FWIW, I think it's intentional. The two UPDATEs execute against the
same snapshot, so only one of them can update the row --- the other
one is going to see it as already-updated-by-self. It's undefined
only to the extent that it's not completely clear which one gets
there first. In this formulation of the outer query, I think it's
pretty safe to assume that "a" will get there first, but if you'd
joined "a" and "b" in some other fashion, conceivably "b" would.Note that the fine manual (sec. 7.8.2) says
Trying to update the same row twice in a single statement is not
supported. Only one of the modifications takes place, but it is not
easy (and sometimes not possible) to reliably predict which one. This
also applies to deleting a row that was already updated in the same
statement: only the update is performed. Therefore you should
generally avoid trying to modify a single row twice in a single
statement. In particular avoid writing WITH sub-statements that could
affect the same rows changed by the main statement or a sibling
sub-statement. The effects of such a statement will not be
predictable.
Right. Shame on me for not checking the docs before posting. Simply
stated, this is undefined behavior.
merlin
On Sat, Aug 24, 2019 at 6:33 AM Merlin Moncure <mmoncure@gmail.com> wrote:
Right. Shame on me for not checking the docs before posting. Simply
stated, this is undefined behavior.
Nah; it's defined, yet unpredictable behavior. Still worth half the
points, I guess.
-m
On Saturday, August 24, 2019, Marko Tiikkaja <marko@joh.to> wrote:
On Sat, Aug 24, 2019 at 6:33 AM Merlin Moncure <mmoncure@gmail.com> wrote:
Right. Shame on me for not checking the docs before posting. Simply
stated, this is undefined behavior.Nah; it's defined, yet unpredictable behavior. Still worth half the
points, I guess.
Lack of prediction on outputs is a class of undefined behavior. If a
program is left in unknown state (including results returned to caller)
this meets the classic definition of undefined. I realize this is purely
semantic pedantry but in language design these distinctions are important
and fun to discuss.
Undefined behavior is good. It allows for future refinements of the model
and frees the caller from expectations against the result; if some fast,
clever way of managing these cases is somehow innovated the behavior can be
defined later without breaking compatibility.
FYI: this thread exists as it was turned up during the process of teaching
yet another convert the cooler stuff with postgres and he's now getting
into wCTE. I visibly saw the light bulb go off...he's ours now. I'm a fan
of your work
merlin