data modifying WITH seems to drop rows in cascading updates -- bug?

Started by Merlin Moncureover 6 years ago5 messagesbugs
Jump to latest
#1Merlin Moncure
mmoncure@gmail.com

Hackers,

Trying to figure out if this is undefined behavior of a bug. It's
confusing, and I'm aware of certain oddities in the fringes of the
data modifying with queries where the query dependencies are not
really clear. Why does the query only return one row?

postgres=# create table foo(id int);
CREATE TABLE
postgres=# insert into foo values(1);
INSERT 0 1
postgres=# with a as (update foo set id = id + 1 returning *), b
as(update foo set id = id + 1 returning * ) select * from a union all
select id from b;
id
────
2
(1 row)

postgres=# select version();
version
─────────────────────────────────────────────────────────────────────────────────────────────────────────
PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

merlin

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#1)
Re: data modifying WITH seems to drop rows in cascading updates -- bug?

Merlin Moncure <mmoncure@gmail.com> writes:

Trying to figure out if this is undefined behavior of a bug. It's
confusing, and I'm aware of certain oddities in the fringes of the
data modifying with queries where the query dependencies are not
really clear. Why does the query only return one row?

postgres=# create table foo(id int);
CREATE TABLE
postgres=# insert into foo values(1);
INSERT 0 1
postgres=# with a as (update foo set id = id + 1 returning *), b
as(update foo set id = id + 1 returning * ) select * from a union all
select id from b;
id
────
2
(1 row)

FWIW, I think it's intentional. The two UPDATEs execute against the
same snapshot, so only one of them can update the row --- the other
one is going to see it as already-updated-by-self. It's undefined
only to the extent that it's not completely clear which one gets
there first. In this formulation of the outer query, I think it's
pretty safe to assume that "a" will get there first, but if you'd
joined "a" and "b" in some other fashion, conceivably "b" would.

Note that the fine manual (sec. 7.8.2) says

Trying to update the same row twice in a single statement is not
supported. Only one of the modifications takes place, but it is not
easy (and sometimes not possible) to reliably predict which one. This
also applies to deleting a row that was already updated in the same
statement: only the update is performed. Therefore you should
generally avoid trying to modify a single row twice in a single
statement. In particular avoid writing WITH sub-statements that could
affect the same rows changed by the main statement or a sibling
sub-statement. The effects of such a statement will not be
predictable.

regards, tom lane

#3Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#2)
Re: data modifying WITH seems to drop rows in cascading updates -- bug?

On Friday, August 23, 2019, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

Trying to figure out if this is undefined behavior of a bug. It's
confusing, and I'm aware of certain oddities in the fringes of the
data modifying with queries where the query dependencies are not
really clear. Why does the query only return one row?

postgres=# create table foo(id int);
CREATE TABLE
postgres=# insert into foo values(1);
INSERT 0 1
postgres=# with a as (update foo set id = id + 1 returning *), b
as(update foo set id = id + 1 returning * ) select * from a union all
select id from b;
id
────
2
(1 row)

FWIW, I think it's intentional. The two UPDATEs execute against the
same snapshot, so only one of them can update the row --- the other
one is going to see it as already-updated-by-self. It's undefined
only to the extent that it's not completely clear which one gets
there first. In this formulation of the outer query, I think it's
pretty safe to assume that "a" will get there first, but if you'd
joined "a" and "b" in some other fashion, conceivably "b" would.

Note that the fine manual (sec. 7.8.2) says

Trying to update the same row twice in a single statement is not
supported. Only one of the modifications takes place, but it is not
easy (and sometimes not possible) to reliably predict which one. This
also applies to deleting a row that was already updated in the same
statement: only the update is performed. Therefore you should
generally avoid trying to modify a single row twice in a single
statement. In particular avoid writing WITH sub-statements that could
affect the same rows changed by the main statement or a sibling
sub-statement. The effects of such a statement will not be
predictable.

Right. Shame on me for not checking the docs before posting. Simply
stated, this is undefined behavior.

merlin

#4Marko Tiikkaja
marko@joh.to
In reply to: Merlin Moncure (#3)
Re: data modifying WITH seems to drop rows in cascading updates -- bug?

On Sat, Aug 24, 2019 at 6:33 AM Merlin Moncure <mmoncure@gmail.com> wrote:

Right. Shame on me for not checking the docs before posting. Simply
stated, this is undefined behavior.

Nah; it's defined, yet unpredictable behavior. Still worth half the
points, I guess.

-m

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Tiikkaja (#4)

On Saturday, August 24, 2019, Marko Tiikkaja <marko@joh.to> wrote:

On Sat, Aug 24, 2019 at 6:33 AM Merlin Moncure <mmoncure@gmail.com> wrote:

Right. Shame on me for not checking the docs before posting. Simply
stated, this is undefined behavior.

Nah; it's defined, yet unpredictable behavior. Still worth half the
points, I guess.

Lack of prediction on outputs is a class of undefined behavior. If a
program is left in unknown state (including results returned to caller)
this meets the classic definition of undefined. I realize this is purely
semantic pedantry but in language design these distinctions are important
and fun to discuss.

Undefined behavior is good. It allows for future refinements of the model
and frees the caller from expectations against the result; if some fast,
clever way of managing these cases is somehow innovated the behavior can be
defined later without breaking compatibility.

FYI: this thread exists as it was turned up during the process of teaching
yet another convert the cooler stuff with postgres and he's now getting
into wCTE. I visibly saw the light bulb go off...he's ours now. I'm a fan
of your work

merlin