First-draft release notes for next week's releases

Started by Tom Lane, about 12 years ago, 37 messages, hackers

tgl@sss.pgh.pa.us

about 12 years ago

First-draft release notes are committed, and should be visible at
http://www.postgresql.org/docs/devel/static/release-9-3-4.html
once guaibasaurus does its next buildfarm run a few minutes from
now. Any suggestions?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Josh Berkus

josh@agliodbs.com

about 12 years ago

In reply to: Tom Lane (#1)

Re: First-draft release notes for next week's releases

On 03/15/2014 01:02 PM, Tom Lane wrote:

First-draft release notes are committed, and should be visible at
http://www.postgresql.org/docs/devel/static/release-9-3-4.html
once guaibasaurus does its next buildfarm run a few minutes from
now. Any suggestions?

Hmmm, not sure I like this. It's complicated without being complete,
and supplies just enough information to get someone into trouble:

Also, the error fixed in the second changelog entry below could have
caused some bloat in statistics data. Users who have done many DROP
DATABASE commands since upgrading to 9.3 may wish to manually remove
files in $PGDATA/pg_stat_tmp (or $PGDATA/pg_stat if the server is not
running) that have old modification times and do not correspond to any
database OID present in $PGDATA/base. If you do this, note that the file
db_0.stat is a valid file even though it does not correspond to any
$PGDATA/base subdirectory.

I kind of think that either we should provide complete instructions
(which would be about 3/4 of a page), or provide limited instructions
and assume the only users who will do this are ones who already
understand pg_stat (a reasonable assumption in my opinion). So my
suggestion is to move the advice paragraph from E.1.1 to the individual fix
entry in E.1.2, and change it to this:

* Remove the correct per-database statistics file during DROP DATABASE
(Tomas Vondra)

This fix prevents a permanent leak of statistics file space.

Users who have done many DROP DATABASE commands in PostgreSQL 9.3 may
wish to examine their statistics directory for statistics files which do
not correspond to any existing database and delete them. Please note
that db_0.stat is a needed statistics file.
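
A minimal sketch of that check in C, assuming the per-database statistics files are named db_<oid>.stat (inferred from the db_0.stat name above) and that the caller has already collected the database OIDs present under $PGDATA/base. The function name and signature are hypothetical, and the sketch ignores the modification-time test mentioned in the draft paragraph, so it illustrates the matching rule only and is not a cleanup tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `name` looks like a per-database
 * statistics file ("db_<oid>.stat", an assumed naming scheme) whose OID
 * is not in `live_oids`, and 0 otherwise.  db_0.stat is never reported,
 * because it is a valid file that matches no $PGDATA/base subdirectory.
 */
int is_orphan_stat_file(const char *name,
                        const unsigned long *live_oids, size_t n_live)
{
    const char *digits;
    char *end;
    unsigned long oid;
    size_t i;

    /* Must start with the "db_" prefix followed by at least one digit. */
    if (strncmp(name, "db_", 3) != 0)
        return 0;
    digits = name + 3;
    if (!isdigit((unsigned char) *digits))
        return 0;

    /* Parse the OID and require the name to end in exactly ".stat". */
    oid = strtoul(digits, &end, 10);
    if (strcmp(end, ".stat") != 0)
        return 0;

    /* OID 0 is the file the note says must be kept. */
    if (oid == 0)
        return 0;

    /* A file is an orphan only if no live database has this OID. */
    for (i = 0; i < n_live; i++)
        if (live_oids[i] == oid)
            return 0;
    return 1;
}
```

With live OIDs {12345, 16384}, for example, db_99999.stat would be reported, while db_0.stat, db_16384.stat, and unrelated names such as global.stat would not.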

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Bruce Momjian

bruce@momjian.us

about 12 years ago

In reply to: Josh Berkus (#2)

Re: First-draft release notes for next week's releases

This is not really accurate:

"This error allowed multiple versions of the same row to become
visible to queries, resulting in apparent duplicates. Since the error
is in WAL replay, it would only manifest during crash recovery or on
standby servers."

I think the idea is coming from what the second sentence below is
getting at but it may be too complex to explain in a release note:

The error causes some rows to disappear from indexes resulting in
inconsistent query results on a hot standby depending on whether
indexes are used. If the standby is subsequently activated or if it
occurs during recovery after a crash or backup restore it could result
in unique constraint violations as well.

I would consider adding something like "For the problem to occur, a
foreign key from another table must exist and a new row must be added
to that other table around the same time (possibly in the same
transaction) as an update to the referenced row." That would help
people judge whether their databases are vulnerable. If they don't
have foreign keys or if they have a coding pattern that causes this to
happen regularly then they should be able to figure that out and
possibly disable them if they can't update promptly.

Josh Berkus

josh@agliodbs.com

about 12 years ago

In reply to: Tom Lane (#1)

Re: First-draft release notes for next week's releases

On 03/16/2014 12:32 PM, Greg Stark wrote:

I would consider adding something like "For the problem to occur, a
foreign key from another table must exist and a new row must be added
to that other table around the same time (possibly in the same
transaction) as an update to the referenced row." That would help
people judge whether their databases are vulnerable. If they don't
have foreign keys or if they have a coding pattern that causes this to
happen regularly then they should be able to figure that out and
possibly disable them if they can't update promptly.

I don't think that will actually help people know whether they're
vulnerable without a longer explanation.

It's starting to sound like we need a wiki page for this release?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Bruce Momjian (#3)

Re: First-draft release notes for next week's releases

Greg Stark <stark@mit.edu> writes:

This is not really accurate:
"This error allowed multiple versions of the same row to become
visible to queries, resulting in apparent duplicates. Since the error
is in WAL replay, it would only manifest during crash recovery or on
standby servers."

I think the idea is coming from what the second sentence below is
getting at but it may be too complex to explain in a release note:

The error causes some rows to disappear from indexes resulting in
inconsistent query results on a hot standby depending on whether
indexes are used. If the standby is subsequently activated or if it
occurs during recovery after a crash or backup restore it could result
in unique constraint violations as well.

Hm ... "rows disappearing from indexes" might make people think that
they could fix or mitigate the damage via REINDEX. That's not really
true though is it? It looks to me like IndexBuildHeapScan will suffer
an Assert failure in an assert-enabled build, or build a bogus index
if not assert-enabled, when it comes across a "heap-only" tuple that
has no parent.

I'm thinking we'd better promote that Assert to a normal runtime elog.

regards, tom lane

Josh Berkus

josh@agliodbs.com

about 12 years ago

In reply to: Tom Lane (#1)

Re: First-draft release notes for next week's releases

On 03/17/2014 08:28 AM, Tom Lane wrote:

Greg Stark <stark@mit.edu> writes:

The error causes some rows to disappear from indexes resulting in
inconsistent query results on a hot standby depending on whether
indexes are used. If the standby is subsequently activated or if it
occurs during recovery after a crash or backup restore it could result
in unique constraint violations as well.

Hm ... "rows disappearing from indexes" might make people think that
they could fix or mitigate the damage via REINDEX. That's not really
true though is it? It looks to me like IndexBuildHeapScan will suffer
an Assert failure in an assert-enabled build, or build a bogus index
if not assert-enabled, when it comes across a "heap-only" tuple that
has no parent.

First, see suggested text in my first-draft release announcement.

Second, if a user has encountered this kind of data corruption on their
master (due to crash recovery), how exactly *do* they fix it?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Josh Berkus (#6)

Re: First-draft release notes for next week's releases

On 2014-03-17 10:03:52 -0700, Josh Berkus wrote:

On 03/17/2014 08:28 AM, Tom Lane wrote:

Greg Stark <stark@mit.edu> writes:

The error causes some rows to disappear from indexes resulting in
inconsistent query results on a hot standby depending on whether
indexes are used. If the standby is subsequently activated or if it
occurs during recovery after a crash or backup restore it could result
in unique constraint violations as well.

Hm ... "rows disappearing from indexes" might make people think that
they could fix or mitigate the damage via REINDEX. That's not really
true though is it? It looks to me like IndexBuildHeapScan will suffer
an Assert failure in an assert-enabled build, or build a bogus index
if not assert-enabled, when it comes across a "heap-only" tuple that
has no parent.

First, see suggested text in my first-draft release announcement.

I don't think that text is any better, it's imo even wrong:
"The bug causes rows to vanish from indexes during recovery due to
simultaneous updates of rows on both sides of a foreign key."

Neither a foreign key, nor simultaneous updates, nor both sides are a
prerequisite.

Second, if a user has encountered this kind of data corruption on their
master (due to crash recovery), how exactly *do* they fix it?

Dump/restore is the most obvious candidate. The next best thing I can
think of is a no-op rewriting ALTER TABLE, which IIRC doesn't deal with
ctid chains, in contrast to CLUSTER/VACUUM FULL.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Tom Lane (#1)

Re: First-draft release notes for next week's releases

On 2014-03-15 16:02:19 -0400, Tom Lane wrote:

First-draft release notes are committed, and should be visible at
http://www.postgresql.org/docs/devel/static/release-9-3-4.html
once guaibasaurus does its next buildfarm run a few minutes from
now. Any suggestions?

So, the current text is:
"This error allowed multiple versions of the same row to become visible
to queries, resulting in apparent duplicates. Since the error is in WAL
replay, it would only manifest during crash recovery or on standby
servers."

what about:

The most prominent consequence of this bug is that rows can appear to
not exist when accessed via an index, while still being visible in
sequential scans. This in turn can lead to constraints, including unique
and foreign key ones, being violated later on.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Andres Freund (#7)

Re: First-draft release notes for next week's releases

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-17 10:03:52 -0700, Josh Berkus wrote:

First, see suggested text in my first-draft release announcement.

I don't think that text is any better, it's imo even wrong:
"The bug causes rows to vanish from indexes during recovery due to
simultaneous updates of rows on both sides of a foreign key."

Neither a foreign key, nor simultaneous updates, nor both sides are a
prerequisite.

What I've got at the moment is

This error caused updated rows to disappear from indexes, resulting
in inconsistent query results depending on whether an index scan was
used. Subsequent processing could result in unique-key violations,
since the previously updated row would not be found by later index
searches. Since this error is in WAL replay, it would only manifest
during crash recovery or on standby servers. The improperly-replayed
case can arise when a table row that is referenced by a foreign-key
constraint is updated concurrently with creation of a
referencing row.

OK, or not? The time window for bikeshedding this is dwindling rapidly.

regards, tom lane

#10

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Tom Lane (#9)

Re: First-draft release notes for next week's releases

On 2014-03-17 13:42:59 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-17 10:03:52 -0700, Josh Berkus wrote:

First, see suggested text in my first-draft release announcement.

I don't think that text is any better, it's imo even wrong:
"The bug causes rows to vanish from indexes during recovery due to
simultaneous updates of rows on both sides of a foreign key."

Neither a foreign key, nor simultaneous updates, nor both sides are a
prerequisite.

What I've got at the moment is

This error caused updated rows to disappear from indexes, resulting
in inconsistent query results depending on whether an index scan was
used. Subsequent processing could result in unique-key violations,
since the previously updated row would not be found by later index
searches. Since this error is in WAL replay, it would only manifest
during crash recovery or on standby servers. The improperly-replayed
case can arise when a table row that is referenced by a foreign-key
constraint is updated concurrently with creation of a
referencing row.

OK, or not? The time window for bikeshedding this is dwindling rapidly.

That's much better, yes. Two things:

* I'd change the warning about unique key violations into a more general
one about constraints. Foreign key and exclusion constraints are also
affected...
* I wonder if we should make the possible origins a bit more
general as it's perfectly possible to trigger the problem without
foreign keys. Maybe: "can arise when a table row that has been updated
is row locked; that can e.g. happen when foreign keys are used."

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#11

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Tom Lane (#5)

Re: First-draft release notes for next week's releases

On 2014-03-17 11:28:45 -0400, Tom Lane wrote:

Hm ... "rows disappearing from indexes" might make people think that
they could fix or mitigate the damage via REINDEX.

Good point. I guess in some cases it will end up working because
VACUUM/hot pruning have cleaned up the mess, but that's certainly not
something I'd want to rely upon. They very well could have messed up
things when presented with bogus input data.

That's not really
true though is it? It looks to me like IndexBuildHeapScan will suffer
an Assert failure in an assert-enabled build, or build a bogus index
if not assert-enabled, when it comes across a "heap-only" tuple that
has no parent.

I'm thinking we'd better promote that Assert to a normal runtime elog.

I wonder if we also should make rewriteheap.c warn about such
things. Although I don't immediately see a trivial way to do so.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Andres Freund (#10)

Re: First-draft release notes for next week's releases

Andres Freund <andres@2ndquadrant.com> writes:

That's much better, yes. Two things:

* I'd change the warning about unique key violations into a more general
one about constraints. Foreign key and exclusion constraints are also
affected...

I'll see what I can do.

* I wonder if we should make the possible origins a bit more
general as it's perfectly possible to trigger the problem without
foreign keys. Maybe: "can arise when a table row that has been updated
is row locked; that can e.g. happen when foreign keys are used."

IIUC, this case only occurs when using the new-in-9.3 types of
nonexclusive row locks. I'm willing to bet that the number of
applications using those is negligible; so I think it's all right to not
mention that case explicitly, as long as the wording doesn't say that
foreign keys are the *only* cause (which I didn't).

regards, tom lane

#13

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Tom Lane (#12)

Re: First-draft release notes for next week's releases

On 2014-03-17 14:01:03 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

* I wonder if we should make the possible origins a bit more
general as it's perfectly possible to trigger the problem without
foreign keys. Maybe: "can arise when a table row that has been updated
is row locked; that can e.g. happen when foreign keys are used."

IIUC, this case only occurs when using the new-in-9.3 types of
nonexclusive row locks. I'm willing to bet that the number of
applications using those is negligible; so I think it's all right to not
mention that case explicitly, as long as the wording doesn't say that
foreign keys are the *only* cause (which I didn't).

I actually think the issue could also occur with row locks of other
severities (is that the correct term?). Alvaro probably knows better,
but if I see correctly it's also triggerable if a backend waits for an
updating transaction to finish and follow_updates = true is passed to
heap_lock_tuple(). Which e.g. nodeLockRows.c does...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Andres Freund (#13)

Re: First-draft release notes for next week's releases

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-17 14:01:03 -0400, Tom Lane wrote:

IIUC, this case only occurs when using the new-in-9.3 types of
nonexclusive row locks. I'm willing to bet that the number of
applications using those is negligible; so I think it's all right to not
mention that case explicitly, as long as the wording doesn't say that
foreign keys are the *only* cause (which I didn't).

I actually think the issue could also occur with row locks of other
severities (is that the correct term?).

The commit log entry says

We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
WAL replay of a tuple-lock operation, which is incorrect when the tuple
is already updated.

Back-patch to 9.3. The clearing of both header elements was there
previously, but since no update could be present on a tuple that was
being locked, it was harmless.

which I read to mean that the case can't occur with the types of row
locks that were allowed pre-9.3.

but if I see correctly it's also triggerable if a backend waits for an
updating transaction to finish and follow_updates = true is passed to
heap_lock_tuple(). Which e.g. nodeLockRows.c does...

That sounds backwards. nodeLockRows locks the latest tuple in the chain,
so it can't be subject to this.

regards, tom lane

#15

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Tom Lane (#14)

Re: First-draft release notes for next week's releases

On 2014-03-17 14:16:41 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-17 14:01:03 -0400, Tom Lane wrote:

IIUC, this case only occurs when using the new-in-9.3 types of
nonexclusive row locks. I'm willing to bet that the number of
applications using those is negligible; so I think it's all right to not
mention that case explicitly, as long as the wording doesn't say that
foreign keys are the *only* cause (which I didn't).

I actually think the issue could also occur with row locks of other
severities (is that the correct term?).

The commit log entry says

We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
WAL replay of a tuple-lock operation, which is incorrect when the tuple
is already updated.

Back-patch to 9.3. The clearing of both header elements was there
previously, but since no update could be present on a tuple that was
being locked, it was harmless.

which I read to mean that the case can't occur with the types of row
locks that were allowed pre-9.3.

That's not an unreasonable interpretation of the commit message, but I
don't think it's correct with respect to the code :(

but if I see correctly it's also triggerable if a backend waits for an
updating transaction to finish and follow_updates = true is passed to
heap_lock_tuple(). Which e.g. nodeLockRows.c does...

That sounds backwards. nodeLockRows locks the latest tuple in the chain,
so it can't be subject to this.

Hm, I don't see anything in the code preventing it, that's the
lock_tuple() before the EPQ stuff... in ExecLockRows():

    foreach(lc, node->lr_arowMarks)
    {
        test = heap_lock_tuple(erm->relation, &tuple,
                               estate->es_output_cid,
                               lockmode, erm->noWait, true,
                               &buffer, &hufd);
        ReleaseBuffer(buffer);
        switch (test)
        {
            case HeapTupleSelfUpdated:
    ...

the true passed to heap_lock_tuple() is the follow_updates
parameter. And then in heap_lock_tuple():

    if (require_sleep)
    {
        if (infomask & HEAP_XMAX_IS_MULTI)
        {
    ...
            /* if there are updates, follow the update chain */
            if (follow_updates &&
                !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
            {
                HTSU_Result res;

                res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
                                              GetCurrentTransactionId(),
    ...
        else
        {
            /* wait for regular transaction to end */
            if (nowait)
            {
                if (!ConditionalXactLockTableWait(xwait))
                    ereport(ERROR,
                            (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                             errmsg("could not obtain lock on row in relation \"%s\"",
                                    RelationGetRelationName(relation))));
            }
            else
                XactLockTableWait(xwait);

            /* if there are updates, follow the update chain */
            if (follow_updates &&
                !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
    ...
    if (RelationNeedsWAL(relation))
    {
        xl_heap_lock xlrec;
        XLogRecPtr recptr;
        XLogRecData rdata[2];

        xlrec.target.node = relation->rd_node;
        xlrec.target.tid = tuple->t_self;
    ...

To me that looks sufficient to trigger the bug, because we're issuing a
wal record about the row that was passed to heap_lock_update(), not the
latest one in the ctid chain. When replaying that record, it will reset
the t_ctid field, thus breaking the chain.
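
The effect described above can be sketched with a toy model in C. Everything here is hypothetical and heavily simplified: ToyTuple is not PostgreSQL's tuple header, positions are plain array indexes rather than item pointers, and the "fixed" replay only illustrates the idea in the commit message (do not clear the link on a tuple that is already updated), not the actual patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a heap tuple header. */
typedef struct ToyTuple
{
    int  self;         /* this tuple's own position in the array */
    int  ctid;         /* position of the next version; equals self if none */
    bool hot_updated;  /* stands in for HEAP_HOT_UPDATED */
    bool locked;       /* stands in for the row-lock bits */
} ToyTuple;

/* Build a chain 0 -> 1 -> ... -> n-1, where n-1 is the latest version. */
void toy_init_chain(ToyTuple *heap, int n)
{
    for (int i = 0; i < n; i++)
    {
        heap[i].self = i;
        heap[i].ctid = (i + 1 < n) ? i + 1 : i;
        heap[i].hot_updated = (i + 1 < n);
        heap[i].locked = false;
    }
}

/* Replay of a tuple-lock record as the thread describes the bug:
 * the update link is cleared unconditionally. */
void toy_replay_lock_buggy(ToyTuple *tup)
{
    tup->locked = true;
    tup->hot_updated = false;
    tup->ctid = tup->self;
}

/* Replay that leaves the link alone when the tuple is already updated. */
void toy_replay_lock_fixed(ToyTuple *tup)
{
    bool already_updated = (tup->ctid != tup->self);

    tup->locked = true;
    if (!already_updated)
    {
        tup->hot_updated = false;
        tup->ctid = tup->self;
    }
}

/* Follow the chain from `root`, the way a lookup that starts at the
 * chain's first member would, and return the last position reached. */
int toy_follow_chain(const ToyTuple *heap, int root)
{
    int pos = root;

    while (heap[pos].hot_updated && heap[pos].ctid != pos)
        pos = heap[pos].ctid;
    return pos;
}
```

In this model, a three-version chain walked from position 0 reaches position 2 before and after the fixed replay, but stops at position 0 after the buggy replay: the later versions are still in the array, so a scan over every slot would see them, while a lookup that enters through the chain head no longer can.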

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#16

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Andres Freund (#15)

Re: First-draft release notes for next week's releases

Andres Freund <andres@2ndquadrant.com> writes:

To me that looks sufficient to trigger the bug, because we're issuing a
wal record about the row that was passed to heap_lock_update(), not the
latest one in the ctid chain. When replaying that record, it will reset
the t_ctid field, thus breaking the chain.

[ scratches head ... ] If that's what's happening, isn't it a bug in
itself? Surely the WAL record ought to point at the tuple that was
locked.

regards, tom lane

#17

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Tom Lane (#16)

Re: First-draft release notes for next week's releases

On 2014-03-17 14:29:56 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

To me that looks sufficient to trigger the bug, because we're issuing a
wal record about the row that was passed to heap_lock_update(), not the
latest one in the ctid chain. When replaying that record, it will reset
the t_ctid field, thus breaking the chain.

[ scratches head ... ] If that's what's happening, isn't it a bug in
itself? Surely the WAL record ought to point at the tuple that was
locked.

There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple
version, emitted by heap_lock_updated_tuple_rec(). This really is mind
bendingly complex :(.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Andres Freund (#17)

Re: First-draft release notes for next week's releases

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-17 14:29:56 -0400, Tom Lane wrote:

[ scratches head ... ] If that's what's happening, isn't it a bug in
itself? Surely the WAL record ought to point at the tuple that was
locked.

There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple
version, emitted by heap_lock_updated_tuple_rec(). This really is mind
bendingly complex :(.

Ah, I see; so only the original tuple in the chain is at risk?

How about this:

This error caused updated rows to not be found by index scans, resulting
in inconsistent query results depending on whether an index scan was
used. Subsequent processing could result in constraint violations,
since the previously updated row would not be found by later index
searches, thus possibly allowing conflicting rows to be inserted.
Since this error is in WAL replay, it would only manifest during crash
recovery or on standby servers. The improperly-replayed case most
commonly arises when a table row that is referenced by a foreign-key
constraint is updated concurrently with creation of a referencing row;
but it can also occur when any variant of <command>SELECT FOR UPDATE</>
is applied to a row that is being concurrently updated.

regards, tom lane

#19

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Tom Lane (#18)

Re: First-draft release notes for next week's releases

On 2014-03-17 14:52:25 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-17 14:29:56 -0400, Tom Lane wrote:

[ scratches head ... ] If that's what's happening, isn't it a bug in
itself? Surely the WAL record ought to point at the tuple that was
locked.

There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple
version, emitted by heap_lock_updated_tuple_rec(). This really is mind
bendingly complex :(.

Ah, I see; so only the original tuple in the chain is at risk?

Depending on what you define the "original tuple in the chain" to
be. No, if you happen to mean the root tuple of a ctid chain or similar;
which I guess you didn't. Yes, if you mean the tuple passed to
heap_lock_tuple(). heap_xlog_lock_updated() looks (and has looked)
correct.

How about this:

Sounds good to me.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20

Alvaro Herrera

alvherre@2ndquadrant.com

about 12 years ago

In reply to: Andres Freund (#13)

Re: First-draft release notes for next week's releases

Andres Freund wrote:

On 2014-03-17 14:01:03 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

* I wonder if we should make the possible origins a bit more
general as it's perfectly possible to trigger the problem without
foreign keys. Maybe: "can arise when a table row that has been updated
is row locked; that can e.g. happen when foreign keys are used."

IIUC, this case only occurs when using the new-in-9.3 types of
nonexclusive row locks. I'm willing to bet that the number of
applications using those is negligible; so I think it's all right to not
mention that case explicitly, as long as the wording doesn't say that
foreign keys are the *only* cause (which I didn't).

I actually think the issue could also occur with row locks of other
severities (is that the correct term?). Alvaro probably knows better,
but if I see correctly it's also triggerable if a backend waits for an
updating transaction to finish and follow_updates = true is passed to
heap_lock_tuple(). Which e.g. nodeLockRows.c does...

Uhm. But at the bottom of that block, right above the "failed:" label
(heapam.c line 4527 in current master), we recheck the tuple for
"locked-only-ness"; and fail the whole operation by returning
HeapTupleUpdated, if it's not locked-only, no? Which would cause
ExecLockRows to grab the next version via EvalPlanQualFetch.
Essentially that check is a lock-conflict test, and the only thing that
does not conflict with an update is a FOR KEY SHARE lock.

Note the only way to pass that test is that either the tuple is
locked-only (spelled in three different ways), or "!require_sleep".

Am I completely misunderstanding what's being said here?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#21

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Alvaro Herrera (#20)

#22

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Alvaro Herrera (#20)

#23

Bruce Momjian