pgsql: Fix a couple of bugs in MultiXactId freezing
Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
into a multixact to check the members against cutoff_xid. This means
that a very old Xid could survive hidden within a multi, possibly
outliving its CLOG storage. In the distant future, this would cause
clog lookup failures:
ERROR: could not access status of transaction 3883960912
DETAIL: Could not open file "pg_clog/0E78": No such file or directory.
This was mostly problematic when the updating transaction aborted, since
in that case the row wouldn't get pruned away earlier in vacuum and the
multixact could possibly survive for a long time. In many cases, data
that is inaccessible for this reason can be brought back
heuristically.
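The missing check amounts to scanning every member xid hidden inside a multixact against cutoff_xid. The following is a minimal, self-contained sketch of that idea, not PostgreSQL's actual code; the function and variable names are illustrative:

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* Circular modulo-2^32 comparison in the style of PostgreSQL's
 * TransactionIdPrecedes(): true iff id1 logically precedes id2.
 * (The real function also special-cases permanent xids.) */
static int xid_precedes(TransactionId id1, TransactionId id2)
{
    return (int32_t) (id1 - id2) < 0;
}

/* The check the buggy code skipped: each member xid inside a multixact
 * must be compared against cutoff_xid, or a very old xid can survive
 * past the truncation point of its CLOG storage. */
static int multi_has_member_below_cutoff(const TransactionId *members,
                                         int nmembers,
                                         TransactionId cutoff_xid)
{
    for (int i = 0; i < nmembers; i++)
        if (xid_precedes(members[i], cutoff_xid))
            return 1;
    return 0;
}
```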
As a second bug, heap_freeze_tuple() didn't properly handle multixacts
that need to be frozen according to cutoff_multi, but whose updater xid
is still alive. Instead of preserving the update Xid, it just set Xmax
invalid, which leads to both old and new tuple versions becoming
visible. This is pretty rare in practice, but a real threat
nonetheless. Existing corrupted rows, unfortunately, cannot be repaired
in an automated fashion.
Existing physical replicas might have already incorrectly frozen tuples
because of different behavior than in master, which might only become
apparent in the future once pg_multixact/ is truncated; it is
recommended that all clones be rebuilt after upgrading.
Following code analysis prompted by a bug report from J Smith in message
CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com
and privately by F-Secure.
Backpatch to 9.3, where freezing of MultiXactIds was introduced.
Analysis and patch by Andres Freund, with some tweaks by Álvaro.
Branch
------
REL9_3_STABLE
Details
-------
http://git.postgresql.org/pg/commitdiff/8e53ae025de90b8f7d935ce0eb4d0551178a4caf
Modified Files
--------------
src/backend/access/heap/heapam.c | 160 ++++++++++++++++++++++++++++----
src/backend/access/transam/multixact.c | 14 ++-
2 files changed, 151 insertions(+), 23 deletions(-)
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
Fix a couple of bugs in MultiXactId freezing
Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
into a multixact to check the members against cutoff_xid.
! /*
! * This is a multixact which is not marked LOCK_ONLY, but which
! * is newer than the cutoff_multi. If the update_xid is below the
! * cutoff_xid point, then we can just freeze the Xmax in the
! * tuple, removing it altogether. This seems simple, but there
! * are several underlying assumptions:
! *
! * 1. A tuple marked by an multixact containing a very old
! * committed update Xid would have been pruned away by vacuum; we
! * wouldn't be freezing this tuple at all.
! *
! * 2. There cannot possibly be any live locking members remaining
! * in the multixact. This is because if they were alive, the
! * update's Xid would had been considered, via the lockers'
! * snapshot's Xmin, as part the cutoff_xid.
READ COMMITTED transactions can reset MyPgXact->xmin between commands,
defeating that assumption; see SnapshotResetXmin(). I have attached an
isolationtester spec demonstrating the problem. The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.
! *
! * 3. We don't create new MultiXacts via MultiXactIdExpand() that
! * include a very old aborted update Xid: in that function we only
! * include update Xids corresponding to transactions that are
! * committed or in-progress.
! */
! update_xid = HeapTupleGetUpdateXid(tuple);
! if (TransactionIdPrecedes(update_xid, cutoff_xid))
! freeze_xmax = true;
That was the only concrete runtime problem I found during a study of the
newest heap_freeze_tuple() and heap_tuple_needs_freeze() code. One thing that
leaves me unsure is the fact that vacuum_set_xid_limits() does no locking to
ensure a consistent result between GetOldestXmin() and GetOldestMultiXactId().
Transactions may start or end between those calls, making the
GetOldestMultiXactId() result represent a later set of transactions than the
GetOldestXmin() result. I suspect that's fine. New transactions have no
immediate effect on either cutoff, and transaction end can only increase a
cutoff. Using a slightly-lower cutoff than the maximum safe cutoff is always
okay; consider vacuum_defer_cleanup_age.
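Noah's monotonicity argument can be made concrete: the freeze test is "xid precedes cutoff", so lowering the cutoff only shrinks the frozen set. A hypothetical sketch (illustrative names, not PostgreSQL code):

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* Circular xid comparison, as in TransactionIdPrecedes(). */
static int xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* A tuple xid is frozen iff it precedes the cutoff.  If cutoff_low
 * precedes cutoff_high, anything frozen under cutoff_low is also
 * frozen under cutoff_high: a slightly-stale (lower) cutoff therefore
 * errs toward freezing less, which is always safe. */
static int would_freeze(TransactionId xid, TransactionId cutoff)
{
    return xid_precedes(xid, cutoff);
}
```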
Thanks,
nm
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
Attachment: fk-old-updater-new-multi.spec (text/plain)
On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
Fix a couple of bugs in MultiXactId freezing
Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
into a multixact to check the members against cutoff_xid.
! /*
! * This is a multixact which is not marked LOCK_ONLY, but which
! * is newer than the cutoff_multi. If the update_xid is below the
! * cutoff_xid point, then we can just freeze the Xmax in the
! * tuple, removing it altogether. This seems simple, but there
! * are several underlying assumptions:
! *
! * 1. A tuple marked by an multixact containing a very old
! * committed update Xid would have been pruned away by vacuum; we
! * wouldn't be freezing this tuple at all.
! *
! * 2. There cannot possibly be any live locking members remaining
! * in the multixact. This is because if they were alive, the
! * update's Xid would had been considered, via the lockers'
! * snapshot's Xmin, as part the cutoff_xid.
READ COMMITTED transactions can reset MyPgXact->xmin between commands,
defeating that assumption; see SnapshotResetXmin(). I have attached an
isolationtester spec demonstrating the problem.
Any idea how to cheat our way out of that one given the current way
heap_freeze_tuple() works (running on both primary and standby)? My only
idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty.
We can't even realistically create a new multixact with fewer members
with the current format of xl_heap_freeze.
The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.
Too bad it's too late to do anything about it for 9.3.2. :(. At least the
last seems actually unrelated; I am not sure why it's 9.3.2
only. Alvaro, are you looking?
That was the only concrete runtime problem I found during a study of the
newest heap_freeze_tuple() and heap_tuple_needs_freeze() code.
I'd even be interested in fuzzy problems ;). If 9.3 hadn't been
released, the interactions between cutoff_xid/multi would have caused me
to say "back to the drawing board"... I'm not surprised if further things
are lurking there.
One thing that
leaves me unsure is the fact that vacuum_set_xid_limits() does no locking to
ensure a consistent result between GetOldestXmin() and GetOldestMultiXactId().
Transactions may start or end between those calls, making the
GetOldestMultiXactId() result represent a later set of transactions than the
GetOldestXmin() result. I suspect that's fine. New transactions have no
immediate effect on either cutoff, and transaction end can only increase a
cutoff. Using a slightly-lower cutoff than the maximum safe cutoff is always
okay; consider vacuum_defer_cleanup_age.
Yes, that seems fine to me, with the same reasoning.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote:
On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
Fix a couple of bugs in MultiXactId freezing
Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
into a multixact to check the members against cutoff_xid.
! /*
! * This is a multixact which is not marked LOCK_ONLY, but which
! * is newer than the cutoff_multi. If the update_xid is below the
! * cutoff_xid point, then we can just freeze the Xmax in the
! * tuple, removing it altogether. This seems simple, but there
! * are several underlying assumptions:
! *
! * 1. A tuple marked by an multixact containing a very old
! * committed update Xid would have been pruned away by vacuum; we
! * wouldn't be freezing this tuple at all.
! *
! * 2. There cannot possibly be any live locking members remaining
! * in the multixact. This is because if they were alive, the
! * update's Xid would had been considered, via the lockers'
! * snapshot's Xmin, as part the cutoff_xid.
READ COMMITTED transactions can reset MyPgXact->xmin between commands,
defeating that assumption; see SnapshotResetXmin(). I have attached an
isolationtester spec demonstrating the problem.
Any idea how to cheat our way out of that one given the current way
heap_freeze_tuple() works (running on both primary and standby)? My only
idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty.
We can't even realistically create a new multixact with fewer members
with the current format of xl_heap_freeze.
Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update XID
consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are
already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit yet
another injection of complexity.
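The consumer-side discipline Noah describes could look like the following sketch; the bit values are invented placeholders, not the real htup_details.h constants:

```c
#include <stdint.h>

/* Hypothetical infomask bits (placeholders, not PostgreSQL's values). */
#define XMAX_INVALID    0x1u
#define XMAX_LOCK_ONLY  0x2u

/* A consumer may interpret xmax as an updater only after ruling out
 * "invalid" first and then "lock only", mirroring the proposed ordering
 * of the HEAP_XMAX_INVALID and HEAP_XMAX_IS_LOCKED_ONLY() checks. */
static int xmax_is_updater(uint32_t infomask)
{
    if (infomask & XMAX_INVALID)
        return 0;               /* no live xmax at all */
    if (infomask & XMAX_LOCK_ONLY)
        return 0;               /* a frozen-away update would look like this */
    return 1;
}
```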
The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.
Too bad it's too late to do anything about it for 9.3.2. :(. At least the
last seems actually unrelated; I am not sure why it's 9.3.2
only. Alvaro, are you looking?
(For clarity, the other problem demonstrated by the test spec is also a 9.3.2
regression.)
That was the only concrete runtime problem I found during a study of the
newest heap_freeze_tuple() and heap_tuple_needs_freeze() code.
I'd even be interested in fuzzy problems ;). If 9.3 hadn't been
released, the interactions between cutoff_xid/multi would have caused me
to say "back to the drawing board"... I'm not surprised if further things
are lurking there.
heap_freeze_tuple() of 9.2 had an XXX comment about the possibility of getting
spurious lock contention due to wraparound of the multixact space. The
comment is gone, and that mechanism no longer poses a threat. However, a
non-wrapped multixact containing wrapped locker XIDs (we don't freeze locker
XIDs, just updater XIDs) may cause similar spurious contention.
+ /*
+  * The multixact has an update hidden within. Get rid of it.
+  *
+  * If the update_xid is below the cutoff_xid, it necessarily
+  * must be an aborted transaction. In a primary server, such
+  * an Xmax would have gotten marked invalid by
+  * HeapTupleSatisfiesVacuum, but in a replica that is not
+  * called before we are, so deal with it in the same way.
+  *
+  * If not below the cutoff_xid, then the tuple would have been
+  * pruned by vacuum, if the update committed long enough ago,
+  * and we wouldn't be freezing it; so it's either recently
+  * committed, or in-progress. Deal with this by setting the
+  * Xmax to the update Xid directly and remove the IS_MULTI
+  * bit. (We know there cannot be running lockers in this
+  * multi, because it's below the cutoff_multi value.)
+  */
+
+ if (TransactionIdPrecedes(update_xid, cutoff_xid))
+ {
+     Assert(InRecovery || TransactionIdDidAbort(update_xid));
+     freeze_xmax = true;
+ }
+ else
+ {
+     Assert(InRecovery || !TransactionIdIsInProgress(update_xid));
This assertion is at odds with the comment, but the assertion is okay for now.
If the updater is still in progress, its OldestMemberMXactId[] entry will have
held back cutoff_multi, and we won't be here. Therefore, if we get here, the
tuple will always be HEAPTUPLE_RECENTLY_DEAD (recently-committed updater) or
HEAPTUPLE_LIVE (aborted updater, recent or not).
Numerous comments in the vicinity (e.g. ones at MultiXactStateData) reflect a
pre-9.3 world. Most or all of that isn't new with the patch at hand, but it
does complicate study.
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.
Too bad it's too late to do anthing about it for 9.3.2. :(. At least the
last seems actually unrelated, I am not sure why it's 9.3.2
only. Alvaro, are you looking?
Is this bad enough that we need to re-wrap the release?
regards, tom lane
On 2013-12-03 09:48:23 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.
Too bad it's too late to do anything about it for 9.3.2. :(. At least the
last seems actually unrelated; I am not sure why it's 9.3.2
only. Alvaro, are you looking?
Is this bad enough that we need to re-wrap the release?
Tentatively I'd say no, the only risk is losing locks AFAICS. That's
much better than corrupting rows as in 9.3.1. But I'll look into it in
a bit more detail as soon as I am off the phone call I am on.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-03 09:16:18 -0500, Noah Misch wrote:
On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote:
On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
Any idea how to cheat our way out of that one given the current way
heap_freeze_tuple() works (running on both primary and standby)? My only
idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
We can't even realistically create a new multixact with fewer members
with the current format of xl_heap_freeze.
Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update XID
consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are
already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit yet
another injection of complexity.
I think it's pretty much checked that way already, but the problem seems
to be how to avoid checks on xid commit/abort in that case. I've
complained in 20131121200517.GM7240@alap2.anarazel.de that the old
pre-condition that multixacts aren't checked when they can't be relevant
(via OldestVisibleM*) isn't observed anymore.
So, if we re-introduce that condition again, we should be on the safe
side with that, right?
The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.
Too bad it's too late to do anything about it for 9.3.2. :(. At least the
last seems actually unrelated; I am not sure why it's 9.3.2
only. Alvaro, are you looking?
(For clarity, the other problem demonstrated by the test spec is also a 9.3.2
regression.)
Yea, I just don't see why yet... Looking now.
heap_freeze_tuple() of 9.2 had an XXX comment about the possibility of getting
spurious lock contention due to wraparound of the multixact space. The
comment is gone, and that mechanism no longer poses a threat. However, a
non-wrapped multixact containing wrapped locker XIDs (we don't freeze locker
XIDs, just updater XIDs) may cause similar spurious contention.
Yea, I noticed that that comment was missing as well. I think what we
should do now is to rework freezing in HEAD to make all this more
reasonable.
Numerous comments in the vicinity (e.g. ones at MultiXactStateData) reflect a
pre-9.3 world. Most or all of that isn't new with the patch at hand, but it
does complicate study.
Yea, Alvaro sent a patch for that somewhere, it seems a patch in the
series got lost when foreign key locks were originally applied.
I think we seriously need to put a good amount of work into the
multixact.c stuff in the next months. Otherwise it will be a maintenance
nightmare for a fair bit more time.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-03 09:16:18 -0500, Noah Misch wrote:
The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.
Too bad it's too late to do anything about it for 9.3.2. :(. At least the
last seems actually unrelated; I am not sure why it's 9.3.2
only. Alvaro, are you looking?
(For clarity, the other problem demonstrated by the test spec is also a 9.3.2
regression.)
The backtrace for the Assert() you found is:
#4 0x00000000004f1da5 in CreateMultiXactId (nmembers=2, members=0x1ce17d8)
at /home/andres/src/postgresql/src/backend/access/transam/multixact.c:708
#5 0x00000000004f1aeb in MultiXactIdExpand (multi=6241831, xid=6019366, status=MultiXactStatusUpdate)
at /home/andres/src/postgresql/src/backend/access/transam/multixact.c:462
#6 0x00000000004a5d8e in compute_new_xmax_infomask (xmax=6241831, old_infomask=4416, old_infomask2=16386, add_to_xmax=6019366,
mode=LockTupleExclusive, is_update=1 '\001', result_xmax=0x7fffca02a700, result_infomask=0x7fffca02a6fe,
result_infomask2=0x7fffca02a6fc) at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:4651
#7 0x00000000004a2d27 in heap_update (relation=0x7f9fc45cc828, otid=0x7fffca02a8d0, newtup=0x1ce1740, cid=0, crosscheck=0x0,
wait=1 '\001', hufd=0x7fffca02a850, lockmode=0x7fffca02a82c) at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:3304
#8 0x0000000000646f04 in ExecUpdate (tupleid=0x7fffca02a8d0, oldtuple=0x0, slot=0x1ce12c0, planSlot=0x1ce0740, epqstate=0x1ce0120,
estate=0x1cdfe98, canSetTag=1 '\001') at /home/andres/src/postgresql/src/backend/executor/nodeModifyTable.c:690
So imo it isn't really a new problem, it existed all along :/. We only
didn't hit it in your test case before because in some cases we spuriously
thought that a tuple was in-progress if *any* member of the old multi was
still running, instead of just the updater. But I am pretty sure
it can also be reproduced in 9.3.1.
Imo the MultiXactIdSetOldestMember() call in heap_update() needs to be
moved outside of the if (satisfies_key). Everything else is vastly more
complex.
Alvaro, correct?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Dec 03, 2013 at 04:08:23PM +0100, Andres Freund wrote:
On 2013-12-03 09:16:18 -0500, Noah Misch wrote:
On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote:
On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
Any idea how to cheat our way out of that one given the current way
heap_freeze_tuple() works (running on both primary and standby)? My only
idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
We can't even realistically create a new multixact with fewer members
with the current format of xl_heap_freeze.
Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update XID
consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are
already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit yet
another injection of complexity.
I think it's pretty much checked that way already, but the problem seems
to be how to avoid checks on xid commit/abort in that case. I've
complained in 20131121200517.GM7240@alap2.anarazel.de that the old
pre-condition that multixacts aren't checked when they can't be relevant
(via OldestVisibleM*) isn't observed anymore.
So, if we re-introduce that condition again, we should be on the safe
side with that, right?
What specific commit/abort checks do you have in mind?
The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.
Too bad it's too late to do anything about it for 9.3.2. :(. At least the
last seems actually unrelated; I am not sure why it's 9.3.2
only. Alvaro, are you looking?
(For clarity, the other problem demonstrated by the test spec is also a 9.3.2
regression.)
Yea, I just don't see why yet... Looking now.
Sorry, my original report was rather terse. I speak of the scenario exercised
by the second permutation in that isolationtester spec. The multixact is
later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all. 9.3.2
does freeze it to InvalidTransactionId per the code I cited in my first
response on this thread, which wrongly removes a key lock.
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
Sorry, my original report was rather terse. I speak of the scenario exercised
by the second permutation in that isolationtester spec. The multixact is
later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all. 9.3.2
does freeze it to InvalidTransactionId per the code I cited in my first
response on this thread, which wrongly removes a key lock.
That one is clear, I was only confused about the Assert() you
reported. But I think I've explained that elsewhere.
I currently don't see how to fix the erroneous freezing of lockers (not the
updater though) without changing the WAL format or synchronously waiting
for all lockers to end. Both of which seem like a no-go?
While it's still a major bug, it seems much better than the
previous case of either inaccessible or reappearing rows.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
On Tue, Dec 03, 2013 at 04:08:23PM +0100, Andres Freund wrote:
On 2013-12-03 09:16:18 -0500, Noah Misch wrote:
On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote:
On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
Any idea how to cheat our way out of that one given the current way
heap_freeze_tuple() works (running on both primary and standby)? My only
idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
We can't even realistically create a new multixact with fewer members
with the current format of xl_heap_freeze.
Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update XID
consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are
already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit yet
another injection of complexity.
I think it's pretty much checked that way already, but the problem seems
to be how to avoid checks on xid commit/abort in that case. I've
complained in 20131121200517.GM7240@alap2.anarazel.de that the old
pre-condition that multixacts aren't checked when they can't be relevant
(via OldestVisibleM*) isn't observed anymore.
So, if we re-introduce that condition again, we should be on the safe
side with that, right?
What specific commit/abort checks do you have in mind?
MultiXactIdIsRunning() does a TransactionIdIsInProgress() for each
member which in turn does TransactionIdDidCommit(). Similar when locking
a tuple that's locked/updated without a multixact where we go for a
TransactionIdIsInProgress() in XactLockTableWait().
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi Alvaro, Noah,
On 2013-12-03 15:57:10 +0100, Andres Freund wrote:
On 2013-12-03 09:48:23 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.
Too bad it's too late to do anything about it for 9.3.2. :(. At least the
last seems actually unrelated; I am not sure why it's 9.3.2
only. Alvaro, are you looking?
Is this bad enough that we need to re-wrap the release?
Tentatively I'd say no, the only risk is losing locks AFAICS. That's
much better than corrupting rows as in 9.3.1. But I'll look into it in
a bit more detail as soon as I am off the phone call I am on.
After looking, I think I am revising my opinion. The broken locking
behaviour (outside of freeze, which I don't see how we can fix in time),
is actually bad.
Would that stop us from making the release date with packages?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-12-03 09:48:23 -0500, Tom Lane wrote:
Is this bad enough that we need to re-wrap the release?
After looking, I think I am revising my opinion. The broken locking
behaviour (outside of freeze, which I don't see how we can fix in time),
is actually bad.
Would that stop us from making the release date with packages?
That's hardly answerable when you haven't specified how long you
think it'd take to fix.
In general, though, I'm going to be exceedingly unhappy if this release
introduces new regressions. If we have to put off the release to fix
something, maybe we'd better do so. And we'd damn well better get it
right this time.
regards, tom lane
On 2013-12-03 12:22:33 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-12-03 09:48:23 -0500, Tom Lane wrote:
Is this bad enough that we need to re-wrap the release?
After looking, I think I am revising my opinion. The broken locking
behaviour (outside of freeze, which I don't see how we can fix in time),
is actually bad.
Would that stop us from making the release date with packages?
That's hardly answerable when you haven't specified how long you
think it'd take to fix.
There's two things that are broken as-is:
1) the freezing of multixacts: The new state is much better than the old
one, because the old one corrupted data, while the new one sometimes removes
locks when you explicitly FREEZE.
2) The broken locking behaviour in Noah's test without the
FREEZE.
I don't see a realistic chance of fixing 1) in 9.3. Not even sure if it
can be done without changing the freeze WAL format. But 2) should be fixed
and is basically a one-liner + comments + test. Alvaro?
In general, though, I'm going to be exceedingly unhappy if this release
introduces new regressions. If we have to put off the release to fix
something, maybe we'd better do so. And we'd damn well better get it
right this time.
I think that's really hard for the multixacts stuff. There's lots of
stuff not really okay in there, but we can't do much about that now :(
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
Any idea how to cheat our way out of that one given the current way
heap_freeze_tuple() works (running on both primary and standby)? My only
idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
We can't even realistically create a new multixact with fewer members
with the current format of xl_heap_freeze.
Maybe we should just bite the bullet and change the WAL format for
heap_freeze (inventing an all-new record type, not repurposing the old
one, and allowing WAL replay to continue to accept the old one). The
implication for users would be that they'd have to update slave servers
before the master when installing the update; which is unpleasant, but
better than living with a known data corruption case.
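Tom's migration scheme, where redo accepts both record types while only the new one is ever generated, can be sketched like this (record ids and names are invented for illustration, not PostgreSQL's actual values):

```c
/* Invented record ids for illustration; not PostgreSQL's actual values. */
enum
{
    XLOG_HEAP_FREEZE_OLD = 1,   /* legacy format: replayable only */
    XLOG_HEAP_FREEZE_NEW = 2    /* new format: generated and replayed */
};

/* Redo keeps understanding the old record so WAL written by an
 * un-upgraded primary can still be replayed by an upgraded standby. */
static int freeze_redo_handles(int record_id)
{
    return record_id == XLOG_HEAP_FREEZE_OLD ||
           record_id == XLOG_HEAP_FREEZE_NEW;
}

/* WAL generation, by contrast, only ever emits the new record type. */
static int freeze_record_to_write(void)
{
    return XLOG_HEAP_FREEZE_NEW;
}
```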
regards, tom lane
On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Any idea how to cheat our way out of that one given the current way
heap_freeze_tuple() works (running on both primary and standby)? My only
idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
We can't even realistically create a new multixact with fewer members
with the current format of xl_heap_freeze.
Maybe we should just bite the bullet and change the WAL format for
heap_freeze (inventing an all-new record type, not repurposing the old
one, and allowing WAL replay to continue to accept the old one). The
implication for users would be that they'd have to update slave servers
before the master when installing the update; which is unpleasant, but
better than living with a known data corruption case.
Agreed. It may suck, but it sucks less.
How badly will it break if they do the upgrade in the wrong order, though?
Will the slaves just stop (I assume this?) or is there a risk of a
wrong-order upgrade causing extra breakage? And if they do shut down, would
just upgrading the slave fix it, or would they then have to rebuild the
slave? (actually, don't we recommend they always rebuild the slave
*anyway*? In which case the problem is even smaller..)
I think we've always told people to upgrade the slave first, and it's the
logical thing that AFAIK most other systems require as well, so that's not
an unreasonable requirement at all.
I assume we'd then get rid of the old record type completely in 9.4, right?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
Sorry, my original report was rather terse. I speak of the scenario exercised
by the second permutation in that isolationtester spec. The multixact is
later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all. 9.3.2
does freeze it to InvalidTransactionId per the code I cited in my first
response on this thread, which wrongly removes a key lock.
That one is clear, I was only confused about the Assert() you
reported. But I think I've explained that elsewhere.
I currently don't see how to fix the erroneous freezing of lockers (not the
updater though) without changing the WAL format or synchronously waiting
for all lockers to end. Both of which seem like a no-go?
Not fixing it at all is the real no-go. We'd take both of those undesirables
before just tolerating the lost locks in 9.3.
The attached patch illustrates the approach I was describing earlier. It
fixes the test case discussed above. I haven't verified that everything else
in the system is ready for it, so this is just for illustration purposes.
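The freeze decision under discussion can be illustrated with a small, self-contained sketch. This is simplified model code, not the actual heap_freeze_tuple()/FreezeMultiXactId() implementation; the member struct, helper name, and the unwrapped xid comparison are all invented here for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Toy model of one multixact member; the real representation differs. */
typedef struct
{
    TransactionId xid;
    bool          is_update;      /* updating member vs. plain locker */
    bool          still_running;  /* member transaction still in progress */
} MultiXactMember;

/*
 * Pick the replacement Xmax when freezing a multixact that is older
 * than cutoff_multi.  The bug was to unconditionally set Xmax invalid;
 * the fix preserves a still-relevant updater's xid, since dropping it
 * makes both the old and new tuple versions visible.  (Wraparound-aware
 * xid comparison is ignored here for simplicity.)
 */
static TransactionId
freeze_multi_xmax(const MultiXactMember *members, int nmembers,
                  TransactionId cutoff_xid)
{
    for (int i = 0; i < nmembers; i++)
    {
        const MultiXactMember *m = &members[i];

        if (m->is_update && (m->still_running || m->xid >= cutoff_xid))
            return m->xid;       /* keep the update xid */
    }
    return InvalidTransactionId; /* no live updater: clearing Xmax is safe */
}
```

Note that this sketch only addresses the updater; the lost key locks Noah describes would additionally require carrying surviving locker members into a new multixact, which is what forces the WAL-format question below.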
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
Attachments:
freeze-multi-lockonly-v1.patch (text/plain, +32/-32)
Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Any idea how to cheat our way out of that one given the current way
heap_freeze_tuple() works (running on both primary and standby)? My only
idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty.
We can't even realistically create a new multixact with fewer members
with the current format of xl_heap_freeze.
Maybe we should just bite the bullet and change the WAL format for
heap_freeze (inventing an all-new record type, not repurposing the old
one, and allowing WAL replay to continue to accept the old one). The
implication for users would be that they'd have to update slave servers
before the master when installing the update; which is unpleasant, but
better than living with a known data corruption case.
That was my suggestion too (modulo, I admit, the bit about it being a
new, separate record type.)
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe we should just bite the bullet and change the WAL format for
heap_freeze (inventing an all-new record type, not repurposing the old
one, and allowing WAL replay to continue to accept the old one). The
implication for users would be that they'd have to update slave servers
before the master when installing the update; which is unpleasant, but
better than living with a known data corruption case.
Agreed. It may suck, but it sucks less.
How badly will it break if they do the upgrade in the wrong order, though?
Will the slaves just stop (I assume this?) or is there a risk of a
wrong-order upgrade causing extra breakage?
I assume what would happen is the slave would PANIC upon seeing a WAL
record code it didn't recognize. Installing the updated version should
allow it to resume functioning. Would be good to test this, but if it
doesn't work like that, that'd be another bug to fix IMO. We've always
foreseen the possible need to do something like this, so it ought to
work reasonably cleanly.
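The failure mode Tom expects can be sketched as a toy model of a redo routine's dispatch. The record codes and info-bit mask below are illustrative made-up values, not the actual definitions from PostgreSQL's heapam WAL headers:

```c
#include <stdint.h>

/* Illustrative record codes -- the real ones live in PostgreSQL's
 * heap WAL headers and differ from these made-up values. */
#define XLOG_HEAP2_FREEZE       0x00  /* legacy record: replay still accepted */
#define XLOG_HEAP2_FREEZE_PAGE  0x10  /* hypothetical new record type */
#define XLOG_INFO_MASK          0x70  /* info bits carrying the record code */

typedef enum
{
    REDO_OLD_FREEZE,   /* old and new binaries both replay this */
    REDO_NEW_FREEZE,   /* only updated binaries know this one */
    REDO_PANIC         /* unknown code: an un-updated standby PANICs here */
} RedoAction;

static RedoAction
heap2_redo_dispatch(uint8_t info)
{
    switch (info & XLOG_INFO_MASK)
    {
        case XLOG_HEAP2_FREEZE:
            return REDO_OLD_FREEZE;
        case XLOG_HEAP2_FREEZE_PAGE:
            return REDO_NEW_FREEZE;
        default:
            return REDO_PANIC;
    }
}
```

An un-updated standby takes the REDO_PANIC path on the first new-format record and stops; once the updated binaries are installed it recognizes the code and resumes replay, which is the recovery sequence described above.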
I assume we'd then get rid of the old record type completely in 9.4, right?
Yeah, we should be able to drop it in 9.4, since we'll surely have other
WAL format changes anyway.
regards, tom lane
On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe we should just bite the bullet and change the WAL format for
heap_freeze (inventing an all-new record type, not repurposing the old
one, and allowing WAL replay to continue to accept the old one). The
implication for users would be that they'd have to update slave servers
before the master when installing the update; which is unpleasant, but
better than living with a known data corruption case.
Agreed. It may suck, but it sucks less.
How badly will it break if they do the upgrade in the wrong order, though?
Will the slaves just stop (I assume this?) or is there a risk of a
wrong-order upgrade causing extra breakage?
I assume what would happen is the slave would PANIC upon seeing a WAL
record code it didn't recognize. Installing the updated version should
record code it didn't recognize. Installing the updated version should
allow it to resume functioning. Would be good to test this, but if it
doesn't work like that, that'd be another bug to fix IMO. We've always
foreseen the possible need to do something like this, so it ought to
work reasonably cleanly.
Yeah, as long as that's tested and actually works, that sounds like an
acceptable thing to deal with.
I assume we'd then get rid of the old record type completely in 9.4,
right?
Yeah, we should be able to drop it in 9.4, since we'll surely have other
WAL format changes anyway.
And even if not, there's no point in keeping it unless we actually support
replication from 9.3 -> 9.4, I think, and I don't believe anybody has even
considered working on that yet :)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/