BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
The following bug has been logged on the website:
Bug reference: 12990
Logged by: Timothy Garnett
Email address: tgarnett@panjiva.com
PostgreSQL version: 9.3.5
Operating system: Ubuntu Linux x86_64 12.04.5 LTS
Description:
At 5:10 pm on April 1st our production database started throwing errors
like the following (all of our hot spares were similarly corrupted):
ERROR: could not access status of transaction 303450738
DETAIL: Could not open file "pg_multixact/members/7B49": No such file or
directory.
This is possibly related to bug #8673 (though the mailing list suggests
most issues there were resolved by 9.3.4, and we are on 9.3.5). Also of
note, this db cluster was pg_upgraded from 9.2 at one point (to 9.3.x, then
further upgraded to 9.3.5) (see
/messages/by-id/CAAS3ty+2ynCyf_YmRn6WuqSF8EmJMDypAkc7uD_EXTZJ7usOSg@mail.gmail.com
, though it doesn't seem like that's involved here).
We have a file-system level snapshot of a hot spare of the db from about 28
hours before the errors started and sufficient wal files to do point in time
recovery. We recovered a copy of the database to about 1 hour before the
errors started and promoted that to our production db (losing a couple of
hours of commits).
Walking through the point in time recovery we see the pg_multixact/members
folder fill up and eventually suffer a massive truncation when the newest
file overtakes the oldest file. The range from oldest to newest (through the
wrap point 14078) appears to be continuous.
Time   Newest-File  Oldest-File
1 am   72BA         98A4 (last Oct.)
10am   72BC         98A4
11am   72BD         98A4
12pm   7C4E         98A4
1 pm   7E88         98A4
2 pm   7FFF         98A4
3 pm   884E         98A4
4 pm   905F         98A4   * here we forked our new production
4:30   94AF         98D2
5pm    984D         98D2
5:15   98D2         9900   * now errors, all files outside of 98D2-9900 are gone
During the rapid growth we were generating 10-12 pg_multixact/members files
per minute.
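For scale, a rough back-of-envelope sketch in C (the constants are assumptions based on the default layout described later in this thread: 1636 member offsets per 8 kB page, 32 pages per segment, so segment names run 0000 through 14078):

```c
#include <stdint.h>

/* Rough back-of-envelope: minutes until the whole members-file
 * namespace is consumed at a given segment-file creation rate.
 * Assumes the default layout (segments 0000 .. 14078 inclusive,
 * i.e. 0x14079 files in total). */
static uint32_t
minutes_to_consume_members_namespace(uint32_t files_per_minute)
{
    uint32_t total_files = 0x14078u + 1u;
    return total_files / files_per_minute;
}
```

At 12 files per minute that is roughly 6836 minutes, i.e. under five days to wrap an almost-empty namespace; a namespace already 95+% consumed wraps far sooner.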
After we forked off a new production we identified the workload that was
causing this rapid growth and stopped it. We then used vacuumdb -a -F to
vacuum freeze the whole database cluster. Our presumption was that this
would free up the members files, but it only freed up a few. The current
oldest file is 9E30 (~Jan. 1st), so it moved up from 98A4, and the newest
file is 906A. We're concerned this might be a time-bomb waiting for us in the
future (though at the slower growth rate of 16 files a day or so potentially
a while in the future) as the members file namespace is still 95+% consumed
post vacuum freeze. We do plan to upgrade to 9.4 sometime in the next
couple of months and are curious if we can use pg_upgrade or if we will need
to dump / restore the full (multi TiB) cluster.
As for the workload causing the rapid growth, it involved something like:

while [many millions of things to update / insert]
    BEGIN;
    SELECT state FROM table_A WHERE id = 1 FOR SHARE;
    if state != 'blocked'
        update / insert 2000-10000 rows in table_B, other stuff
        COMMIT;
    else
        COMMIT;
        sleep wait
being run in a bunch of connections (26) to the db. Row 1 of table_A was
effectively being used as a shared/exclusive lock: a different process
would update the state to 'blocked' to block the first process in order
to manage some shared state outside the db.
We've retained the pre-failure snapshot and wal files for now (though we
will need to free them up at some point) so we can point in time recover to
any point from 28 hours before to several hours after the problem surfaced
if that's helpful at all.
Bugs / Questions:
- the members files wrapped over themselves leading to corruption
- why didn't vacuumdb -a -F free up more members files?
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
"tgarnett@panjiva.com" <tgarnett@panjiva.com> wrote:
- why didn't vacuumdb -a -F free up more members files?
Are you sure that while the problem was developing the primary
cluster didn't have any long-running transactions (including those
left "idle in transaction" or prepared transactions)? Was there a
checkpoint following the vacuumdb -a -F run?
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
- why didn't vacuumdb -a -F free up more members files?
Are you sure that while the problem was developing the primary
cluster didn't have any long-running transactions (including those
left "idle in transaction" or prepared transactions)? Was there a
checkpoint following the vacuumdb -a -F run?
The vacuumdb was run after we recovered from backup (which involved
restarting the server), so at the time the vacuumdb started there were no
open transactions. There have been checkpoints since the vacuumdb
finished as well.
In the lead-up to the problem there wouldn't have been any transactions
open more than a couple of hours (the oldest members file was over six
months old).
Tim
Timothy Garnett <tgarnett@panjiva.com> wrote:
- why didn't vacuumdb -a -F free up more members files?
Are you sure that while the problem was developing the primary
cluster didn't have any long-running transactions (including
those left "idle in transaction" or prepared transactions)? Was
there a checkpoint following the vacuumdb -a -F run?

The vacuumdb was run after we recovered from backup (which
involved restarting the server), so at the time the vacuumdb
started there were no open transactions. There have been
checkpoints since the vacuumdb finished as well.

In the lead-up to the problem there wouldn't have been any
transactions open more than a couple of hours (the oldest members
file was over six months old).
Thanks; that helps narrow where we need to look for the bug. Just
to be sure, what are the results of?:
SHOW max_prepared_transactions;
If that is non-zero, do you have any monitoring for rows in the
pg_prepared_xacts view with a "prepared" value older than some
reasonable threshold?
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks; that helps narrow where we need to look for the bug. Just
to be sure, what are the results of?:

SHOW max_prepared_transactions;
We don't use prepared transactions, so that's probably not involved.
SHOW max_prepared_transactions;
max_prepared_transactions
---------------------------
0
(1 row)
Tim
On Mon, Apr 6, 2015 at 07:21:30PM +0000, tgarnett@panjiva.com wrote:
We've retained the pre-failure snapshot and wal files for now (though we
will need to free them up at some point) so we can point in time recover to
any point from 28 hours before to several hours after the problem surfaced
if that's helpful at all.

Bugs / Questions:
- the members files wrapped over themselves leading to corruption
- why didn't vacuumdb -a -F free up more members files?
Wow, you did a lot of research on this and I am glad you seem to have
found a solution. Frankly, there were so many multixact bugs in 9.3.x
that I am unable to track exactly which bugs caused which failures, when
they were fixed, and whether the fixes repaired the problem for good or
just prevented it from happening in the future. Yes, very
discouraging.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
tgarnett@panjiva.com wrote:
ERROR: could not access status of transaction 303450738
DETAIL: Could not open file "pg_multixact/members/7B49": No such file or
directory.
Bruce and Kevin both pinged me about this issue recently. Turns out
that I have an incomplete patch to close the problem. Just to clarify,
this is a completely new problem, not related to #8673.
The fix is to raise an ERROR when generating a new multixact, if we
detect that doing so would get close to the oldest multixact that the
system knows about. If that happens, the solution is to vacuum so that
the "oldest" point is advanced a bit more and you have room to generate
more multixacts. In production, you would typically adjust the
multixact freeze parameters so that "oldest multixact" is advanced more
aggressively and you don't hit the ERROR.
A fix I pushed to master (for a performance regression reported as bug
#8470) would make the problem less common, by having typical multixact
sizes be smaller. I didn't backpatch that fix due to lack of feedback,
but since it is connected to this data-eating bug, maybe we should look
into doing so. This problem only arises if your multixacts are larger
than 24 members on average (or something like that; I don't recall the
exact number). That should be atypical, except that prior to the #8470
fix the multixact size is related to the number of subtransactions doing
certain operations in a loop.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 13, 2015 at 5:08 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
tgarnett@panjiva.com wrote:
ERROR: could not access status of transaction 303450738
DETAIL: Could not open file "pg_multixact/members/7B49": No such file or
directory.

Bruce and Kevin both pinged me about this issue recently. Turns out
that I have an incomplete patch to close the problem. Just to clarify,
this is a completely new problem, not related to #8673.

The fix is to raise an ERROR when generating a new multixact, if we
detect that doing so would get close to the oldest multixact that the
system knows about.
detect that doing so would get close to the oldest multixact that the
system knows about.
I think we definitely need to do that ASAP. And possibly then force
an immediate minor release. Bugs that eat your data are bad, and we
have a customer hitting this completely independently of this report,
which makes this look like more than a theoretical problem.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 15, 2015 at 12:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think we definitely need to do that ASAP. And possibly then force
an immediate minor release. Bugs that eat your data are bad, and we
have a customer hitting this completely independently of this report,
which makes this look like more than a theoretical problem.
I defer to others on what a good timeline would be, but failing harder here
would be good. We were somewhat lucky in discovering the issue as fast as
we did. Post wrap, 99.9+% of the queries to our database were returning
fine without error. We only observed the failure when accessing two
specific rows in two different tables (there were possibly more, it's a
multi-TiB db cluster). Had we not hit those rows for a few days instead of
a few hours recovery would have been extremely difficult as rolling back to
a known good state wouldn't have really been an option.
Tim
Alvaro Herrera wrote:
The fix is to raise an ERROR when generating a new multixact, if we
detect that doing so would get close to the oldest multixact that the
system knows about. If that happens, the solution is to vacuum so that
the "oldest" point is advanced a bit more and you have room to generate
more multixacts. In production, you would typically adjust the
multixact freeze parameters so that "oldest multixact" is advanced more
aggressively and you don't hit the ERROR.
Here's a patch. I have tested locally and it closes the issue for me.
If those affected can confirm that it stops the file removal from
happening, I'd appreciate it.
It would be much better to avoid that additional file reading, but it
seems difficult (read: I don't see how) without changing pg_control.
Note: in bootstrap.c, I had to move setting the bootstrap processing
mode a bit earlier.
Otherwise, the is-in-bootstrap mode test returned false while
bootstrapping multixact, so initdb would fail on the initial phase
because of trying to read pg_multixact/offset/0000 which hasn't been
created at that point. The amount of code that runs in the bootstrap
mode after this change that wasn't running in bootstrap mode previously
is pretty tiny and shouldn't cause any problem -- it's essentially
the whole of BootStrapXLOG().
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
memberswrap.patch (text/x-diff; charset=us-ascii) [+117/-25]
Hi Alvaro
On Tue, Apr 21, 2015 at 7:04 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Here's a patch. I have tested locally and it closes the issue for me.
If those affected can confirm that it stops the file removal from
happening, I'd appreciate it.
I was also starting to look at this problem. For what it's worth,
here's a client program that I used to generate a lot of multixact
members. The patch seems to work correctly so far: as the offset
approached wraparound, I saw the warnings first with appropriate OID
and members remaining, and then I was blocked from creating new
multixacts.
Best regards,
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
explode_mxact_members.c (text/x-csrc; charset=US-ASCII)
On Tue, Apr 21, 2015 at 12:34 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Alvaro Herrera wrote:
The fix is to raise an ERROR when generating a new multixact, if we
detect that doing so would get close to the oldest multixact that the
system knows about. If that happens, the solution is to vacuum so that
the "oldest" point is advanced a bit more and you have room to generate
more multixacts. In production, you would typically adjust the
multixact freeze parameters so that "oldest multixact" is advanced more
aggressively and you don't hit the ERROR.

Here's a patch. I have tested locally and it closes the issue for me.
If those affected can confirm that it stops the file removal from
happening, I'd appreciate it.
1. Do you think it makes sense to give a warning in SetMultiXactIdLimit()
if we have already reached offsetWarnLimit, as we do for multiWarnLimit?
2.
void
MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB)
{
if (MultiXactIdPrecedes(MultiXactState->oldestMultiXactId, oldestMulti))
SetMultiXactIdLimit(oldestMulti, oldestMultiDB);
+ else
+ DetermineSafeOldestOffset(oldestMulti);
}
Why do we need to update the offset stop/warn limits in the above case?
Won't it make the warning/error (pertaining to wraparound data loss)
appear before it is required?
3.
@@ -1921,6 +1957,12 @@ StartupMultiXact(void)
*/
pageno = MXOffsetToMemberPage(offset);
MultiXactMemberCtl->shared->latest_page_number = pageno;
+
+ /*
+ * compute the oldest member we need to keep around to avoid old member
+ * data overrun.
+ */
+ DetermineSafeOldestOffset(MultiXactState->oldestMultiXactId);
}
Do we need to try determining the safe oldest offset during startup,
considering that we don't allow it in recovery? (StartupMultiXact() seems
to be called only during recovery.)
4.
AuxiliaryProcessMain()
{
..
/*
* XLOG operations
*/
SetProcessingMode(NormalProcessing);
switch (MyAuxProcType)
{
case CheckerProcess:
/* don't set signals, they're useless here */
CheckerModeMain();
proc_exit(1); /* should never return */
case BootstrapProcess:
SetProcessingMode(BootstrapProcessing);
bootstrap_signals();
BootStrapXLOG();
BootstrapModeMain();
proc_exit(1); /* should never return */
..
}
Looking at the above code, the new setting of the processing mode to
BootstrapProcessing looks slightly unclear: basically, first we set the
processing mode to Normal and then set it to BootstrapProcessing.
Maybe we can add a comment there.
This solution seems like a good workaround for the problem reported;
however, ideally there should be some way (via Vacuum/
Checkpoint) with which this increase of files can be prevented. I think
after your patch gets committed, we can try to devise a better design
to overcome this problem.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Here's a patch. I have tested locally and it closes the issue
for me. If those affected can confirm that it stops the file
removal from happening, I'd appreciate it.
Based on initial testing, it seems to stop file removal from
happening rather too well. Before applying the patch I ran a
test that generated files 0000 to 1185D in the members directory.
Even setting vacuum_multixact_freeze_min_age and
vacuum_multixact_freeze_table_age very low, none of the members
files would go away with VACUUM (with and without FREEZE) followed
by CHECKPOINT. After applying the patch and starting with a fresh
initdb, with very low settings of the vacuum_multixact_* GUCs I get
the new error almost immediately, while the only file in the
members subdirectory is 0000 and it is 8kB in size.
I think the calculations might be wrong, but I'm not sure what would
make sense.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 21, 2015 at 12:25 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Hi Alvaro
On Tue, Apr 21, 2015 at 7:04 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Here's a patch. I have tested locally and it closes the issue for me.
If those affected can confirm that it stops the file removal from
happening, I'd appreciate it.

I was also starting to look at this problem. For what it's worth,
here's a client program that I used to generate a lot of multixact
members. The patch seems to work correctly so far: as the offset
approached wraparound, I saw the warnings first with appropriate OID
and members remaining, and then I was blocked from creating new
multixacts.
One thing I noticed about your patch is that it effectively halves the
amount of multixact members you can have on disk. Sure, I'd rather
hit an error at 2^31 members than a corrupt database at 2^32 members,
but I wondered if we should try to allow the full range to be used.
I'm not sure whether there is a valid use case for such massive
amounts of pg_multixact/members data (or at least one that won't go
away if autovacuum heuristics are changed in a later patch; also I
understand that there are other recent patches that reduce member
traffic), but if the plan is to backpatch this patch then I suppose
it should ideally not halve the amount of an important resource you
can use in existing systems when people do a point upgrade.
Here's a small patch (that applies after your patch) to show how this
could be done, using three-way comparisons with an explicit boundary
to detect wraparound. There may be other technical problems (for
example MultiXactAdvanceNextMXact still uses
MultiXactOffsetPrecedes), or this may be a bad idea just because it
breaks with the well-established convention for wraparound detection
used by xids.
Also, I wanted to make sure I could reproduce the original
bug/corruption in unpatched master with the client program I posted.
Here are my notes on doing that (sorry if they belabour the obvious,
partly this is just me learning how SLRUs and multixacts work...):
========
Member wraparound happens after segment file "14078" (assuming default
page size, you get 32 pages per segment, and 1636 members per page
(409 groups of 4 + some extra data), and our max member offset wraps
after 0xffffffff, and 0xffffffff / 1636 / 32 = 82040 = 0x14078;
incidentally that final segment is a shorter one).
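The arithmetic above can be sketched as a small C helper (the constants mirror the layout just described and are assumptions about multixact.c's packing, not code copied from it):

```c
#include <stdint.h>

/* Map a 32-bit member offset to its SLRU segment number, using the
 * layout described above: 1636 member offsets per 8 kB page (409
 * groups of 4 plus flag data), 32 pages per segment. */
#define MULTIXACT_MEMBERS_PER_PAGE 1636u
#define SLRU_PAGES_PER_SEGMENT 32u

static uint32_t
members_segment_for_offset(uint32_t offset)
{
    uint32_t page = offset / MULTIXACT_MEMBERS_PER_PAGE;
    return page / SLRU_PAGES_PER_SEGMENT;
}
```

members_segment_for_offset(0xFFFFFFFFu) gives 82040, i.e. segment "14078", the last file name before the member offset wraps.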
Using my test client with 500 sessions and 35k loops I observed this:
it wrapped back around to writing to member file "0000" after creating
"14078", which is obviously broken, because the start of member
segment "0000" holds members for multixact ID 1, which was still in
play (it was datminmxid for template0).
Looking at the members of multixact ID 1 I see recent xids:
postgres=# select pg_get_multixact_members('1'::xid);
pg_get_multixact_members
--------------------------
(34238661,sh)
(34238662,sh)
(2 rows)
Note that pg_get_multixact_members knows the correct *number* of
members for multixact ID 1, it's just that it's looking at members
from some much later multixact. By a tedious binary search I found
it:
postgres=# select pg_get_multixact_members('17094780'::xid);
pg_get_multixact_members
--------------------------
... snip ...
(34238660,sh)
(34238661,sh) <-- here they are!
(34238662,sh) <--
(34238663,sh)
... snip ...
After a checkpoint, I saw that all the files got deleted except for a
few consecutively named files starting at "0000", which would be
correct behavior in general, if we hadn't allowed the member offset to
wrap. It had correctly kept the segments starting with the one
holding the members of multixact ID 1 (the cluster-wide oldest) up
until the one that corresponds to MultiXactState->nextOffset. My test
program had blown right past member offset 0xffffffff and back to 0
and then kept going. The truncation code isn't the problem per se.
To produce the specific error message seen by the bug reporter via
normal interactions from a test program, I think we need some magic
that I can't figure out how to do yet: we need to run a query that
accesses a multixact that has member offset from before offset
wraparound, eg 0xffffffff or similar, but whose members are not on a
page that is still in memory, after a checkpoint that has unlinked the
segment file, so it can try to load it and discover that the segment
file is missing! So a pretty complex interaction of concurrent
processes, timing and caches.
We can trigger the error more artificially, though, by explicitly asking
for multixact members like this:
postgres=# select pg_get_multixact_members('10000000'::xid);
ERROR: could not access status of transaction 10000000
DETAIL: Could not open file "pg_multixact/members/BB55": No such file
or directory.
That's a totally valid multixact ID, obviously since it's been able to
figure out which segment to look in for its members.
Here's one that tries to open the segment that comes immediately
before "0000" in modulo numbering:
postgres=# select pg_get_multixact_members('17094770'::xid);
ERROR: could not access status of transaction 17094770
DETAIL: Could not open file "pg_multixact/members/14078": No such
file or directory.
If I tried it with 17094779, the multixact ID immediately before the
one that has overwritten "0000", it does actually work, presumably
because its pages happen to be buffered for me so it doesn't try to
open the file (guessing here).
I don't currently believe it's necessary to reproduce that step via a
test program anyway, the root problem is clear enough just from
watching the thing wrap.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
memberswrap-full-range.patch (application/octet-stream) [+33/-2]
On Tue, Apr 21, 2015 at 5:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Here's a patch. I have tested locally and it closes the issue
for me. If those affected can confirm that it stops the file
removal from happening, I'd appreciate it.

Based on initial testing, it seems to stop file removal from
happening rather too well. Before applying the patch I ran a
test that generated files 0000 to 1185D in the members directory.
Even setting vacuum_multixact_freeze_min_age and
vacuum_multixact_freeze_table_age very low, none of the members
files would go away with VACUUM (with and without FREEZE) followed
by CHECKPOINT. After applying the patch and starting with a fresh
initdb, with very low settings of the vacuum_multixact_* GUCs I get
the new error almost immediately, while the only file in the
members subdirectory is 0000 and it is 8kB in size.

I think the calculations might be wrong, but I'm not sure what would
make sense.
Can anyone else reproduce this?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Apr 24, 2015 at 5:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 21, 2015 at 5:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Here's a patch. I have tested locally and it closes the issue
for me. If those affected can confirm that it stops the file
removal from happening, I'd appreciate it.

Based on initial testing, it seems to stop file removal from
happening rather too well. Before applying the patch I ran a
test that generated files 0000 to 1185D in the members directory.
Even setting vacuum_multixact_freeze_min_age and
vacuum_multixact_freeze_table_age very low, none of the members
files would go away with VACUUM (with and without FREEZE) followed
by CHECKPOINT. After applying the patch and starting with a fresh
initdb, with very low settings of the vacuum_multixact_* GUCs I get
the new error almost immediately, while the only file in the
members subdirectory is 0000 and it is 8kB in size.

I think the calculations might be wrong, but I'm not sure what would
make sense.

Can anyone else reproduce this?
Yes. This happens in a fresh initdb-ed database. We start out with
oldestOffset = 0, oldestOffsetStopLimit = 4294914944 (that's 0 -
safety margin), nextOffset = 0. Then the first attempt to generate a
new multixact ID fails, because
MultiXactOffsetPrecedes(oldestOffsetStopLimit, nextOffset + nmembers)
is true. I guess the patch works correctly when you start out with an
unpatched database server and generate more than 2.whatever billion
members, and then you restart with the patch, and then start adding
more members until MultiXactOffsetPrecedes(...) returns true and you
get the error.
That's why I proposed not using xid-like logic, and instead using a
type of three-way comparison that allows you to see when nextOffset
would 'cross' oldestOffsetStopLimit, instead of the two-way comparison
that considers half the number-space to be in the past and half in the
future, in my earlier message.
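A minimal sketch of that three-way idea (illustrative only, not the actual patch; the names are invented): compute the modular distance from the next offset up to the stop boundary, and refuse any allocation that would cross it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Three-way wraparound check: unlike an xid-style two-way "precedes"
 * test that splits the 2^32 space into past and future halves, this
 * measures the modular headroom between next_offset and the boundary,
 * so the full offset range stays usable. */
static bool
would_cross_stop_limit(uint32_t next_offset,
                       uint32_t nmembers,
                       uint32_t stop_limit)
{
    uint32_t headroom = stop_limit - next_offset; /* wraps mod 2^32 */
    return nmembers > headroom;
}
```

With the fresh-initdb values quoted above (next_offset = 0, stop limit = 4294914944) the headroom is nearly the whole space, so no spurious error is raised.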
--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro wrote:
That's why I proposed not using xid-like logic, and instead using a
type of three-way comparison that allows you to see when nextOffset
would 'cross' oldestOffsetStopLimit, instead of the two-way comparison
that considers half the number-space to be in the past and half in the
future, in my earlier message.
Yeah, that bit made sense to me.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Apr 23, 2015 at 9:59 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Thomas Munro wrote:
That's why I proposed not using xid-like logic, and instead using a
type of three-way comparison that allows you to see when nextOffset
would 'cross' oldestOffsetStopLimit, instead of the two-way comparison
that considers half the number-space to be in the past and half in the
future, in my earlier message.

Yeah, that bit made sense to me.
In addition to preventing the corruption, I think we also need a
back-patchable fix for AV to try to keep this situation from happening
in the first place. We do not want the system to get stuck in a
situation where member usage is low, so autovacuum does nothing, but
offset usage is high, so the user just keeps getting an error whenever
they do anything that could cause a new MXID to be created. Perhaps
if we'd thought of it we would have added yet another column to
pg_class to track the oldest offset referenced by any MXID in a
particular relation, but that sounds kind of icky on general principle
and certainly wouldn't be back-patchable.
What I think we should do is notice when members utilization exceeds
offset utilization and progressively ramp back the effective value of
autovacuum_multixact_freeze_max_age (and maybe also
vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age)
so that autovacuum (and maybe also manual vacuums) get progressively
more aggressive about trying to advance relminmxid. Suppose we decide
that when the "members" space is 75% used, we've got a big problem and
want to treat autovacuum_multixact_freeze_max_age as effectively zero.
Conversely, suppose that when the members space is at a lower
usage percentage than the offsets space, or when it's anyway less than
autovacuum_multixact_freeze_max_age, we define that as completely
acceptable. If we're somewhere above the "no problem" threshold but
below the "big problem" threshold, then we calculate what percentage
of the way we are from the "no problem" threshold to the "big problem"
threshold and reduce autovacuum_multixact_freeze_max_age by that
percentage.
Example: autovacuum_multixact_freeze_max_age is 25% of 2^32 and we're
using 20% of the offsets space but 40% of the members space. We're
(40 - 20) / (75 - 20) = 36% of the way to the 75% threshold we never
want to exceed, so we reduce the effective value of
autovacuum_multixact_freeze_max_age by 36%. In this case, that means
treating the configured value of 1073741824 as if it were 1073741824 *
(35/55) = 683290251. Hopefully that's enough to trigger enough
vacuuming to cause some relminmxid advancement, but if not, and the
situation continues to worsen, we'll get more and more aggressive.
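The ramp described above might be sketched like this (a hypothetical helper, not proposed code; the thresholds and names are taken from the worked example, not from any patch):

```c
#include <stdint.h>

/* Scale the configured autovacuum_multixact_freeze_max_age down as
 * members-space usage climbs from the "no problem" level (the offsets
 * usage percentage) toward the 75% "big problem" level, per the
 * example above. */
static uint32_t
effective_freeze_max_age(double offsets_used_pct,
                         double members_used_pct,
                         uint32_t configured_max_age)
{
    const double big_problem = 75.0;
    double no_problem = offsets_used_pct; /* members <= offsets: fine */

    if (members_used_pct <= no_problem)
        return configured_max_age;
    if (members_used_pct >= big_problem)
        return 0;

    double frac = (members_used_pct - no_problem) /
                  (big_problem - no_problem);
    return (uint32_t) (configured_max_age * (1.0 - frac));
}
```

For the worked example (offsets 20% used, members 40% used, configured value 1073741824) this yields 683290251, matching the 35/55 calculation above.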
This may not be the right proposal in detail, but I think we should do
something. I don't like the idea of telling users that they can no
longer count on autovacuum to prevent wraparound, and that if they
don't manually tune their vacuum settings correctly, their system may
just stop working at some point.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 21, 2015 at 5:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Here's a patch. I have tested locally and it closes the issue
for me. If those affected can confirm that it stops the file
removal from happening, I'd appreciate it.

Based on initial testing, it seems to stop file removal from
happening rather too well. Before applying the patch I ran a
test that generated files 0000 to 1185D in the members directory.
Even setting vacuum_multixact_freeze_min_age and
vacuum_multixact_freeze_table_age very low, none of the members
files would go away with VACUUM (with and without FREEZE) followed
by CHECKPOINT. After applying the patch and starting with a fresh
initdb, with very low settings of the vacuum_multixact_* GUCs I get
the new error almost immediately, while the only file in the
members subdirectory is 0000 and it is 8kB in size.

I think the calculations might be wrong, but I'm not sure what would
make sense.

Can anyone else reproduce this?
I think I see why I was seeing this and nobody else was -- I was
testing the cleanup on an otherwise quiescent cluster. It seems
that no amount of VACUUM and CHECKPOINT will clean up the members
subdirectory in the absence of processes adding more members. But
after performing the VACUUMs and CHECKPOINT if I start the test
program to add more multi-transactions with lots of members, *then*
the members subdirectory gets cleaned up. That seems confusing and
a bit annoying, but is otherwise benign. I would put "allow VACUUM
followed by CHECKPOINT to delete unneeded files from the members
subdirectory" on the "nice to have" list, but would not want to
delay a fix for the corruption issues for it.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Apr 24, 2015 at 5:34 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
I think I see why I was seeing this and nobody else was
Thomas said he reproduced it. No?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company