Testing autovacuum wraparound (including failsafe)
Hi,
I started to write a test for $Subject, which I think we sorely need.
Currently my approach is to:
- start a cluster, create a few tables with test data
- acquire SHARE UPDATE EXCLUSIVE in a prepared transaction, to prevent
autovacuum from doing anything
- cause dead tuples to exist
- restart
- run pg_resetwal -x 2000027648
- do things like acquiring pins on pages that block vacuum from progressing
- commit prepared transaction
- wait for template0, template1 datfrozenxid to increase
- wait for relfrozenxid for most relations in postgres to increase
- release buffer pin
- wait for postgres datfrozenxid to increase
So far so good. But I've encountered a few things that stand in the way of
enabling such a test by default:
1) During startup StartupSUBTRANS() zeroes out all pages between
oldestActiveXID and nextXid. That takes 8s on my workstation, but only
because I have plenty of memory - pg_subtrans ends up at 14GB as I currently do
the test. Clearly not something we could do on the BF.
2) FAILSAFE_MIN_PAGES is 4GB - which seems to make it infeasible to test the
failsafe mode, we can't really create 4GB relations on the BF. While
writing the tests I've lowered this to 4MB...
3) pg_resetwal -x requires carefully choosing an xid: It needs to be the
first xid on a clog page. It's not hard to determine which xids those are,
but it depends on BLCKSZ and a few constants in clog.c. For now I've hardcoded
a value appropriate for 8KB, but ...
I have 2 1/2 ideas about addressing 1):
- We could expose functionality to advance nextXid to a future value at
runtime, without filling in clog/subtrans pages. Would probably have to live
in varsup.c and be exposed via regress.so or such?
- The only reason StartupSUBTRANS() does that work is because of the prepared
transaction holding back oldestActiveXID. That transaction in turn exists to
prevent autovacuum from doing anything before we do test setup
steps.
Perhaps it'd be sufficient to set autovacuum_naptime really high initially,
perform the test setup, set naptime to something lower, reload config. But
I'm worried that might not be reliable: If something ends up allocating an
xid we'd potentially reach the path in GetNewTransactionId() that wakes up the
launcher? But probably there wouldn't be anything doing so?
Another aspect that might not make this a good choice is that it actually
seems relevant to be able to test cases where there are very old still
running transactions...
- As a variant of the previous idea: If that turns out to be unreliable, we
could instead set nextxid, start in single user mode, create a blocking 2PC
transaction, start normally. Because there's no old active xid we'd not run
into the StartupSUBTRANS problem.
For 2), I don't really have a better idea than making that configurable
somehow?
3) is probably tolerable for now, we could skip the test if BLCKSZ isn't 8KB,
or we could hardcode the calculation for different block sizes.
I noticed one minor bug that's likely new:
2021-04-23 13:32:30.899 PDT [2027738] LOG: automatic aggressive vacuum to prevent wraparound of table "postgres.public.small_trunc": index scans: 1
pages: 400 removed, 28 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 14000 removed, 1000 remain, 0 are dead but not yet removable, oldest xmin: 2000027651
buffer usage: 735 hits, 1262 misses, 874 dirtied
index scan needed: 401 pages from table (1432.14% of total) had 14000 dead item identifiers removed
index "small_trunc_pkey": pages: 43 in total, 37 newly deleted, 37 currently deleted, 0 reusable
avg read rate: 559.048 MB/s, avg write rate: 387.170 MB/s
system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
WAL usage: 1809 records, 474 full page images, 3977538 bytes
'1432.14% of total' - looks like removed pages need to be added before the
percentage calculation?
Greetings,
Andres Freund
On Fri, Apr 23, 2021 at 01:43:06PM -0700, Andres Freund wrote:
2) FAILSAFE_MIN_PAGES is 4GB - which seems to make it infeasible to test the
failsafe mode, we can't really create 4GB relations on the BF. While
writing the tests I've lowered this to 4MB...
For 2), I don't really have a better idea than making that configurable
somehow?
Does it work to shut down the cluster and create the .0,.1,.2,.3 segments of a
new, empty relation with zero blocks using something like truncate -s 1G ?
--
Justin
On Fri, Apr 23, 2021 at 1:43 PM Andres Freund <andres@anarazel.de> wrote:
I started to write a test for $Subject, which I think we sorely need.
+1
Currently my approach is to:
- start a cluster, create a few tables with test data
- acquire SHARE UPDATE EXCLUSIVE in a prepared transaction, to prevent
autovacuum from doing anything
- cause dead tuples to exist
- restart
- run pg_resetwal -x 2000027648
- do things like acquiring pins on pages that block vacuum from progressing
- commit prepared transaction
- wait for template0, template1 datfrozenxid to increase
- wait for relfrozenxid for most relations in postgres to increase
- release buffer pin
- wait for postgres datfrozenxid to increase
Just having a standard-ish way to do stress testing like this would
add something.
2) FAILSAFE_MIN_PAGES is 4GB - which seems to make it infeasible to test the
failsafe mode, we can't really create 4GB relations on the BF. While
writing the tests I've lowered this to 4MB...
The only reason that I chose 4GB for FAILSAFE_MIN_PAGES is because the
related VACUUM_FSM_EVERY_PAGES constant was 8GB -- the latter limits
how often we'll consider the failsafe in the single-pass/no-indexes
case.
I see no reason why it cannot be changed now. VACUUM_FSM_EVERY_PAGES
also frustrates FSM testing in the single-pass case in about the same
way, so maybe that should be considered as well? Note that the FSM
handling for the single pass case is actually a bit different to the
two pass/has-indexes case, since the single pass case calls
lazy_vacuum_heap_page() directly in its first and only pass over the
heap (that's the whole point of having it of course).
3) pg_resetwal -x requires to carefully choose an xid: It needs to be the
first xid on a clog page. It's not hard to determine which xids are but it
depends on BLCKSZ and a few constants in clog.c. I've for now hardcoded a
value appropriate for 8KB, but ...
Ugh.
For 2), I don't really have a better idea than making that configurable
somehow?
That could make sense as a developer/testing option, I suppose. I just
doubt that it makes sense as anything else.
2021-04-23 13:32:30.899 PDT [2027738] LOG: automatic aggressive vacuum to prevent wraparound of table "postgres.public.small_trunc": index scans: 1
pages: 400 removed, 28 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 14000 removed, 1000 remain, 0 are dead but not yet removable, oldest xmin: 2000027651
buffer usage: 735 hits, 1262 misses, 874 dirtied
index scan needed: 401 pages from table (1432.14% of total) had 14000 dead item identifiers removed
index "small_trunc_pkey": pages: 43 in total, 37 newly deleted, 37 currently deleted, 0 reusable
avg read rate: 559.048 MB/s, avg write rate: 387.170 MB/s
system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
WAL usage: 1809 records, 474 full page images, 3977538 bytes
'1432.14% of total' - looks like removed pages need to be added before the
percentage calculation?
Clearly this needs to account for removed heap pages in order to
consistently express the percentage of pages with LP_DEAD items in
terms of a percentage of the original table size. I can fix this
shortly.
--
Peter Geoghegan
Hi,
On 2021-04-23 18:08:12 -0500, Justin Pryzby wrote:
On Fri, Apr 23, 2021 at 01:43:06PM -0700, Andres Freund wrote:
2) FAILSAFE_MIN_PAGES is 4GB - which seems to make it infeasible to test the
failsafe mode, we can't really create 4GB relations on the BF. While
writing the tests I've lowered this to 4MB...
For 2), I don't really have a better idea than making that configurable
somehow?
Does it work to shut down the cluster and create the .0,.1,.2,.3 segments of a
new, empty relation with zero blocks using something like truncate -s 1G ?
I'd like this to be portable to at least windows - I don't know how well
that deals with sparse files. But the bigger issue is that, IIRC, that
will trigger vacuum to try to initialize all those pages, which will
then force all that space to be allocated anyway...
Greetings,
Andres Freund
Hi,
On 2021-04-23 16:12:33 -0700, Peter Geoghegan wrote:
The only reason that I chose 4GB for FAILSAFE_MIN_PAGES is because the
related VACUUM_FSM_EVERY_PAGES constant was 8GB -- the latter limits
how often we'll consider the failsafe in the single-pass/no-indexes
case.
I don't really understand why it makes sense to tie FAILSAFE_MIN_PAGES
and VACUUM_FSM_EVERY_PAGES together? They seem pretty independent to me?
I see no reason why it cannot be changed now. VACUUM_FSM_EVERY_PAGES
also frustrates FSM testing in the single-pass case in about the same
way, so maybe that should be considered as well? Note that the FSM
handling for the single pass case is actually a bit different to the
two pass/has-indexes case, since the single pass case calls
lazy_vacuum_heap_page() directly in its first and only pass over the
heap (that's the whole point of having it of course).
I'm not opposed to lowering VACUUM_FSM_EVERY_PAGES (the costs don't seem
all that high compared to vacuuming?), but I don't think there's as
clear a need for testing around that as there is around wraparound.
The failsafe mode affects the table scan itself by disabling cost
limiting. As far as I can see the ways it triggers for the table scan (vs
truncation or index processing) are:
1) Before vacuuming starts, for heap phases and indexes, if already
necessary at that point
2) For a table with indexes, before/after each index vacuum, if now
necessary
3) On a table without indexes, every 8GB, iff there are dead tuples, if now necessary
Why would we want to trigger the failsafe mode during a scan of a table
with dead tuples and no indexes, but not on a table without dead tuples
or with indexes but fewer than m_w_m dead tuples? That makes little
sense to me.
It seems that for the no-index case the warning message is quite off?
ereport(WARNING,
(errmsg("abandoned index vacuuming of table \"%s.%s.%s\" as a failsafe after %d index scans",
Doesn't exactly make one understand that vacuum cost limiting now is
disabled? And is confusing because there would never be index vacuuming?
And even in the cases indexes exist, it's odd to talk about abandoning
index vacuuming that hasn't even started yet?
For 2), I don't really have a better idea than making that configurable
somehow?
That could make sense as a developer/testing option, I suppose. I just
doubt that it makes sense as anything else.
Yea, I only was thinking of making it configurable to be able to test
it. If we change the limit to something considerably lower I wouldn't
see a need for that anymore.
Greetings,
Andres Freund
On Fri, Apr 23, 2021 at 5:29 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-04-23 16:12:33 -0700, Peter Geoghegan wrote:
The only reason that I chose 4GB for FAILSAFE_MIN_PAGES is because the
related VACUUM_FSM_EVERY_PAGES constant was 8GB -- the latter limits
how often we'll consider the failsafe in the single-pass/no-indexes
case.
I don't really understand why it makes sense to tie FAILSAFE_MIN_PAGES
and VACUUM_FSM_EVERY_PAGES together? They seem pretty independent to me?
VACUUM_FSM_EVERY_PAGES controls how often VACUUM does work that
usually takes place right after the two pass case finishes a round of
index and heap vacuuming. This is work that we certainly don't want to
do every time we process a single heap page in the one-pass/no-indexes
case. Initially this just meant FSM vacuuming, but it now includes a
failsafe check.
Of course all of the precise details here are fairly arbitrary
(including VACUUM_FSM_EVERY_PAGES, which has been around for a couple
of releases now). The overall goal that I had in mind was to make the
one-pass case's use of the failsafe have analogous behavior to the
two-pass/has-indexes case -- a goal which was itself somewhat
arbitrary.
The failsafe mode affects the table scan itself by disabling cost
limiting. As far as I can see the ways it triggers for the table scan (vs
truncation or index processing) are:
1) Before vacuuming starts, for heap phases and indexes, if already
necessary at that point
2) For a table with indexes, before/after each index vacuum, if now
necessary
3) On a table without indexes, every 8GB, iff there are dead tuples, if now necessary
Why would we want to trigger the failsafe mode during a scan of a table
with dead tuples and no indexes, but not on a table without dead tuples
or with indexes but fewer than m_w_m dead tuples? That makes little
sense to me.
What alternative does make sense to you?
It seemed important to put the failsafe check at points where we do
other analogous work in all cases. We made a pragmatic trade-off. In
theory almost any scheme might not check often enough, and/or might
check too frequently.
It seems that for the no-index case the warning message is quite off?
I'll fix that up some point soon. FWIW this happened because the
support for one-pass VACUUM was added quite late, at Robert's request.
Another issue with the failsafe commit is that we haven't considered
the autovacuum_multixact_freeze_max_age table reloption -- we only
check the GUC. That might have accidentally been the right thing to
do, though, since the reloption is interpreted as lower than the GUC
in all cases anyway -- arguably the
autovacuum_multixact_freeze_max_age GUC should be all we care about
anyway. I will need to think about this question some more, though.
For 2), I don't really have a better idea than making that configurable
somehow?
That could make sense as a developer/testing option, I suppose. I just
doubt that it makes sense as anything else.
Yea, I only was thinking of making it configurable to be able to test
it. If we change the limit to something considerably lower I wouldn't
see a need for that anymore.
It would probably be okay to just lower it significantly. Not sure if
that's the best approach, though. Will pick it up next week.
--
Peter Geoghegan
Hi,
On 2021-04-23 19:15:43 -0700, Peter Geoghegan wrote:
The failsafe mode affects the table scan itself by disabling cost
limiting. As far as I can see the ways it triggers for the table scan (vs
truncation or index processing) are:
1) Before vacuuming starts, for heap phases and indexes, if already
necessary at that point
2) For a table with indexes, before/after each index vacuum, if now
necessary
3) On a table without indexes, every 8GB, iff there are dead tuples, if now necessary
Why would we want to trigger the failsafe mode during a scan of a table
with dead tuples and no indexes, but not on a table without dead tuples
or with indexes but fewer than m_w_m dead tuples? That makes little
sense to me.
What alternative does make sense to you?
Check it every so often, independent of whether there are indexes or
dead tuples? Or just check it at the boundaries.
I'd make it dependent on the number of pages scanned, rather than the
block distance to the last check - otherwise we might end up doing it
way too often when there's only a few individual pages not in the freeze
map.
Greetings,
Andres Freund
On Fri, Apr 23, 2021 at 7:33 PM Andres Freund <andres@anarazel.de> wrote:
Check it every so often, independent of whether there are indexes or
dead tuples? Or just check it at the boundaries.
I think that the former suggestion might be better -- I actually
thought about doing it that way myself.
The latter suggestion sounds like you're suggesting that we just check
it at the beginning and the end in all cases (we do the beginning in
all cases already, but now we'd also do the end outside of the loop in
all cases). Is that right? If that is what you meant, then you should
note that there'd hardly be any check in the one-pass case with that
scheme (apart from the initial check that we do already). The only
work we'd be skipping at the end (in the event of that check
triggering the failsafe) would be heap truncation, which (as you've
pointed out yourself) doesn't seem particularly likely to matter.
--
Peter Geoghegan
Hi,
On 2021-04-23 19:42:30 -0700, Peter Geoghegan wrote:
On Fri, Apr 23, 2021 at 7:33 PM Andres Freund <andres@anarazel.de> wrote:
Check it every so often, independent of whether there are indexes or
dead tuples? Or just check it at the boundaries.
I think that the former suggestion might be better -- I actually
thought about doing it that way myself.
Cool.
The latter suggestion sounds like you're suggesting that we just check
it at the beginning and the end in all cases (we do the beginning in
all cases already, but now we'd also do the end outside of the loop in
all cases). Is that right?
Yes.
If that is what you meant, then you should note that there'd hardly be
any check in the one-pass case with that scheme (apart from the
initial check that we do already). The only work we'd be skipping at
the end (in the event of that check triggering the failsafe) would be
heap truncation, which (as you've pointed out yourself) doesn't seem
particularly likely to matter.
I mainly suggested it because to me the current behavior seems hard to
understand. I do think it'd be better to check more often. But checking
depending on the amount of dead tuples at the right time doesn't strike
me as a good idea - a lot of anti-wraparound vacuums will mainly be
freezing tuples, rather than removing a lot of dead rows. Which makes it
hard to understand when the failsafe kicks in.
Greetings,
Andres Freund
On Fri, Apr 23, 2021 at 7:53 PM Andres Freund <andres@anarazel.de> wrote:
I mainly suggested it because to me the current behavior seems hard to
understand. I do think it'd be better to check more often. But checking
depending on the amount of dead tuples at the right time doesn't strike
me as a good idea - a lot of anti-wraparound vacuums will mainly be
freezing tuples, rather than removing a lot of dead rows. Which makes it
hard to understand when the failsafe kicks in.
I'm convinced -- decoupling the logic from the one-pass-not-two pass
case seems likely to be simpler and more useful. For both the one pass
and two pass/has indexes case.
--
Peter Geoghegan
On Fri, Apr 23, 2021 at 7:56 PM Peter Geoghegan <pg@bowt.ie> wrote:
I'm convinced -- decoupling the logic from the one-pass-not-two pass
case seems likely to be simpler and more useful. For both the one pass
and two pass/has indexes case.
Attached draft patch does it that way.
--
Peter Geoghegan
Attachments:
v1-0001-Consider-triggering-failsafe-during-first-scan.patch (application/octet-stream)
On Sat, Apr 24, 2021 at 11:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Apr 23, 2021 at 5:29 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-04-23 16:12:33 -0700, Peter Geoghegan wrote:
The only reason that I chose 4GB for FAILSAFE_MIN_PAGES is because the
related VACUUM_FSM_EVERY_PAGES constant was 8GB -- the latter limits
how often we'll consider the failsafe in the single-pass/no-indexes
case.
I don't really understand why it makes sense to tie FAILSAFE_MIN_PAGES
and VACUUM_FSM_EVERY_PAGES together? They seem pretty independent to me?
VACUUM_FSM_EVERY_PAGES controls how often VACUUM does work that
usually takes place right after the two pass case finishes a round of
index and heap vacuuming. This is work that we certainly don't want to
do every time we process a single heap page in the one-pass/no-indexes
case. Initially this just meant FSM vacuuming, but it now includes a
failsafe check.
Of course all of the precise details here are fairly arbitrary
(including VACUUM_FSM_EVERY_PAGES, which has been around for a couple
of releases now). The overall goal that I had in mind was to make the
one-pass case's use of the failsafe have analogous behavior to the
two-pass/has-indexes case -- a goal which was itself somewhat
arbitrary.
The failsafe mode affects the table scan itself by disabling cost
limiting. As far as I can see the ways it triggers for the table scan (vs
truncation or index processing) are:
1) Before vacuuming starts, for heap phases and indexes, if already
necessary at that point
2) For a table with indexes, before/after each index vacuum, if now
necessary
3) On a table without indexes, every 8GB, iff there are dead tuples, if now necessary
Why would we want to trigger the failsafe mode during a scan of a table
with dead tuples and no indexes, but not on a table without dead tuples
or with indexes but fewer than m_w_m dead tuples? That makes little
sense to me.
What alternative does make sense to you?
It seemed important to put the failsafe check at points where we do
other analogous work in all cases. We made a pragmatic trade-off. In
theory almost any scheme might not check often enough, and/or might
check too frequently.
It seems that for the no-index case the warning message is quite off?
I'll fix that up some point soon. FWIW this happened because the
support for one-pass VACUUM was added quite late, at Robert's request.
+1 to fix this. Are you already working on fixing this? If not, I'll
post a patch.
Another issue with the failsafe commit is that we haven't considered
the autovacuum_multixact_freeze_max_age table reloption -- we only
check the GUC. That might have accidentally been the right thing to
do, though, since the reloption is interpreted as lower than the GUC
in all cases anyway -- arguably the
autovacuum_multixact_freeze_max_age GUC should be all we care about
anyway. I will need to think about this question some more, though.
FWIW, I intentionally ignored the reloption there since they're
interpreted as lower than the GUC as you mentioned and the situation
where we need to enter the failsafe mode is not the table-specific
problem but a system-wide problem.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, May 17, 2021 at 10:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
+1 to fix this. Are you already working on fixing this? If not, I'll
post a patch.
I posted a patch recently (last Thursday my time). Perhaps you can review it?
--
Peter Geoghegan
On Tue, May 18, 2021 at 2:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 17, 2021 at 10:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
+1 to fix this. Are you already working on fixing this? If not, I'll
post a patch.
I posted a patch recently (last Thursday my time). Perhaps you can review it?
Oh, I missed that the patch includes that fix. I'll review the patch.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, May 18, 2021 at 2:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, May 18, 2021 at 2:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 17, 2021 at 10:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
+1 to fix this. Are you already working on fixing this? If not, I'll
post a patch.
I posted a patch recently (last Thursday my time). Perhaps you can review it?
Oh, I missed that the patch includes that fix. I'll review the patch.
I've reviewed the patch. Here is one comment:
if (vacrel->num_index_scans == 0 &&
- vacrel->rel_pages <= FAILSAFE_MIN_PAGES)
+ vacrel->rel_pages <= FAILSAFE_EVERY_PAGES)
return false;
Since there is the condition "vacrel->num_index_scans == 0" we could
enter the failsafe mode even if the table is less than 4GB, if we
enter lazy_check_wraparound_failsafe() after executing more than one
index scan. Whereas a vacuum on the table that is less than 4GB and
has no index never enters the failsafe mode. I think we can remove
this condition, since I don't see a reason to allow entering the
failsafe mode for such small tables only once an index scan has
happened. What do you think?
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, May 18, 2021 at 12:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Since there is the condition "vacrel->num_index_scans == 0" we could
enter the failsafe mode even if the table is less than 4GB, if we
enter lazy_check_wraparound_failsafe() after executing more than one
index scan. Whereas a vacuum on the table that is less than 4GB and
has no index never enters the failsafe mode. I think we can remove
this condition, since I don't see a reason to allow entering the
failsafe mode for such small tables only once an index scan has
happened. What do you think?
I'm convinced -- this does seem like premature optimization now.
I pushed a version of the patch that removes that code just now.
Thanks
--
Peter Geoghegan
On Thu, Jun 10, 2021 at 10:52 AM Andres Freund <andres@anarazel.de> wrote:
I started to write a test for $Subject, which I think we sorely need.
Currently my approach is to:
- start a cluster, create a few tables with test data
- acquire SHARE UPDATE EXCLUSIVE in a prepared transaction, to prevent
autovacuum from doing anything
- cause dead tuples to exist
- restart
- run pg_resetwal -x 2000027648
- do things like acquiring pins on pages that block vacuum from progressing
- commit prepared transaction
- wait for template0, template1 datfrozenxid to increase
- wait for relfrozenxid for most relations in postgres to increase
- release buffer pin
- wait for postgres datfrozenxid to increase
Cool. Thank you for working on that!
Could you please share a WIP patch for the $subj? I'd be happy to help with
it.
So far so good. But I've encountered a few things that stand in the way of
enabling such a test by default:
1) During startup StartupSUBTRANS() zeroes out all pages between
oldestActiveXID and nextXid. That takes 8s on my workstation, but only
because I have plenty memory - pg_subtrans ends up 14GB as I currently do
the test. Clearly not something we could do on the BF.
....
3) pg_resetwal -x requires to carefully choose an xid: It needs to be the
first xid on a clog page. It's not hard to determine which xids are but it
depends on BLCKSZ and a few constants in clog.c. I've for now hardcoded a
value appropriate for 8KB, but ...
Maybe we can add a new pg_resetwal option? Something like pg_resetwal
--xid-near-wraparound, which will ask pg_resetwal to calculate exact xid
value using values from pg_control and clog macros?
I think it might come in handy for manual testing too.
I have 2 1/2 ideas about addressing 1):
- We could expose functionality to advance nextXid to a future value at
runtime, without filling in clog/subtrans pages. Would probably have to live
in varsup.c and be exposed via regress.so or such?
This option looks scary to me. Several functions rely on the fact that
StartupSUBTRANS() has zeroed the pages.
And if we make it conditional just for tests, it means that we won't
be testing the real code path.
- The only reason StartupSUBTRANS() does that work is because of the prepared
transaction holding back oldestActiveXID. That transaction in turn exists to
prevent autovacuum from doing anything before we do test setup steps.
Perhaps it'd be sufficient to set autovacuum_naptime really high initially,
perform the test setup, set naptime to something lower, reload config. But
I'm worried that might not be reliable: If something ends up allocating an
xid we'd potentially reach the path in GetNewTransactionId() that wakes up the
launcher? But probably there wouldn't be anything doing so?
Another aspect that might not make this a good choice is that it actually
seems relevant to be able to test cases where there are very old still
running transactions...
Maybe this exact scenario can be covered with a separate long-running
test, not included in the buildfarm test suite?
--
Best regards,
Lubennikova Anastasia
Hi,
On 2021-06-10 16:42:01 +0300, Anastasia Lubennikova wrote:
Cool. Thank you for working on that!
Could you please share a WIP patch for the $subj? I'd be happy to help with
it.
I've attached the current WIP state, which hasn't evolved much since
this message... I put the test in src/backend/access/heap/t/001_emergency_vacuum.pl
but I'm not sure that's the best place. But I didn't think
src/test/recovery is great either.
Regards,
Andres
Attachments:
001_emergency_vacuum.pl (text/x-perl)
On Fri, Jun 11, 2021 at 10:19 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-06-10 16:42:01 +0300, Anastasia Lubennikova wrote:
Cool. Thank you for working on that!
Could you please share a WIP patch for the $subj? I'd be happy to help with
it.
I've attached the current WIP state, which hasn't evolved much since
this message... I put the test in src/backend/access/heap/t/001_emergency_vacuum.pl
but I'm not sure that's the best place. But I didn't think
src/test/recovery is great either.
Thank you for sharing the WIP patch.
Regarding point (1) you mentioned (StartupSUBTRANS() takes a long time
for zeroing out all pages), how about using single-user mode instead
of preparing the transaction? That is, after pg_resetwal we check the
ages of datfrozenxid by executing a query in single-user mode. That
way, we don’t need to worry about autovacuum concurrently running
while checking the ages of frozenxids. I’ve attached a PoC patch that
does the scenario like:
1. start cluster with autovacuum=off and create tables with a few data
and make garbage on them
2. stop cluster and do pg_resetwal
3. start cluster in single-user mode
4. check age(datfrozenxid)
5. stop cluster
6. start cluster and wait for autovacuums to increase template0,
template1, and postgres datfrozenxids
I put new tests in src/test/module/heap since we already have tests
for brin in src/test/module/brin.
I think that the tap test facility to run queries in single-user mode will
also be helpful for testing a new vacuum option/command that is
intended to be used in emergency cases and proposed here[1].
Regards,
[1]: /messages/by-id/20220128012842.GZ23027@telsasoft.com
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
tap_tests_for_wraparound_emregency.patch (application/x-patch)
Hi,
On Tue, Feb 1, 2022 at 11:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Jun 11, 2021 at 10:19 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-06-10 16:42:01 +0300, Anastasia Lubennikova wrote:
Cool. Thank you for working on that!
Could you please share a WIP patch for the $subj? I'd be happy to help with
it.
I've attached the current WIP state, which hasn't evolved much since
this message... I put the test in src/backend/access/heap/t/001_emergency_vacuum.pl
but I'm not sure that's the best place. But I didn't think
src/test/recovery is great either.
Thank you for sharing the WIP patch.
Regarding point (1) you mentioned (StartupSUBTRANS() takes a long time
for zeroing out all pages), how about using single-user mode instead
of preparing the transaction? That is, after pg_resetwal we check the
ages of datfrozenxid by executing a query in single-user mode. That
way, we don’t need to worry about autovacuum concurrently running
while checking the ages of frozenxids. I’ve attached a PoC patch that
does the scenario like:
1. start cluster with autovacuum=off and create tables with a few data
and make garbage on them
2. stop cluster and do pg_resetwal
3. start cluster in single-user mode
4. check age(datfrozenxid)
5. stop cluster
6. start cluster and wait for autovacuums to increase template0,
template1, and postgres datfrozenxids
The above steps are wrong.
I think we can expose a function in an extension used only by this
test in order to set nextXid to a future value with zeroing out
clog/subtrans pages. We don't need to fill all clog/subtrans pages
between oldestActiveXID and nextXid. I've attached a PoC patch for
adding this regression test and am going to register it to the next
CF.
BTW, while testing the emergency situation, I found there is a race
condition where anti-wraparound vacuum isn't invoked with the settings
autovacuum = off, autovacuum_max_workers = 1. An autovacuum worker
sends a signal to the postmaster after advancing datfrozenxid in
SetTransactionIdLimit(). But with the settings, if the autovacuum
launcher attempts to launch a worker before the autovacuum worker who
has signaled to the postmaster finishes, the launcher exits without
launching a worker due to no free workers. The new launcher won’t be
launched until new XID is generated (and only when new XID % 65536 ==
0). Although autovacuum_max_workers = 1 is not mandatory for this
test, it's easier to verify the order of operations.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/