[PATCH] Disable bgworkers during servers start in pg_upgrade
Hello,
We found an issue in pg_upgrade on a cluster with a third-party
background worker. The upgrade goes fine, but the new cluster is then in
an inconsistent state. The background worker comes from the PoWA
extension but the issue does not appear to related to this particular
code.
Here is a shell script to reproduce the issue (error at the end):
OLDBINDIR=/usr/lib/postgresql/11/bin
NEWBINDIR=/usr/lib/postgresql/13/bin
OLDDATADIR=$(mktemp -d)
NEWDATADIR=$(mktemp -d)
$OLDBINDIR/initdb -D $OLDDATADIR
echo "unix_socket_directories = '/tmp'" >> $OLDDATADIR/postgresql.auto.conf
echo "shared_preload_libraries = 'pg_stat_statements, powa'" >> $OLDDATADIR/postgresql.auto.conf
$OLDBINDIR/pg_ctl -D $OLDDATADIR -l $OLDDATADIR/pgsql.log start
$OLDBINDIR/createdb -h /tmp powa
$OLDBINDIR/psql -h /tmp -d powa -c "CREATE EXTENSION powa CASCADE"
$OLDBINDIR/pg_ctl -D $OLDDATADIR -m fast stop
$NEWBINDIR/initdb -D $NEWDATADIR
cp $OLDDATADIR/postgresql.auto.conf $NEWDATADIR/postgresql.auto.conf
$NEWBINDIR/pg_upgrade --old-datadir $OLDDATADIR --new-datadir $NEWDATADIR --old-bindir $OLDBINDIR
$NEWBINDIR/pg_ctl -D $NEWDATADIR -l $NEWDATADIR/pgsql.log start
$NEWBINDIR/psql -h /tmp -d powa -c "SELECT 1 FROM powa_snapshot_metas"
# ERROR: MultiXactId 1 has not been created yet -- apparent wraparound
(This needs PoWA to be installed; packages are available on pgdg
repositories as postgresql-<pgversion>-powa on Debian or
powa_<pgversion> on RedHat or directly from source code at
https://github.com/powa-team/powa-archivist).
As far as I currently understand, this is due to the data to be migrated
being somewhat inconsistent (from the perspective of pg_upgrade) when
the old cluster and its background workers get started in pg_upgrade
during the "checks" step. (The old cluster remains sane, still.)
As a solution, it seems that, for similar reasons that we restrict
socket access to prevent accidental connections (from commit
f763b77193), we should also prevent background workers to start at this
step.
Please find attached a patch implementing this.
Thanks for considering,
Denis
Attachments:
0001-Disable-background-workers-during-servers-start-in-p.patchtext/x-diff; charset=us-asciiDownload+3-2
Hi,
On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
We found an issue in pg_upgrade on a cluster with a third-party
background worker. The upgrade goes fine, but the new cluster is then in
an inconsistent state. The background worker comes from the PoWA
extension but the issue does not appear to related to this particular
code.
Well, it does imply that that backgrounder did something, as the pure
existence of a bgworker shouldn't affect
anything. Presumably the issue is that the bgworker actually does
transactional writes, which causes problems because the xids /
multixacts from early during pg_upgrade won't actually be valid after we
do pg_resetxlog etc.
As a solution, it seems that, for similar reasons that we restrict
socket access to prevent accidental connections (from commit
f763b77193), we should also prevent background workers to start at this
step.
I think that'd have quite the potential for negative impact - imagine
extensions that refuse to be loaded outside of shared_preload_libraries
(e.g. because they need to allocate shared memory) but that are required
during the course of pg_upgrade (e.g. because it's a tableam, a PL or
such). Those libraries will then tried to be loaded during the upgrade
(due to the _PG_init() hook being called when functions from the
extension are needed, e.g. the tableam or PL handler).
Nor is it clear to me that the only way this would be problematic is
with shared_preload_libraries. A library in local_preload_libraries, or
a demand loaded library can trigger bgworkers (or database writes in
some other form) as well.
I wonder if we could
a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.
b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.
Greetings,
Andres Freund
Hi,
Andres Freund a �crit :
On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
We found an issue in pg_upgrade on a cluster with a third-party
background worker. The upgrade goes fine, but the new cluster is then in
an inconsistent state. The background worker comes from the PoWA
extension but the issue does not appear to related to this particular
code.Well, it does imply that that backgrounder did something, as the pure
existence of a bgworker shouldn't affectanything. Presumably the issue is that the bgworker actually does
transactional writes, which causes problems because the xids /
multixacts from early during pg_upgrade won't actually be valid after we
do pg_resetxlog etc.As a solution, it seems that, for similar reasons that we restrict
socket access to prevent accidental connections (from commit
f763b77193), we should also prevent background workers to start at this
step.I think that'd have quite the potential for negative impact - imagine
extensions that refuse to be loaded outside of shared_preload_libraries
(e.g. because they need to allocate shared memory) but that are required
during the course of pg_upgrade (e.g. because it's a tableam, a PL or
such). Those libraries will then tried to be loaded during the upgrade
(due to the _PG_init() hook being called when functions from the
extension are needed, e.g. the tableam or PL handler).Nor is it clear to me that the only way this would be problematic is
with shared_preload_libraries. A library in local_preload_libraries, or
a demand loaded library can trigger bgworkers (or database writes in
some other form) as well.
Thank you for those insights!
I wonder if we could
a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.
b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.
Solution "a" appears to be enough to solve the problem described in my
initial email. See attached patch implementing this.
Cheers,
Denis
Attachments:
0001-Set-default-transactions-to-read-only-at-servers-sta.patchtext/x-diff; charset=us-asciiDownload+1-2
Hi,
On Wed, 27 Jan 2021 11:25:11 +0100
Denis Laxalde <denis.laxalde@dalibo.com> wrote:
Andres Freund a écrit :
On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
We found an issue in pg_upgrade on a cluster with a third-party
background worker. The upgrade goes fine, but the new cluster is then in
an inconsistent state. The background worker comes from the PoWA
extension but the issue does not appear to related to this particular
code.Well, it does imply that that backgrounder did something, as the pure
existence of a bgworker shouldn't affect anything. Presumably the issue is
that the bgworker actually does transactional writes, which causes problems
because the xids / multixacts from early during pg_upgrade won't actually
be valid after we do pg_resetxlog etc.
Indeed, it does some writes. As soon as the powa bgworker starts, it takes
"snapshots" of pg_stat_statements state and record them in a local table. To
avoid concurrent run, it takes a lock on some of its local rows using SELECT FOR
UPDATE, hence the mxid consumption.
The inconsistency occurs at least at two place:
* the datminmxid and relminmxid fields pg_dump(all)'ed and restored in the new
cluster.
* the multixid fields in the controlfile read during the check phase and
restored later using pg_resetxlog.
As a solution, it seems that, for similar reasons that we restrict
socket access to prevent accidental connections (from commit
f763b77193), we should also prevent background workers to start at this
step.I think that'd have quite the potential for negative impact - [...]
Thank you for those insights!
+1
I wonder if we could
a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.
According to Denis' tests discussed off-list, it works fine in regard with the
powa bgworker, albeit some complaints in logs. However, I wonder how fragile it
could be as bgworker could easily manipulate either the GUC or "BEGIN READ
WRITE". I realize this is really uncommon practices, but bgworker code from
third parties might be surprising.
b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.
It feels safer because more specific to the subject. But I wonder if the
benefice worth adding some (limited?) complexity and a small/quick conditional
block in a very hot code path for a very limited use case.
What about c) where the bgworker are not loaded by default during binary upgrade
mode / -b unless they explicitly set a bgw_flags (BGWORKER_BINARY_UPGRADE ?)
when they are required during pg_upgrade?
Regards,
Oh, I forgot another point before sending my previous email.
Maybe it might worth adding some final safety checks in pg_upgrade itself?
Eg. checking controldata and mxid files coherency between old and new
cluster would have catch the inconsistency here.
On Wed, Jan 27, 2021 at 02:41:32PM +0100, Jehan-Guillaume de Rorthais wrote:
On Wed, 27 Jan 2021 11:25:11 +0100
Denis Laxalde <denis.laxalde@dalibo.com> wrote:Andres Freund a �crit :
I wonder if we could
a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.According to Denis' tests discussed off-list, it works fine in regard with the
powa bgworker, albeit some complaints in logs. However, I wonder how fragile it
could be as bgworker could easily manipulate either the GUC or "BEGIN READ
WRITE". I realize this is really uncommon practices, but bgworker code from
third parties might be surprising.
Given that having any writes happening at the wrong moment on the old cluster
can end up corrupting the new cluster, and that the corruption might not be
immediately visible we should try to put as many safeguards as possible.
so +1 for the default_transaction_read_only as done in Denis' patch at minimum,
but not only.
AFAICT it should be easy to prevent all background worker from being launched
by adding a check on IsBinaryUpgrade at the beginning of
bgworker_should_start_now(). It won't prevent modules from being loaded, so
this approach should be problematic.
b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.It feels safer because more specific to the subject. But I wonder if the
benefice worth adding some (limited?) complexity and a small/quick conditional
block in a very hot code path for a very limited use case.
I don't think that it would add that much complexity or overhead as there's
already all the infrastructure to prevent WAL writes in certain condition. It
should be enough to add an additional test in XLogInsertAllowed() with some new
variable that is set when starting in binary upgrade mode, and a new function
to disable it that will be emitted by pg_dump / pg_dumpall in binary upgrade
mode.
What about c) where the bgworker are not loaded by default during binary upgrade
mode / -b unless they explicitly set a bgw_flags (BGWORKER_BINARY_UPGRADE ?)
when they are required during pg_upgrade?
As mentioned above +1 for not launching the bgworkers. Does anyone can think
of a reason why some bgworker would really be needed during pg_upgrade, either
on the source or the target cluster?
Julien Rouhaud a écrit :
On Wed, Jan 27, 2021 at 02:41:32PM +0100, Jehan-Guillaume de Rorthais wrote:
On Wed, 27 Jan 2021 11:25:11 +0100
Denis Laxalde <denis.laxalde@dalibo.com> wrote:Andres Freund a écrit :
I wonder if we could
a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.According to Denis' tests discussed off-list, it works fine in regard with the
powa bgworker, albeit some complaints in logs. However, I wonder how fragile it
could be as bgworker could easily manipulate either the GUC or "BEGIN READ
WRITE". I realize this is really uncommon practices, but bgworker code from
third parties might be surprising.Given that having any writes happening at the wrong moment on the old cluster
can end up corrupting the new cluster, and that the corruption might not be
immediately visible we should try to put as many safeguards as possible.so +1 for the default_transaction_read_only as done in Denis' patch at minimum,
but not only.AFAICT it should be easy to prevent all background worker from being launched
by adding a check on IsBinaryUpgrade at the beginning of
bgworker_should_start_now(). It won't prevent modules from being loaded, so
this approach should be problematic.
Please find attached another patch implementing this suggestion (as a
complement to the previous patch setting default_transaction_read_only).
b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.It feels safer because more specific to the subject. But I wonder if the
benefice worth adding some (limited?) complexity and a small/quick conditional
block in a very hot code path for a very limited use case.I don't think that it would add that much complexity or overhead as there's
already all the infrastructure to prevent WAL writes in certain condition. It
should be enough to add an additional test in XLogInsertAllowed() with some new
variable that is set when starting in binary upgrade mode, and a new function
to disable it that will be emitted by pg_dump / pg_dumpall in binary upgrade
mode.
This part is less clear to me so I'm not sure I'd be able to work on it.
Attachments:
0001-Do-not-start-bgworker-processes-during-binary-upgrad.patchtext/x-patch; charset=UTF-8; name=0001-Do-not-start-bgworker-processes-during-binary-upgrad.patchDownload+3-1
On 24 Aug 2021, at 16:40, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
Please find attached another patch implementing this suggestion (as a complement to the previous patch setting default_transaction_read_only).
Please add this to the upcoming commitfest to make sure it's not missed:
https://commitfest.postgresql.org/34/
--
Daniel Gustafsson https://vmware.com/
On Wed, Jan 27, 2021 at 03:06:46PM +0100, Jehan-Guillaume de Rorthais wrote:
Maybe it might worth adding some final safety checks in pg_upgrade itself?
Eg. checking controldata and mxid files coherency between old and new
cluster would have catch the inconsistency here.
Yeah, I agree that there are things in this area that could be
better.
--
Michael
On Tue, Aug 24, 2021 at 04:40:02PM +0200, Denis Laxalde wrote:
Julien Rouhaud a écrit :
I don't think that it would add that much complexity or overhead as there's
already all the infrastructure to prevent WAL writes in certain condition. It
should be enough to add an additional test in XLogInsertAllowed() with some new
variable that is set when starting in binary upgrade mode, and a new function
to disable it that will be emitted by pg_dump / pg_dumpall in binary upgrade
mode.This part is less clear to me so I'm not sure I'd be able to work on it.
default_transaction_read_only brings in a certain level of safety, but
it is limited when it comes to operations involving maintenance like a
REINDEX or a VACUUM code path. Making use of a different way to
control if WAL should be allowed for binary upgrades with a new mean
looks like a more promising and more robust approach, even if that
means that any bgworkers started by the clusters on which the upgrade
is done would need to deal with any errors generated by this new
facility.
Saying that, I don't see a scenario where we'd need a bgworker to be
around during an upgrade. But perhaps some cloud providers have this
need in their own golden garden?
@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw) static bool bgworker_should_start_now(BgWorkerStartTime start_time) { + if (IsBinaryUpgrade) + return false; +
Using -c max_worker_processes=0 would just have the same effect, no?
So we could just patch pg_upgrade's server.c to get the same level of
protection?
--
Michael
Michael Paquier a �crit�:
@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw) static bool bgworker_should_start_now(BgWorkerStartTime start_time) { + if (IsBinaryUpgrade) + return false; +Using -c max_worker_processes=0 would just have the same effect, no?
So we could just patch pg_upgrade's server.c to get the same level of
protection?
Yes, same effect indeed. This would log "too many background workers"
messages in pg_upgrade logs, though.
See attached patch implementing this suggestion.
Attachments:
0001-Disable-bgworkers-at-servers-start-in-pg_upgrade.patchtext/x-patch; charset=UTF-8; name=0001-Disable-bgworkers-at-servers-start-in-pg_upgrade.patchDownload+4-2
On Thu, Aug 26, 2021 at 3:15 PM Denis Laxalde <denis.laxalde@dalibo.com> wrote:
Michael Paquier a écrit :
@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw) static bool bgworker_should_start_now(BgWorkerStartTime start_time) { + if (IsBinaryUpgrade) + return false; +Using -c max_worker_processes=0 would just have the same effect, no?
So we could just patch pg_upgrade's server.c to get the same level of
protection?Yes, same effect indeed. This would log "too many background workers"
messages in pg_upgrade logs, though.
See attached patch implementing this suggestion.
I disagree. It can appear to have the same effect but it's not
guaranteed. Any module in shared_preload_libraries could stick a
"max_worker_processes +=X" if it thinks it should account for its own
ressources. That may not be something encouraged, but it's definitely
possible (and I think Andres recently mentioned that some extensions
do things like that, although maybe for other GUCs) and could result
in a corruption of a pg_upgrade'd cluster, so I still think that
changing bgworker_should_start_now() is a better option.
On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:
On Thu, Aug 26, 2021 at 3:15 PM Denis Laxalde <denis.laxalde@dalibo.com> wrote:
Michael Paquier a écrit :
@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw) static bool bgworker_should_start_now(BgWorkerStartTime start_time) { + if (IsBinaryUpgrade) + return false; +Using -c max_worker_processes=0 would just have the same effect, no?
So we could just patch pg_upgrade's server.c to get the same level of
protection?Yes, same effect indeed. This would log "too many background workers"
messages in pg_upgrade logs, though.
See attached patch implementing this suggestion.I disagree. It can appear to have the same effect but it's not
guaranteed. Any module in shared_preload_libraries could stick a
"max_worker_processes +=X" if it thinks it should account for its own
ressources. That may not be something encouraged, but it's definitely
possible (and I think Andres recently mentioned that some extensions
do things like that, although maybe for other GUCs) and could result
in a corruption of a pg_upgrade'd cluster, so I still think that
changing bgworker_should_start_now() is a better option.
I am not sure. We have a lot of pg_upgrade code that turns off things
like autovacuum and that has worked fine:
snprintf(cmd, sizeof(cmd),
"\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start",
cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
(cluster->controldata.cat_ver >=
BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
" -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
(cluster == &new_cluster) ?
" -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c vacuum_defer_cleanup_age=0" : "",
cluster->pgopts ? cluster->pgopts : "", socket_string);
Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:
.. I still think that
changing bgworker_should_start_now() is a better option.I am not sure. We have a lot of pg_upgrade code that turns off things
like autovacuum and that has worked fine:
True, but there are also conditionals on IsBinaryUpgrade for starting the
autovacuum launcher in the postmaster, so there is some precedent.
Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.
In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.
--
Daniel Gustafsson https://vmware.com/
On Thu, Aug 26, 2021 at 03:38:23PM +0200, Daniel Gustafsson wrote:
On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:.. I still think that
changing bgworker_should_start_now() is a better option.I am not sure. We have a lot of pg_upgrade code that turns off things
like autovacuum and that has worked fine:True, but there are also conditionals on IsBinaryUpgrade for starting the
autovacuum launcher in the postmaster, so there is some precedent.
Oh, I was not aware of that.
Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.
Agreed.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Le jeu. 26 août 2021 à 21:38, Daniel Gustafsson <daniel@yesql.se> a écrit :
On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:
Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.In principle I think it’s sound to try to avoid backend changes where
possible
without sacrificing robustness.
I agree, but it seems quite more likely that an extension relying on a
bgworker changes this guc, compared to an extension forcing autovacuum to
be on for instance.
Show quoted text
On 26 Aug 2021, at 15:43, Julien Rouhaud <rjuju123@gmail.com> wrote:
Le jeu. 26 août 2021 à 21:38, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> a écrit :
On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us <mailto:bruce@momjian.us>> wrote:
Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.I agree, but it seems quite more likely that an extension relying on a bgworker changes this guc, compared to an extension forcing autovacuum to be on for instance.
Agreed, in this particular case I think there is merit to the idea of enforcing
it in the backend.
--
Daniel Gustafsson https://vmware.com/
Bruce Momjian a écrit :
On Thu, Aug 26, 2021 at 03:38:23PM +0200, Daniel Gustafsson wrote:
On 26 Aug 2021, at 15:09, Bruce Momjian<bruce@momjian.us> wrote:
On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:.. I still think that
changing bgworker_should_start_now() is a better option.I am not sure. We have a lot of pg_upgrade code that turns off things
like autovacuum and that has worked fine:True, but there are also conditionals on IsBinaryUpgrade for starting the
autovacuum launcher in the postmaster, so there is some precedent.Oh, I was not aware of that.
If I understand correctly, autovacuum is turned off by pg_upgrade code
only if the old cluster does not support the -b flag (prior to 9.1
apparently). Otherwise, this is indeed handled by IsBinaryUpgrade.
On Thu, Aug 26, 2021 at 03:59:49PM +0200, Daniel Gustafsson wrote:
On 26 Aug 2021, at 15:43, Julien Rouhaud <rjuju123@gmail.com> wrote:
Le jeu. 26 août 2021 à 21:38, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> a écrit :
On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us <mailto:bruce@momjian.us>> wrote:
Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.I agree, but it seems quite more likely that an extension relying on a bgworker changes this guc, compared to an extension forcing autovacuum to be on for instance.
Agreed, in this particular case I think there is merit to the idea of enforcing
it in the backend.
OK, works for me
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Thu, Aug 26, 2021 at 10:34:48AM -0400, Bruce Momjian wrote:
On Thu, Aug 26, 2021 at 03:59:49PM +0200, Daniel Gustafsson wrote:
Agreed, in this particular case I think there is merit to the idea of enforcing
it in the backend.OK, works for me
Indeed, there is some history here with autovacuum. I have not been
careful enough to check that. Still, putting a check on
IsBinaryUpgrade in bgworker_should_start_now() would mean that we
still keep track of the set of bgworkers registered in shared memory.
Wouldn't it be better to block things at the source, as of
RegisterBackgroundWorker()? And that would keep track of the control
we have on bgworkers in a single place. I also think that we'd better
document something about that either in bgworker.sgml or pg_upgrade's
page.
--
Michael