[PATCH] Disable bgworkers during servers start in pg_upgrade

Started by Denis Laxaldealmost 5 years ago39 messages

denis.laxalde@dalibo.com

almost 5 years ago

1 attachment(s)

Hello,

We found an issue in pg_upgrade on a cluster with a third-party
background worker. The upgrade goes fine, but the new cluster is then in
an inconsistent state. The background worker comes from the PoWA
extension but the issue does not appear to related to this particular
code.

Here is a shell script to reproduce the issue (error at the end):

OLDBINDIR=/usr/lib/postgresql/11/bin
NEWBINDIR=/usr/lib/postgresql/13/bin

OLDDATADIR=$(mktemp -d)
NEWDATADIR=$(mktemp -d)

$OLDBINDIR/initdb -D $OLDDATADIR
echo "unix_socket_directories = '/tmp'" >> $OLDDATADIR/postgresql.auto.conf
echo "shared_preload_libraries = 'pg_stat_statements, powa'" >> $OLDDATADIR/postgresql.auto.conf
$OLDBINDIR/pg_ctl -D $OLDDATADIR -l $OLDDATADIR/pgsql.log start
$OLDBINDIR/createdb -h /tmp powa
$OLDBINDIR/psql -h /tmp -d powa -c "CREATE EXTENSION powa CASCADE"
$OLDBINDIR/pg_ctl -D $OLDDATADIR -m fast stop

$NEWBINDIR/initdb -D $NEWDATADIR
cp $OLDDATADIR/postgresql.auto.conf $NEWDATADIR/postgresql.auto.conf

$NEWBINDIR/pg_upgrade --old-datadir $OLDDATADIR --new-datadir $NEWDATADIR --old-bindir $OLDBINDIR

$NEWBINDIR/pg_ctl -D $NEWDATADIR -l $NEWDATADIR/pgsql.log start
$NEWBINDIR/psql -h /tmp -d powa -c "SELECT 1 FROM powa_snapshot_metas"
# ERROR: MultiXactId 1 has not been created yet -- apparent wraparound

(This needs PoWA to be installed; packages are available on pgdg
repositories as postgresql-<pgversion>-powa on Debian or
powa_<pgversion> on RedHat or directly from source code at
https://github.com/powa-team/powa-archivist).

As far as I currently understand, this is due to the data to be migrated
being somewhat inconsistent (from the perspective of pg_upgrade) when
the old cluster and its background workers get started in pg_upgrade
during the "checks" step. (The old cluster remains sane, still.)

As a solution, it seems that, for similar reasons that we restrict
socket access to prevent accidental connections (from commit
f763b77193), we should also prevent background workers to start at this
step.

Please find attached a patch implementing this.

Thanks for considering,
Denis

Attachments:

0001-Disable-background-workers-during-servers-start-in-p.patchtext/x-diff; charset=us-asciiDownload

From 31b1f31cd3a822d23ccd5883120a013891ade0f3 Mon Sep 17 00:00:00 2001
From: Denis Laxalde <denis.laxalde@dalibo.com>
Date: Wed, 20 Jan 2021 17:25:58 +0100
Subject: [PATCH] Disable background workers during servers start in pg_upgrade

We disable shared_preload_libraries to prevent background workers to
initialize and start during server start in pg_upgrade.

In essence, this is for a similar reason that we use a restricted socket
access from f763b77193b04eba03a1f4ce46df34dc0348419e because background
workers may produce undesired activities during the upgrade.

Author: Denis Laxalde <denis.laxalde@dalibo.com>
Co-authored-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
---
 src/bin/pg_upgrade/server.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 31b1425202..fab95a2d24 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -240,11 +240,13 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 *
+	 * Turn off background workers by emptying shared_preload_libraries.
+	 *
 	 * Force vacuum_defer_cleanup_age to 0 on the new cluster, so that
 	 * vacuumdb --freeze actually freezes the tuples.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s -c shared_preload_libraries=''\" start",
 			 cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
 			 (cluster->controldata.cat_ver >=
 			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
-- 
2.20.1

Andres Freund

andres@anarazel.de

almost 5 years ago

In reply to: Denis Laxalde (#1)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:

We found an issue in pg_upgrade on a cluster with a third-party
background worker. The upgrade goes fine, but the new cluster is then in
an inconsistent state. The background worker comes from the PoWA
extension but the issue does not appear to related to this particular
code.

Well, it does imply that that backgrounder did something, as the pure
existence of a bgworker shouldn't affect

anything. Presumably the issue is that the bgworker actually does
transactional writes, which causes problems because the xids /
multixacts from early during pg_upgrade won't actually be valid after we
do pg_resetxlog etc.

As a solution, it seems that, for similar reasons that we restrict
socket access to prevent accidental connections (from commit
f763b77193), we should also prevent background workers to start at this
step.

I think that'd have quite the potential for negative impact - imagine
extensions that refuse to be loaded outside of shared_preload_libraries
(e.g. because they need to allocate shared memory) but that are required
during the course of pg_upgrade (e.g. because it's a tableam, a PL or
such). Those libraries will then tried to be loaded during the upgrade
(due to the _PG_init() hook being called when functions from the
extension are needed, e.g. the tableam or PL handler).

Nor is it clear to me that the only way this would be problematic is
with shared_preload_libraries. A library in local_preload_libraries, or
a demand loaded library can trigger bgworkers (or database writes in
some other form) as well.

I wonder if we could

a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.
b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.

Greetings,

Andres Freund

Denis Laxalde

denis.laxalde@dalibo.com

almost 5 years ago

In reply to: Andres Freund (#2)

1 attachment(s)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

Andres Freund a ï¿½crit :

On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:

We found an issue in pg_upgrade on a cluster with a third-party
background worker. The upgrade goes fine, but the new cluster is then in
an inconsistent state. The background worker comes from the PoWA
extension but the issue does not appear to related to this particular
code.

Well, it does imply that that backgrounder did something, as the pure
existence of a bgworker shouldn't affect

anything. Presumably the issue is that the bgworker actually does
transactional writes, which causes problems because the xids /
multixacts from early during pg_upgrade won't actually be valid after we
do pg_resetxlog etc.

As a solution, it seems that, for similar reasons that we restrict
socket access to prevent accidental connections (from commit
f763b77193), we should also prevent background workers to start at this
step.

I think that'd have quite the potential for negative impact - imagine
extensions that refuse to be loaded outside of shared_preload_libraries
(e.g. because they need to allocate shared memory) but that are required
during the course of pg_upgrade (e.g. because it's a tableam, a PL or
such). Those libraries will then tried to be loaded during the upgrade
(due to the _PG_init() hook being called when functions from the
extension are needed, e.g. the tableam or PL handler).

Nor is it clear to me that the only way this would be problematic is
with shared_preload_libraries. A library in local_preload_libraries, or
a demand loaded library can trigger bgworkers (or database writes in
some other form) as well.

Thank you for those insights!

I wonder if we could

a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.
b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.

Solution "a" appears to be enough to solve the problem described in my
initial email. See attached patch implementing this.

Cheers,
Denis

Attachments:

0001-Set-default-transactions-to-read-only-at-servers-sta.patchtext/x-diff; charset=us-asciiDownload

From d8b74f9220775736917b7fc08bbe397d7e2eedcd Mon Sep 17 00:00:00 2001
From: Denis Laxalde <denis.laxalde@dalibo.com>
Date: Wed, 20 Jan 2021 17:25:58 +0100
Subject: [PATCH] Set default transactions to read-only at servers start in
 pg_upgrade

In essence, this is for a similar reason that we use a restricted socket
access from f763b77193b04eba03a1f4ce46df34dc0348419e because background
workers may produce undesired activities during the upgrade.

Author: Denis Laxalde <denis.laxalde@dalibo.com>
Co-authored-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
---
 src/bin/pg_upgrade/server.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 31b1425202..b17f348a5b 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -244,7 +244,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * vacuumdb --freeze actually freezes the tuples.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s -c default_transaction_read_only=on\" start",
 			 cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
 			 (cluster->controldata.cat_ver >=
 			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
-- 
2.20.1

Jehan-Guillaume de Rorthais

jgdr@dalibo.com

almost 5 years ago

In reply to: Denis Laxalde (#3)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

On Wed, 27 Jan 2021 11:25:11 +0100
Denis Laxalde <denis.laxalde@dalibo.com> wrote:

Andres Freund a écrit :

On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:

We found an issue in pg_upgrade on a cluster with a third-party
background worker. The upgrade goes fine, but the new cluster is then in
an inconsistent state. The background worker comes from the PoWA
extension but the issue does not appear to related to this particular
code.

Well, it does imply that that backgrounder did something, as the pure
existence of a bgworker shouldn't affect anything. Presumably the issue is
that the bgworker actually does transactional writes, which causes problems
because the xids / multixacts from early during pg_upgrade won't actually
be valid after we do pg_resetxlog etc.

Indeed, it does some writes. As soon as the powa bgworker starts, it takes
"snapshots" of pg_stat_statements state and record them in a local table. To
avoid concurrent run, it takes a lock on some of its local rows using SELECT FOR
UPDATE, hence the mxid consumption.

The inconsistency occurs at least at two place:

* the datminmxid and relminmxid fields pg_dump(all)'ed and restored in the new
cluster.
* the multixid fields in the controlfile read during the check phase and
restored later using pg_resetxlog.

As a solution, it seems that, for similar reasons that we restrict
socket access to prevent accidental connections (from commit
f763b77193), we should also prevent background workers to start at this
step.

I think that'd have quite the potential for negative impact - [...]

Thank you for those insights!

I wonder if we could

a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.

According to Denis' tests discussed off-list, it works fine in regard with the
powa bgworker, albeit some complaints in logs. However, I wonder how fragile it
could be as bgworker could easily manipulate either the GUC or "BEGIN READ
WRITE". I realize this is really uncommon practices, but bgworker code from
third parties might be surprising.

b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.

It feels safer because more specific to the subject. But I wonder if the
benefice worth adding some (limited?) complexity and a small/quick conditional
block in a very hot code path for a very limited use case.

What about c) where the bgworker are not loaded by default during binary upgrade
mode / -b unless they explicitly set a bgw_flags (BGWORKER_BINARY_UPGRADE ?)
when they are required during pg_upgrade?

Regards,

Jehan-Guillaume de Rorthais

jgdr@dalibo.com

almost 5 years ago

In reply to: Jehan-Guillaume de Rorthais (#4)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Oh, I forgot another point before sending my previous email.

Maybe it might worth adding some final safety checks in pg_upgrade itself?
Eg. checking controldata and mxid files coherency between old and new
cluster would have catch the inconsistency here.

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Jehan-Guillaume de Rorthais (#4)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Wed, Jan 27, 2021 at 02:41:32PM +0100, Jehan-Guillaume de Rorthais wrote:

On Wed, 27 Jan 2021 11:25:11 +0100
Denis Laxalde <denis.laxalde@dalibo.com> wrote:

Andres Freund a ï¿½crit :

I wonder if we could

a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.

According to Denis' tests discussed off-list, it works fine in regard with the
powa bgworker, albeit some complaints in logs. However, I wonder how fragile it
could be as bgworker could easily manipulate either the GUC or "BEGIN READ
WRITE". I realize this is really uncommon practices, but bgworker code from
third parties might be surprising.

Given that having any writes happening at the wrong moment on the old cluster
can end up corrupting the new cluster, and that the corruption might not be
immediately visible we should try to put as many safeguards as possible.

so +1 for the default_transaction_read_only as done in Denis' patch at minimum,
but not only.

AFAICT it should be easy to prevent all background worker from being launched
by adding a check on IsBinaryUpgrade at the beginning of
bgworker_should_start_now(). It won't prevent modules from being loaded, so
this approach should be problematic.

b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.

It feels safer because more specific to the subject. But I wonder if the
benefice worth adding some (limited?) complexity and a small/quick conditional
block in a very hot code path for a very limited use case.

I don't think that it would add that much complexity or overhead as there's
already all the infrastructure to prevent WAL writes in certain condition. It
should be enough to add an additional test in XLogInsertAllowed() with some new
variable that is set when starting in binary upgrade mode, and a new function
to disable it that will be emitted by pg_dump / pg_dumpall in binary upgrade
mode.

What about c) where the bgworker are not loaded by default during binary upgrade
mode / -b unless they explicitly set a bgw_flags (BGWORKER_BINARY_UPGRADE ?)
when they are required during pg_upgrade?

As mentioned above +1 for not launching the bgworkers. Does anyone can think
of a reason why some bgworker would really be needed during pg_upgrade, either
on the source or the target cluster?

Denis Laxalde

denis.laxalde@dalibo.com

over 4 years ago

In reply to: Julien Rouhaud (#6)

1 attachment(s)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Julien Rouhaud a écrit :

On Wed, Jan 27, 2021 at 02:41:32PM +0100, Jehan-Guillaume de Rorthais wrote:

On Wed, 27 Jan 2021 11:25:11 +0100
Denis Laxalde <denis.laxalde@dalibo.com> wrote:

Andres Freund a écrit :

I wonder if we could

a) set default_transaction_read_only to true, and explicitly change it
in the sessions that need that.

According to Denis' tests discussed off-list, it works fine in regard with the
powa bgworker, albeit some complaints in logs. However, I wonder how fragile it
could be as bgworker could easily manipulate either the GUC or "BEGIN READ
WRITE". I realize this is really uncommon practices, but bgworker code from
third parties might be surprising.

Given that having any writes happening at the wrong moment on the old cluster
can end up corrupting the new cluster, and that the corruption might not be
immediately visible we should try to put as many safeguards as possible.

so +1 for the default_transaction_read_only as done in Denis' patch at minimum,
but not only.

AFAICT it should be easy to prevent all background worker from being launched
by adding a check on IsBinaryUpgrade at the beginning of
bgworker_should_start_now(). It won't prevent modules from being loaded, so
this approach should be problematic.

Please find attached another patch implementing this suggestion (as a
complement to the previous patch setting default_transaction_read_only).

b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.

It feels safer because more specific to the subject. But I wonder if the
benefice worth adding some (limited?) complexity and a small/quick conditional
block in a very hot code path for a very limited use case.

I don't think that it would add that much complexity or overhead as there's
already all the infrastructure to prevent WAL writes in certain condition. It
should be enough to add an additional test in XLogInsertAllowed() with some new
variable that is set when starting in binary upgrade mode, and a new function
to disable it that will be emitted by pg_dump / pg_dumpall in binary upgrade
mode.

This part is less clear to me so I'm not sure I'd be able to work on it.

Attachments:

0001-Do-not-start-bgworker-processes-during-binary-upgrad.patchtext/x-patch; charset=UTF-8; name=0001-Do-not-start-bgworker-processes-during-binary-upgrad.patchDownload

From b265b6f5b49cfcbc3c6271e980455696a5a3622b Mon Sep 17 00:00:00 2001
From: Denis Laxalde <denis.laxalde@dalibo.com>
Date: Mon, 23 Aug 2021 15:19:41 +0200
Subject: [PATCH] Do not start bgworker processes during binary upgrade

Background workers may produce undesired activities (writes) on the old
cluster during upgrade which may corrupt the new cluster. For a similar
reason that we restrict socket access and set default transactions to
read-only when starting the old cluster in pg_upgrade, we should prevent
bgworkers from starting when using the -b flag.
---
 src/backend/postmaster/postmaster.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..e66c962509 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
 static bool
 bgworker_should_start_now(BgWorkerStartTime start_time)
 {
+	if (IsBinaryUpgrade)
+		return false;
+
 	switch (pmState)
 	{
 		case PM_NO_CHILDREN:
-- 
2.30.2

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Denis Laxalde (#7)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On 24 Aug 2021, at 16:40, Denis Laxalde <denis.laxalde@dalibo.com> wrote:

Please find attached another patch implementing this suggestion (as a complement to the previous patch setting default_transaction_read_only).

Please add this to the upcoming commitfest to make sure it's not missed:

https://commitfest.postgresql.org/34/

--
Daniel Gustafsson https://vmware.com/

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Jehan-Guillaume de Rorthais (#5)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Wed, Jan 27, 2021 at 03:06:46PM +0100, Jehan-Guillaume de Rorthais wrote:

Maybe it might worth adding some final safety checks in pg_upgrade itself?
Eg. checking controldata and mxid files coherency between old and new
cluster would have catch the inconsistency here.

Yeah, I agree that there are things in this area that could be
better.
--
Michael

#10

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Denis Laxalde (#7)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Tue, Aug 24, 2021 at 04:40:02PM +0200, Denis Laxalde wrote:

Julien Rouhaud a écrit :

I don't think that it would add that much complexity or overhead as there's
already all the infrastructure to prevent WAL writes in certain condition. It
should be enough to add an additional test in XLogInsertAllowed() with some new
variable that is set when starting in binary upgrade mode, and a new function
to disable it that will be emitted by pg_dump / pg_dumpall in binary upgrade
mode.

This part is less clear to me so I'm not sure I'd be able to work on it.

default_transaction_read_only brings in a certain level of safety, but
it is limited when it comes to operations involving maintenance like a
REINDEX or a VACUUM code path. Making use of a different way to
control if WAL should be allowed for binary upgrades with a new mean
looks like a more promising and more robust approach, even if that
means that any bgworkers started by the clusters on which the upgrade
is done would need to deal with any errors generated by this new
facility.

Saying that, I don't see a scenario where we'd need a bgworker to be
around during an upgrade. But perhaps some cloud providers have this
need in their own golden garden?

@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
static bool
bgworker_should_start_now(BgWorkerStartTime start_time)
{
+	if (IsBinaryUpgrade)
+		return false;
+

Using -c max_worker_processes=0 would just have the same effect, no?
So we could just patch pg_upgrade's server.c to get the same level of
protection?
--
Michael

#11

Denis Laxalde

denis.laxalde@dalibo.com

over 4 years ago

In reply to: Michael Paquier (#10)

1 attachment(s)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Michael Paquier a ï¿½critï¿½:

@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
static bool
bgworker_should_start_now(BgWorkerStartTime start_time)
{
+	if (IsBinaryUpgrade)
+		return false;
+
Using -c max_worker_processes=0 would just have the same effect, no?
So we could just patch pg_upgrade's server.c to get the same level of
protection?

Yes, same effect indeed. This would log "too many background workers"
messages in pg_upgrade logs, though.
See attached patch implementing this suggestion.

Attachments:

0001-Disable-bgworkers-at-servers-start-in-pg_upgrade.patchtext/x-patch; charset=UTF-8; name=0001-Disable-bgworkers-at-servers-start-in-pg_upgrade.patchDownload

From 04867b2184c335e6efaf5a96d348a53efcc9e3d2 Mon Sep 17 00:00:00 2001
From: Denis Laxalde <denis.laxalde@dalibo.com>
Date: Mon, 23 Aug 2021 15:19:41 +0200
Subject: [PATCH] Disable bgworkers at servers start in pg_upgrade

Background workers may produce undesired activities (writes) on the old
cluster during upgrade which may corrupt the new cluster.
---
 src/bin/pg_upgrade/server.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 7fed0ae108..a2b67a5e9a 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -236,6 +236,9 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * values are less than a gap of 2000000000 from the current xid counter,
 	 * so autovacuum will not touch them.
 	 *
+	 * Disable background workers by setting max_worker_processes=0 to prevent
+	 * undesired writes which may cause corruptions on the new cluster.
+	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
@@ -245,7 +248,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * vacuumdb --freeze actually freezes the tuples.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s -c max_worker_processes=0\" start",
 			 cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
 			 (cluster->controldata.cat_ver >=
 			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
-- 
2.30.2

#12

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Denis Laxalde (#11)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Thu, Aug 26, 2021 at 3:15 PM Denis Laxalde <denis.laxalde@dalibo.com> wrote:

Michael Paquier a écrit :
@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
static bool
bgworker_should_start_now(BgWorkerStartTime start_time)
{
+    if (IsBinaryUpgrade)
+            return false;
+
Using -c max_worker_processes=0 would just have the same effect, no?
So we could just patch pg_upgrade's server.c to get the same level of
protection?
Yes, same effect indeed. This would log "too many background workers"
messages in pg_upgrade logs, though.
See attached patch implementing this suggestion.

I disagree. It can appear to have the same effect but it's not
guaranteed. Any module in shared_preload_libraries could stick a
"max_worker_processes +=X" if it thinks it should account for its own
ressources. That may not be something encouraged, but it's definitely
possible (and I think Andres recently mentioned that some extensions
do things like that, although maybe for other GUCs) and could result
in a corruption of a pg_upgrade'd cluster, so I still think that
changing bgworker_should_start_now() is a better option.

#13

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#12)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:

On Thu, Aug 26, 2021 at 3:15 PM Denis Laxalde <denis.laxalde@dalibo.com> wrote:
Michael Paquier a écrit :
@@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
static bool
bgworker_should_start_now(BgWorkerStartTime start_time)
{
+    if (IsBinaryUpgrade)
+            return false;
+
Using -c max_worker_processes=0 would just have the same effect, no?
So we could just patch pg_upgrade's server.c to get the same level of
protection?
Yes, same effect indeed. This would log "too many background workers"
messages in pg_upgrade logs, though.
See attached patch implementing this suggestion.
I disagree. It can appear to have the same effect but it's not
guaranteed. Any module in shared_preload_libraries could stick a
"max_worker_processes +=X" if it thinks it should account for its own
ressources. That may not be something encouraged, but it's definitely
possible (and I think Andres recently mentioned that some extensions
do things like that, although maybe for other GUCs) and could result
in a corruption of a pg_upgrade'd cluster, so I still think that
changing bgworker_should_start_now() is a better option.

I am not sure. We have a lot of pg_upgrade code that turns off things
like autovacuum and that has worked fine:

snprintf(cmd, sizeof(cmd),
"\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start",
cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
(cluster->controldata.cat_ver >=
BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
" -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
(cluster == &new_cluster) ?
" -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c vacuum_defer_cleanup_age=0" : "",
cluster->pgopts ? cluster->pgopts : "", socket_string);

Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#14

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Bruce Momjian (#13)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:

.. I still think that
changing bgworker_should_start_now() is a better option.

I am not sure. We have a lot of pg_upgrade code that turns off things
like autovacuum and that has worked fine:

True, but there are also conditionals on IsBinaryUpgrade for starting the
autovacuum launcher in the postmaster, so there is some precedent.

Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.

In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.

--
Daniel Gustafsson https://vmware.com/

#15

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Daniel Gustafsson (#14)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Thu, Aug 26, 2021 at 03:38:23PM +0200, Daniel Gustafsson wrote:

On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:

.. I still think that
changing bgworker_should_start_now() is a better option.

I am not sure. We have a lot of pg_upgrade code that turns off things
like autovacuum and that has worked fine:

True, but there are also conditionals on IsBinaryUpgrade for starting the
autovacuum launcher in the postmaster, so there is some precedent.

Oh, I was not aware of that.

Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.

In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.

Agreed.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#16

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#14)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Le jeu. 26 août 2021 à 21:38, Daniel Gustafsson <daniel@yesql.se> a écrit :

On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:

Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.

In principle I think it’s sound to try to avoid backend changes where
possible
without sacrificing robustness.

I agree, but it seems quite more likely that an extension relying on a
bgworker changes this guc, compared to an extension forcing autovacuum to
be on for instance.

Show quoted text

#17

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Julien Rouhaud (#16)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On 26 Aug 2021, at 15:43, Julien Rouhaud <rjuju123@gmail.com> wrote:

Le jeu. 26 août 2021 à 21:38, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> a écrit :

On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us <mailto:bruce@momjian.us>> wrote:

Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.

In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.

I agree, but it seems quite more likely that an extension relying on a bgworker changes this guc, compared to an extension forcing autovacuum to be on for instance.

Agreed, in this particular case I think there is merit to the idea of enforcing
it in the backend.

--
Daniel Gustafsson https://vmware.com/

#18

Denis Laxalde

denis.laxalde@dalibo.com

over 4 years ago

In reply to: Bruce Momjian (#15)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Bruce Momjian a écrit :

On Thu, Aug 26, 2021 at 03:38:23PM +0200, Daniel Gustafsson wrote:

On 26 Aug 2021, at 15:09, Bruce Momjian<bruce@momjian.us> wrote:
On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:

.. I still think that
changing bgworker_should_start_now() is a better option.

I am not sure. We have a lot of pg_upgrade code that turns off things
like autovacuum and that has worked fine:

True, but there are also conditionals on IsBinaryUpgrade for starting the
autovacuum launcher in the postmaster, so there is some precedent.

Oh, I was not aware of that.

If I understand correctly, autovacuum is turned off by pg_upgrade code
only if the old cluster does not support the -b flag (prior to 9.1
apparently). Otherwise, this is indeed handled by IsBinaryUpgrade.

#19

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Daniel Gustafsson (#17)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Thu, Aug 26, 2021 at 03:59:49PM +0200, Daniel Gustafsson wrote:

On 26 Aug 2021, at 15:43, Julien Rouhaud <rjuju123@gmail.com> wrote:

Le jeu. 26 août 2021 à 21:38, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> a écrit :

On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us <mailto:bruce@momjian.us>> wrote:

Basically, pg_upgrade has avoided any backend changes that could be
controlled by GUCs and I am not sure we want to start adding such
changes for just this.

In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.

I agree, but it seems quite more likely that an extension relying on a bgworker changes this guc, compared to an extension forcing autovacuum to be on for instance.

Agreed, in this particular case I think there is merit to the idea of enforcing
it in the backend.

OK, works for me

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#20

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Bruce Momjian (#19)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Thu, Aug 26, 2021 at 10:34:48AM -0400, Bruce Momjian wrote:

On Thu, Aug 26, 2021 at 03:59:49PM +0200, Daniel Gustafsson wrote:

Agreed, in this particular case I think there is merit to the idea of enforcing
it in the backend.

OK, works for me

Indeed, there is some history here with autovacuum. I have not been
careful enough to check that. Still, putting a check on
IsBinaryUpgrade in bgworker_should_start_now() would mean that we
still keep track of the set of bgworkers registered in shared memory.

Wouldn't it be better to block things at the source, as of
RegisterBackgroundWorker()? And that would keep track of the control
we have on bgworkers in a single place. I also think that we'd better
document something about that either in bgworker.sgml or pg_upgrade's
page.
--
Michael

#21

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Michael Paquier (#20)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Fri, Aug 27, 2021 at 7:31 AM Michael Paquier <michael@paquier.xyz> wrote:

Indeed, there is some history here with autovacuum. I have not been
careful enough to check that. Still, putting a check on
IsBinaryUpgrade in bgworker_should_start_now() would mean that we
still keep track of the set of bgworkers registered in shared memory.

That shouldn't lead to any problem right?

Wouldn't it be better to block things at the source, as of
RegisterBackgroundWorker()? And that would keep track of the control
we have on bgworkers in a single place. I also think that we'd better
document something about that either in bgworker.sgml or pg_upgrade's
page.

I'm fine with that approach too.

#22

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Julien Rouhaud (#21)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Fri, Aug 27, 2021 at 09:34:24AM +0800, Julien Rouhaud wrote:

That shouldn't lead to any problem right?

Well, bgworker_should_start_now() does not exist for that, and
RegisterBackgroundWorker() is the one doing the classification job, so
it would be more consistent to keep everything under control in the
same code path.
--
Michael

#23

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Michael Paquier (#22)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Fri, Aug 27, 2021 at 10:02 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Aug 27, 2021 at 09:34:24AM +0800, Julien Rouhaud wrote:

That shouldn't lead to any problem right?

Well, bgworker_should_start_now() does not exist for that, and
RegisterBackgroundWorker() is the one doing the classification job, so
it would be more consistent to keep everything under control in the
same code path.

I'm not sure it's that uncontroversial. The way I see
RegisterBackgroundWorker() is that it's responsible for doing some
sanity checks to see if the module didn't make any error and if
ressources are available. Surely checking for IsBinaryUpgrade should
not be the responsibility of third-party code, so the question is
whether binary upgrade is seen as a resource and as such a reason to
forbid bgworker registration, in opposition to forbid the launch
itself.

Right now AFAICT there's no official API to check if a call to
RegisterBackgroundWorker() succeeded or not, but an extension could
check by itself using BackgroundWorkerList in bgworker_internals.h,
and error out or something if it didn't succeed, as a way to inform
users that they didn't configure the instance properly (like maybe
increasing max_worker_processes). Surely using a *_internals.h header
is a clear sign that you expose yourself to problems, but adding an
official way to check for bgworker registration doesn't seem
unreasonable to me. Is that worth the risk to have pg_upgrade
erroring out in this kind of scenario, or make the addition of a new
API to check for registration status more difficult?

#24

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Julien Rouhaud (#23)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Fri, Aug 27, 2021 at 11:25:19AM +0800, Julien Rouhaud wrote:

Right now AFAICT there's no official API to check if a call to
RegisterBackgroundWorker() succeeded or not, but an extension could
check by itself using BackgroundWorkerList in bgworker_internals.h,
and error out or something if it didn't succeed, as a way to inform
users that they didn't configure the instance properly (like maybe
increasing max_worker_processes). Surely using a *_internals.h header
is a clear sign that you expose yourself to problems, but adding an
official way to check for bgworker registration doesn't seem
unreasonable to me. Is that worth the risk to have pg_upgrade
erroring out in this kind of scenario, or make the addition of a new
API to check for registration status more difficult?

Perhaps. That feels like a topic different than what's discussed
here, though, because we don't really need to check if a bgworker has
been launched or not. We just need to make sure that it never runs in
the context of a binary upgrade, like autovacuum.
--
Michael

#25

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Michael Paquier (#24)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Fri, Aug 27, 2021 at 12:41 PM Michael Paquier <michael@paquier.xyz> wrote:

Perhaps. That feels like a topic different than what's discussed
here, though, because we don't really need to check if a bgworker has
been launched or not. We just need to make sure that it never runs in
the context of a binary upgrade, like autovacuum.

I'm a bit confused. Did you mean checking if a bgworker has been
*registered* or not?

But my point was that preventing a bgworker registration as a way to
avoid it from being launched may lead to some problem if an extensions
decides that a failure in the registration is problematic enough to
prevent the startup altogether for instance. I'm ok if we decide that
it's *not* an acceptable behavior, but it should be clear that it's
the case, and probably documented.

#26

Andres Freund

andres@anarazel.de

over 4 years ago

In reply to: Julien Rouhaud (#21)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

On 2021-08-27 09:34:24 +0800, Julien Rouhaud wrote:

On Fri, Aug 27, 2021 at 7:31 AM Michael Paquier <michael@paquier.xyz> wrote:

Indeed, there is some history here with autovacuum. I have not been
careful enough to check that. Still, putting a check on
IsBinaryUpgrade in bgworker_should_start_now() would mean that we
still keep track of the set of bgworkers registered in shared memory.

That shouldn't lead to any problem right?

Wouldn't it be better to block things at the source, as of
RegisterBackgroundWorker()? And that would keep track of the control
we have on bgworkers in a single place. I also think that we'd better
document something about that either in bgworker.sgml or pg_upgrade's
page.

I'm fine with that approach too.

Isn't that just going to end up with extension code erroring out and/or
blocking waiting for a bgworker to start?

Greetings,

Andres Freund

#27

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Andres Freund (#26)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote:

Isn't that just going to end up with extension code erroring out and/or
blocking waiting for a bgworker to start?

Well, that's the point to block things during an upgrade. Do you have
a list of requirements you'd like to see satisfied here? POWA would
be fine with blocking the execution of bgworkers AFAIK (Julien feel
free to correct me here if necessary). It could be possible that
preventing extension code to execute this way could prevent hazards as
well. The idea from upthread to prevent any writes and/or WAL
activity is not really different as extensions may still generate an
error because of pg_upgrade's safety measures we'd put in, no?
--
Michael

#28

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Andres Freund (#26)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Sat, Aug 28, 2021 at 3:28 AM Andres Freund <andres@anarazel.de> wrote:

On Fri, Aug 27, 2021 at 7:31 AM Michael Paquier <michael@paquier.xyz> wrote:

Wouldn't it be better to block things at the source, as of
RegisterBackgroundWorker()? And that would keep track of the control
we have on bgworkers in a single place. I also think that we'd better
document something about that either in bgworker.sgml or pg_upgrade's
page.

Isn't that just going to end up with extension code erroring out and/or
blocking waiting for a bgworker to start?

But there's no API to wait for the start of a non-dynamic bgworker or
check for it right? So I don't see how the extension code could wait
or error out. As far as I know the only thing you can do is
RegisterBackgroundWorker() in your _PG_init() code and hope that the
server is correctly configured. The only thing that third-party code
could I think is try to check if the bgworker could be successfully
registered or not as I mentioned in [1]/messages/by-id/CAOBaU_ZtR88x3Si6XwprqGo8UZNLncAQrD_-wc67sC=acO3g=w@mail.gmail.com. Maybe extra paranoid code
may add such check in all executor hook but the overhead would be so
terrible that no one would use such an extension anyway.

[1]: /messages/by-id/CAOBaU_ZtR88x3Si6XwprqGo8UZNLncAQrD_-wc67sC=acO3g=w@mail.gmail.com

#29

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Michael Paquier (#27)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Sat, Aug 28, 2021 at 9:40 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote:

Isn't that just going to end up with extension code erroring out and/or
blocking waiting for a bgworker to start?

Well, that's the point to block things during an upgrade. Do you have
a list of requirements you'd like to see satisfied here? POWA would
be fine with blocking the execution of bgworkers AFAIK (Julien feel
free to correct me here if necessary).

Yes, no problem at all, whether the bgworker isn't registered or never
launched. The bgworker isn't even mandatory anymore since a few
years, as we introduced an external daemon to collect metrics on a
distant database.

#30

Julien Rouhaud

rjuju123@gmail.com

about 4 years ago

In reply to: Michael Paquier (#27)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Sat, Aug 28, 2021 at 10:40:42AM +0900, Michael Paquier wrote:

On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote:

Isn't that just going to end up with extension code erroring out and/or
blocking waiting for a bgworker to start?

Well, that's the point to block things during an upgrade. Do you have
a list of requirements you'd like to see satisfied here? POWA would
be fine with blocking the execution of bgworkers AFAIK (Julien feel
free to correct me here if necessary). It could be possible that
preventing extension code to execute this way could prevent hazards as
well. The idea from upthread to prevent any writes and/or WAL
activity is not really different as extensions may still generate an
error because of pg_upgrade's safety measures we'd put in, no?

This thread is now almost one year old, and AFAICT there's still no consensus
on how to fix this problem. It would be good to have something done in pg15,
ideally backpatched, as this is a corruption hazard that triggered at least
once already.

Andres, do you still have an objection with either preventing bgworker
registration/launch or WAL-write during the impacted pg_upgrade steps, or a
better alternative to fix the problem?

#31

Andres Freund

andres@anarazel.de

almost 4 years ago

In reply to: Julien Rouhaud (#30)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

On 2022-01-12 17:54:31 +0800, Julien Rouhaud wrote:

On Sat, Aug 28, 2021 at 10:40:42AM +0900, Michael Paquier wrote:

On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote:

Isn't that just going to end up with extension code erroring out and/or
blocking waiting for a bgworker to start?

Well, that's the point to block things during an upgrade. Do you have
a list of requirements you'd like to see satisfied here? POWA would
be fine with blocking the execution of bgworkers AFAIK (Julien feel
free to correct me here if necessary). It could be possible that
preventing extension code to execute this way could prevent hazards as
well. The idea from upthread to prevent any writes and/or WAL
activity is not really different as extensions may still generate an
error because of pg_upgrade's safety measures we'd put in, no?

The point is that we need the check for WAL writes / xid assignments / etc
*either* way. There are ways extensions could trigger problems like e.g. xid
assigned, besides bgworker doing stuff. Or postgres components doing so
unintentionally.

Erroring out in situation where we *know* that there were concurrent changes
unacceptable during pg_upgrade is always the right call. Such errors can be
debugged and then addressed (removing the extension from s_p_l, fixing the
extension, etc).

In contrast to that, preventing upgrades from succeeding because an extension
has a dependency on bgworkers working, just because the bgworker *could* be
doing something bad is different. The bgworker might never write, have a check
for binary upgrade mode, etc. It may not be realistic to fix and extension to
work without the bgworkers.

Imagine something like an table access method that has IO workers or such.

Andres, do you still have an objection with either preventing bgworker
registration/launch or WAL-write during the impacted pg_upgrade steps, or a
better alternative to fix the problem?

I still object to the approach of preventing bgworker registration. It doesn't
provide much protection and might cause hard to address problems for some
extensions.

I don't think I ever objected to preventing WAL-writes, I even proposed that
upthread? Unless you suggest doing it specifically in bgworkers, rather than
preventing similar problems outside bgworkers as well.

Greetings,

Andres Freund

#32

Julien Rouhaud

rjuju123@gmail.com

almost 4 years ago

In reply to: Andres Freund (#31)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

On Thu, Jan 13, 2022 at 06:44:12PM -0800, Andres Freund wrote:

The point is that we need the check for WAL writes / xid assignments / etc
*either* way. There are ways extensions could trigger problems like e.g. xid
assigned, besides bgworker doing stuff. Or postgres components doing so
unintentionally.

Erroring out in situation where we *know* that there were concurrent changes
unacceptable during pg_upgrade is always the right call. Such errors can be
debugged and then addressed (removing the extension from s_p_l, fixing the
extension, etc).

In contrast to that, preventing upgrades from succeeding because an extension
has a dependency on bgworkers working, just because the bgworker *could* be
doing something bad is different. The bgworker might never write, have a check
for binary upgrade mode, etc. It may not be realistic to fix and extension to
work without the bgworkers.

Imagine something like an table access method that has IO workers or such.

IIUC if a table access method has IO workers that starts doing writes quickly
(or any similar extension that *is* required to be present during upgrade but
that should be partially disabled), the only way to do a pg_upgrade would be to
make sure that the extension explicitly checks for the binary-upgrade mode and
don't do any writes, or provide a GUC for the same, since it should still
preloaded? I'm fine with that, but that should probably be documented.

Andres, do you still have an objection with either preventing bgworker
registration/launch or WAL-write during the impacted pg_upgrade steps, or a
better alternative to fix the problem?

I still object to the approach of preventing bgworker registration. It doesn't
provide much protection and might cause hard to address problems for some
extensions.

I don't think I ever objected to preventing WAL-writes, I even proposed that
upthread? Unless you suggest doing it specifically in bgworkers, rather than
preventing similar problems outside bgworkers as well.

Sorry I missed that when re-reading the thread. And no I'm not suggesting
preventing WAL writes in bgworkers only.

Since there's a clear consensus on how to fix the problem, I'm switching the
patch as Waiting on Author.

#33

Denis Laxalde

denis.laxalde@dalibo.com

almost 4 years ago

In reply to: Julien Rouhaud (#6)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

Julien Rouhaud a écrit :

On Wed, 27 Jan 2021 11:25:11 +0100
Denis Laxalde <denis.laxalde@dalibo.com> wrote:

Andres Freund a écrit :

b) when in binary upgrade mode / -b, error out on all wal writes in
sessions that don't explicitly set a session-level GUC to allow
writes.

It should be enough to add an additional test in XLogInsertAllowed() with some new
variable that is set when starting in binary upgrade mode, and a new function
to disable it that will be emitted by pg_dump / pg_dumpall in binary upgrade
mode.

I tried that simple change first:

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index dfe2a0bcce..8feab0cb96 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8498,6 +8498,9 @@ HotStandbyActiveInReplay(void)
  bool
  XLogInsertAllowed(void)
  {
+       if (IsBinaryUpgrade)
+               return false;
+
         /*
          * If value is "unconditionally true" or "unconditionally 
false", just
          * return it.  This provides the normal fast path once recovery 
is known

But then, pg_upgrade's tests (make -C src/bin/pg_upgrade/ check) fail at
vaccumdb but not during pg_dumpall:

$ cat src/bin/pg_upgrade/pg_upgrade_utility.log
-----------------------------------------------------------------
pg_upgrade run on Fri Jan 28 10:37:36 2022
-----------------------------------------------------------------

command:
"/home/denis/src/pgsql/build/tmp_install/home/denis/.local/pgsql/bin/pg_dumpall"
--host /home/denis/src/pgsql/build/src/bin/pg_upgrade --port 51696
--username denis --globals-only --quote-all-identifiers --binary-upgrade
-f pg_upgrade_dump_globals.sql >> "pg_upgrade_utility.log" 2>&1

command:
"/home/denis/src/pgsql/build/tmp_install/home/denis/.local/pgsql/bin/vacuumdb"
--host /home/denis/src/pgsql/build/src/bin/pg_upgrade --port 51696
--username denis --all --analyze >> "pg_upgrade_utility.log" 2>&1
vacuumdb: vacuuming database "postgres"
vacuumdb: error: processing of database "postgres" failed: PANIC:
cannot make new WAL entries during recovery

In contrast with pg_dump/pg_dumpall, vacuumdb has no --binary-upgrade
flag, so it does not seem possible to use a special GUC setting to allow
WAL writes in that vacuumdb session at the moment.
Should we add --binary-upgrade to vacuumdb as well? Or am I going in the
wrong direction?

Thanks,
Denis

#34

Julien Rouhaud

rjuju123@gmail.com

almost 4 years ago

In reply to: Denis Laxalde (#33)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

On Fri, Jan 28, 2022 at 11:02:46AM +0100, Denis Laxalde wrote:

I tried that simple change first:
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index dfe2a0bcce..8feab0cb96 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8498,6 +8498,9 @@ HotStandbyActiveInReplay(void)
bool
XLogInsertAllowed(void)
{
+       if (IsBinaryUpgrade)
+               return false;
+
But then, pg_upgrade's tests (make -C src/bin/pg_upgrade/ check) fail at
vaccumdb but not during pg_dumpall:

[...]

command: "/home/denis/src/pgsql/build/tmp_install/home/denis/.local/pgsql/bin/vacuumdb"
--host /home/denis/src/pgsql/build/src/bin/pg_upgrade --port 51696
--username denis --all --analyze >> "pg_upgrade_utility.log" 2>&1
vacuumdb: vacuuming database "postgres"
vacuumdb: error: processing of database "postgres" failed: PANIC: cannot
make new WAL entries during recovery

In contrast with pg_dump/pg_dumpall, vacuumdb has no --binary-upgrade flag,
so it does not seem possible to use a special GUC setting to allow WAL
writes in that vacuumdb session at the moment.
Should we add --binary-upgrade to vacuumdb as well? Or am I going in the
wrong direction?

I think having a new option for vacuumdb is the right move.

It seems unlikely that any cron or similar on the host will try to run some
concurrent vacuumdb, but we still have to enforce that only the one executed by
pg_upgrade can succeed.

I guess it could be an undocumented option, similar to postgres' -b, which
would only be allowed iff --all and --freeze is also passed to be extra safe.

While at it:

vacuumdb: error: processing of database "postgres" failed: PANIC: cannot
make new WAL entries during recovery

Should we tweak that message when IsBinaryUpgrade is true?

#35

Denis Laxalde

denis.laxalde@dalibo.com

almost 4 years ago

In reply to: Julien Rouhaud (#34)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Julien Rouhaud a écrit :

I think having a new option for vacuumdb is the right move.

It seems unlikely that any cron or similar on the host will try to run some
concurrent vacuumdb, but we still have to enforce that only the one executed by
pg_upgrade can succeed.

I guess it could be an undocumented option, similar to postgres' -b, which
would only be allowed iff --all and --freeze is also passed to be extra safe.

The help text in pg_dump's man page states:

--binary-upgrade
This option is for use by in-place upgrade
utilities. Its use for other purposes is not
recommended or supported. The behavior of
the option may change in future releases
without notice.

Is it enough? Or do we actually want to hide it for vacuumdb?

While at it:

vacuumdb: error: processing of database "postgres" failed: PANIC: cannot
make new WAL entries during recovery

Should we tweak that message when IsBinaryUpgrade is true?

Yes, indeed, I had in mind to simply make the message more generic as:
"cannot insert new WAL entries".

#36

Julien Rouhaud

rjuju123@gmail.com

almost 4 years ago

In reply to: Denis Laxalde (#35)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade*

On Fri, Jan 28, 2022 at 03:06:57PM +0100, Denis Laxalde wrote:

Julien Rouhaud a ï¿½critï¿½:

I think having a new option for vacuumdb is the right move.

It seems unlikely that any cron or similar on the host will try to run some
concurrent vacuumdb, but we still have to enforce that only the one executed by
pg_upgrade can succeed.

I guess it could be an undocumented option, similar to postgres' -b, which
would only be allowed iff --all and --freeze is also passed to be extra safe.

The help text in pg_dump's man page states:

--binary-upgrade
This option is for use by in-place upgrade
utilities. Its use for other purposes is not
recommended or supported. The behavior of
the option may change in future releases
without notice.

Is it enough? Or do we actually want to hide it for vacuumdb?

I think it should be hidden, with a comment about it like postmaster.c getopt
call:

case 'b':
/* Undocumented flag used for binary upgrades */

vacuumdb: error: processing of database "postgres" failed: PANIC: cannot
make new WAL entries during recovery

Should we tweak that message when IsBinaryUpgrade is true?

Yes, indeed, I had in mind to simply make the message more generic as:
"cannot insert new WAL entries".

-1, it's good to have a clear reason why the error happened, especially since
it's supposed to be "should not happen" situation.

#37

Andres Freund

andres@anarazel.de

almost 4 years ago

In reply to: Julien Rouhaud (#34)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

On 2022-01-28 21:56:36 +0800, Julien Rouhaud wrote:

I think having a new option for vacuumdb is the right move.

Can't we pass the option via the connection string, e.g. something
PGOPTIONS='-c binary_upgrade_mode=true'? That seems to scale better than to
add it gradually to multiple tools.

Greetings,

Andres Freund

#38

Julien Rouhaud

rjuju123@gmail.com

almost 4 years ago

In reply to: Andres Freund (#37)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

Hi,

On Fri, Jan 28, 2022 at 10:20:07AM -0800, Andres Freund wrote:

On 2022-01-28 21:56:36 +0800, Julien Rouhaud wrote:

I think having a new option for vacuumdb is the right move.

Can't we pass the option via the connection string, e.g. something
PGOPTIONS='-c binary_upgrade_mode=true'? That seems to scale better than to
add it gradually to multiple tools.

Ah right that's a better idea.

#39

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Julien Rouhaud (#38)

Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

On Fri, Jan 28, 2022 at 9:51 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Fri, Jan 28, 2022 at 10:20:07AM -0800, Andres Freund wrote:

On 2022-01-28 21:56:36 +0800, Julien Rouhaud wrote:

I think having a new option for vacuumdb is the right move.

Can't we pass the option via the connection string, e.g. something
PGOPTIONS='-c binary_upgrade_mode=true'? That seems to scale better than to
add it gradually to multiple tools.

Ah right that's a better idea.

OK, so I think the conclusion here is that no patch which does
$SUBJECT is going to get committed, but somebody might write (or
finish?) a patch that does something else which could possibly get
committed once it's written. If and when that happens, I think that
patch should be submitted on a new thread with a subject line that
matches what the patch actually does. In the meantime, I'm going to
mark the CF entry for *this* thread as Returned with Feedback.

For what it's worth, I'm not 100% sure that $SUBJECT is a bad idea --
nor am I 100% sure that it's a good idea. On the other hand, I
definitely think the alternative proposal of blocking WAL writes at
times when they shouldn't be happening is a good idea, and most likely
extensions should also be coded in a way where they're smart enough
not to try except at times when it is safe. Therefore, it make sense
to me to proceed along those kinds of lines for now, and if that's not
enough and we need to revisit this idea at some point in the future,
we can.

Note that I'm taking no view for the present on whether any change
that might end up being agreed here should go into v15 or not. It's in
that fuzzy grey area where you could call it a feature, or a bug fix,
or technically-a-feature-but-let's-slip-it-in-after-freeze-anyway. We
can decide that when a completed patch shows up, though it's fair to
point out that the longer that takes, the less likely it is to be v15
material. I am, however, taking the position that holding this
CommitFest entry open is not in the best interest of the project. The
patch we'd theoretically be committing doesn't exist yet and doesn't
do what the subject line suggests.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com