BUG #18979: pg_upgrade to PG17 fails if max_slot_wal_keep_size is not set to -1

#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18979
Logged by: Jorge Solorzano
Email address: jorsol@gmail.com
PostgreSQL version: 17.5
Operating system: Linux
Description:

Steps to Reproduce:
1. On a PostgreSQL 16 cluster, set max_slot_wal_keep_size = 500 (or any
non-default value).
2. Initdb a new PostgreSQL 17 cluster.
3. Copy the postgresql.conf from 16 to 17.
4. Attempt to perform a binary upgrade to PostgreSQL 17 using pg_upgrade
--check.
Expected Behavior:
pg_upgrade should automatically override max_slot_wal_keep_size to -1 as
required for upgrade mode.
Actual Behavior:
The upgrade fails with the following error:
command: "/usr/pgsql-17/bin/pg_ctl" -w -l
"/var/lib/pgsql/17/data/pg_upgrade_output.d/20250706T152559.441/log/pg_upgrade_server.log"
-D "/var/lib/pgsql/17/data" -o "-p 50432 -b -c synchronous_commit=off -c
fsync=off -c full_page_writes=off -c max_slot_wal_keep_size=-1 -c
listen_addresses='' -c unix_socket_permissions=0700 -c
unix_socket_directories='/var/lib/pgsql'" start >>
"/var/lib/pgsql/17/data/pg_upgrade_output.d/20250706T152559.441/log/pg_upgrade_server.log"
2>&1
waiting for server to start....2025-07-06 13:25:59.929 GMT [9439] LOG:
invalid value for parameter "max_slot_wal_keep_size": 500
2025-07-06 13:25:59.929 GMT [9439] DETAIL: "max_slot_wal_keep_size" must be
set to -1 during binary upgrade mode.
2025-07-06 15:25:59.930 CEST [9439] FATAL: configuration file
"/var/lib/pgsql/17/data/postgresql.conf" contains errors
stopped waiting
pg_ctl: could not start server
Additional Context:
While pg_upgrade does pass other required parameters like -c
max_slot_wal_keep_size=-1 on the command line when starting the new cluster
in upgrade mode, the value in postgresql.conf appears to override this,
leading to startup failure.
It would be helpful if pg_upgrade temporarily overrides the setting
correctly, regardless of the static config.

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #18979: pg_upgrade to PG17 fails if max_slot_wal_keep_size is not set to -1

On Sunday, July 6, 2025, PG Bug reporting form <noreply@postgresql.org>
wrote:

The following bug has been logged on the website:

Bug reference: 18979
Logged by: Jorge Solorzano
Email address: jorsol@gmail.com
PostgreSQL version: 17.5
Operating system: Linux
Description:

The upgrade fails with the following error:
command: "/usr/pgsql-17/bin/pg_ctl" -w -l
"/var/lib/pgsql/17/data/pg_upgrade_output.d/20250706T152559.441/log/pg_upgrade_server.log"
-D "/var/lib/pgsql/17/data" -o "-p 50432 -b -c synchronous_commit=off -c
fsync=off -c full_page_writes=off -c max_slot_wal_keep_size=-1 -c
listen_addresses='' -c unix_socket_permissions=0700 -c
unix_socket_directories='/var/lib/pgsql'" start >>
"/var/lib/pgsql/17/data/pg_upgrade_output.d/20250706T152559.441/log/pg_upgrade_server.log"
2>&1
waiting for server to start....2025-07-06 13:25:59.929 GMT [9439] LOG:
invalid value for parameter "max_slot_wal_keep_size": 500
2025-07-06 13:25:59.929 GMT [9439] DETAIL: "max_slot_wal_keep_size" must be
set to -1 during binary upgrade mode.
2025-07-06 15:25:59.930 CEST [9439] FATAL: configuration file
"/var/lib/pgsql/17/data/postgresql.conf" contains errors
stopped waiting
pg_ctl: could not start server
Additional Context:
While pg_upgrade does pass other required parameters like -c
max_slot_wal_keep_size=-1 on the command line when starting the new cluster
in upgrade mode, the value in postgresql.conf appears to override this,
leading to startup failure.

It doesn't override it. Both the value in the conf file and the one on the
startup command line are evaluated, and the one in the conf file causes the
failure before the command-line value can restore the system to a valid
state. I.e., validation is not deferred.
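
The point about non-deferred validation can be illustrated with a toy model. This is only a sketch under stated assumptions: the function and class names are hypothetical, it is not PostgreSQL's GUC code, and the real server may process the sources in a different order. It shows only that a per-value check rejects the conf-file value even though the command line supplies a valid one.

```python
# Toy model only: every name here is hypothetical, not PostgreSQL's GUC API.
class ConfigError(Exception):
    pass


def check_value(value, binary_upgrade):
    """Per-value check: in binary upgrade mode only -1 is acceptable."""
    return not (binary_upgrade and value != -1)


def start_server(conf_file, command_line, binary_upgrade):
    """Validate every candidate value as it is seen (no deferred validation)."""
    name = "max_slot_wal_keep_size"
    effective = {name: -1}  # built-in default
    # Simplification: lower-priority source first, so the command line wins
    # whenever all candidate values pass the check.
    for source in (conf_file, command_line):
        if name in source:
            if not check_value(source[name], binary_upgrade):
                raise ConfigError(
                    f"invalid value for parameter {name!r}: {source[name]}"
                )
            effective[name] = source[name]
    return effective
```

In this model, start_server({"max_slot_wal_keep_size": 500}, {"max_slot_wal_keep_size": -1}, True) raises ConfigError, which mirrors the reported startup failure.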

It would be helpful if pg_upgrade temporarily overrides the setting
correctly, regardless of the static config.

The existing behavior by pg_upgrade makes sense but is pointless given the
existing implementation of setting handling. It seems doable to remove the
check from the setting area and place it elsewhere.

David J.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #18979: pg_upgrade to PG17 fails if max_slot_wal_keep_size is not set to -1

PG Bug reporting form <noreply@postgresql.org> writes:

Steps to Reproduce:
1. On a PostgreSQL 16 cluster, set max_slot_wal_keep_size = 500 (or any
non-default value).
2. Initdb a new PostgreSQL 17 cluster.
3. Copy the postgresql.conf from 16 to 17.
4. Attempt to perform a binary upgrade to PostgreSQL 17 using pg_upgrade
--check.
Expected Behavior:
pg_upgrade should automatically override max_slot_wal_keep_size to -1 as
required for upgrade mode.

I do not think it is within pg_upgrade's charter to modify your
postgresql.conf file.

However, maybe instead of having check_max_slot_wal_keep_size
throw an error about this, we could make it just silently keep
the value as -1. There's a nearby thread about silently ignoring
inappropriate GUC settings during initdb [1], and this seems like
it'd be in the same spirit. Or we could just drop the server-side
check altogether, figuring that it's pg_upgrade's job to see to
that.
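
The difference between the current check and the "silently keep -1" idea can be sketched as follows. Both functions are hypothetical illustrations, not the actual check_max_slot_wal_keep_size hook, and the sketch assumes a hook is free to substitute a value.

```python
# Hypothetical illustration of the two behaviors discussed; not server code.
def strict_check(requested, binary_upgrade):
    """Reported behavior: refuse any value other than -1 during binary upgrade."""
    if binary_upgrade and requested != -1:
        raise ValueError(
            '"max_slot_wal_keep_size" must be set to -1 during binary upgrade mode.'
        )
    return requested


def lenient_check(requested, binary_upgrade):
    """Suggested alternative: silently keep -1 while in binary upgrade mode."""
    return -1 if binary_upgrade else requested
```

With the lenient variant, a conf-file value of 500 would simply be ignored during the upgrade and would take effect again on a normal start.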

BTW, your step 3 above is not very good practice. It will lose the
entries for any new GUCs added in the new PG version. While that's
not a functional problem, it does mean that the .conf file's
usefulness as documentation gets worse and worse over time.

regards, tom lane

[1]: /messages/by-id/87plejmnpy.fsf@163.com

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #18979: pg_upgrade to PG17 fails if max_slot_wal_keep_size is not set to -1

On Sun, Jul 6, 2025 at 8:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

Expected Behavior:
pg_upgrade should automatically override max_slot_wal_keep_size to -1 as
required for upgrade mode.

I do not think it is within pg_upgrade's charter to modify your
postgresql.conf file.

That isn't what is being asked. They simply want the override that is
already in place to actually work.

However, maybe instead of having check_max_slot_wal_keep_size
throw an error about this, we could make it just silently keep
the value as -1. There's a nearby thread about silently ignoring
inappropriate GUC settings during initdb [1], and this seems like
it'd be in the same spirit. Or we could just drop the server-side
check altogether, figuring that it's pg_upgrade's job to see to
that.

Can't we just move this to postmaster.c ~ line 850 ?

This seems no different than wal_level and summarize_wal having a
co-dependency such that intermediate invalid states must be allowed to
exist so long as what the server ends up running under is valid.
max_slot_wal_keep_size is sighup just like summarize_wal
(and IsBinaryUpgrade behaves like a postmaster GUC)

David J.

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#4)
Re: BUG #18979: pg_upgrade to PG17 fails if max_slot_wal_keep_size is not set to -1

On Sun, Jul 6, 2025 at 9:41 AM David G. Johnston <david.g.johnston@gmail.com>
wrote:

Can't we just move this to postmaster.c ~ line 850 ?

This seems no different than wal_level and summarize_wal having a
co-dependency such that intermediate invalid states must be allowed to
exist so long as what the server ends up running under is valid.
max_slot_wal_keep_size is sighup just like summarize_wal
(and IsBinaryUpgrade behaves like a postmaster GUC)

I suppose the answer is because sighup settings seemingly do not belong
here...

./psql postgres
psql (18beta1)
Type "help" for help.

postgres=# show summarize_wal;
 summarize_wal
---------------
 on
(1 row)

postgres=# show wal_level;
 wal_level
-----------
 minimal
(1 row)

Which is an impossible combination to start in, but it is allowed if you
change only summarize_wal to on and perform a reload.

David J.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#4)
Re: BUG #18979: pg_upgrade to PG17 fails if max_slot_wal_keep_size is not set to -1

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Sun, Jul 6, 2025 at 8:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

However, maybe instead of having check_max_slot_wal_keep_size
throw an error about this, we could make it just silently keep
the value as -1.

Can't we just move this to postmaster.c ~ line 850 ?

max_slot_wal_keep_size is marked PGC_SIGHUP, so in principle it
could be changed after postmaster start. So if we want a server-side
defense, I don't believe checking at postmaster start is adequate.

In practice, as long as pg_upgrade provides that -c switch, I don't
believe any other GUC source that is allowed to set a PGC_SIGHUP
GUC would override the -c switch. So the need for any server-side
defense isn't obvious to me.
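
The priority argument can be sketched with a small model in which a value from a lower-priority source never replaces one set from a higher-priority source. The Source and Setting names are hypothetical and cover only the sources relevant here. The sketch assumes, as stated above, that the -c switch outranks the configuration file.

```python
from enum import IntEnum


class Source(IntEnum):
    """Hypothetical subset of setting sources, lowest priority first."""
    DEFAULT = 0
    FILE = 1   # postgresql.conf
    ARGV = 2   # postmaster -c switch


class Setting:
    def __init__(self, default):
        self.value = default
        self.source = Source.DEFAULT

    def set(self, value, source):
        """Apply value unless a higher-priority source already set it."""
        if source < self.source:
            return False  # ignored: the current value has higher priority
        self.value, self.source = value, source
        return True
```

Under this model, once -1 has been set from ARGV, a later reload that reads 500 from the file leaves the effective value at -1.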

This seems no different than wal_level and summarize_wal having a
co-dependency such that intermediate invalid states must be allowed to
exist so long as what the server ends up running under is valid.

I think that code doesn't do what its author hoped :-(

Anyway, I found the thread for commit 8bfb231b4 which installed
this code [1], and I'm going to go complain there.

regards, tom lane

[1]: /messages/by-id/20231027.115759.2206827438943188717.horikyota.ntt@gmail.com