pg_rewind problem: cannot find WAL
Hi all,
I'm running PostgreSQL 17.4 on Ubuntu 24.04 machines. I have three hosts:
pg-1 (primary) and two physical replicas.
I then promote host pg-3 as a master (pg_promote()) and want to rewind
the pg-1 to follow the new master, so:
ssh pg-3 'sudo -u postgres /usr/lib/postgresql/17/bin/pg_rewind -D
/var/lib/postgresql/17/main --source-server="user=replica_fluca
host=pg-3 dbname=replica_fluca"'
pg_rewind: servers diverged at WAL location 0/B8550F8 on timeline 1
pg_rewind: error: could not open file
"/var/lib/postgresql/17/main/pg_wal/00000001000000000000000A": No such
file or directory
pg_rewind: error: could not find previous WAL record at 0/AFFF4E8
Indeed, the segment file ...0A is not there:
% ssh pg-3 'sudo ls /var/lib/postgresql/17/main/pg_wal'
00000001000000000000000B.partial
00000002.history
00000002000000000000000B
00000002000000000000000C
00000002000000000000000D
00000002000000000000000E
archive_status
summaries
% ssh pg-1 'sudo ls /var/lib/postgresql/17/main/pg_wal'
000000010000000000000005.00000028.backup
00000001000000000000000B
00000001000000000000000C
00000001000000000000000D
00000001000000000000000E
archive_status
summaries
Do I have to ensure the old primary pg-1 does a WAL switch before
promoting the other one and trying to rewind?
Thanks,
Luca
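
For readers wondering why pg_rewind asks for segment ...0A when the divergence point 0/B8550F8 lies in segment ...0B: it appears to be walking backwards from the divergence point (presumably to the preceding checkpoint), and the record at 0/AFFF4E8 falls into the previous segment. The sketch below maps a timeline and an LSN to a WAL file name. It assumes the default 16 MiB wal_segment_size, and the function name wal_segment_name is made up for illustration.

```shell
# Map a timeline ID and an LSN (as printed by pg_rewind) to the name of
# the WAL segment file that contains it.
# Assumption: default wal_segment_size of 16 MiB.
wal_segment_name() (
    tli=$1
    lsn=$2
    seg_size=$((16 * 1024 * 1024))

    # An LSN is written as two hex numbers: high 32 bits / low 32 bits.
    hi=$((0x${lsn%%/*}))
    lo=$((0x${lsn##*/}))

    # Absolute segment number, then split it the way the file name does.
    segno=$(( ((hi << 32) + lo) / seg_size ))
    segs_per_id=$(( 0x100000000 / seg_size ))

    printf '%08X%08X%08X\n' "$tli" \
        $((segno / segs_per_id)) $((segno % segs_per_id))
)

wal_segment_name 1 0/AFFF4E8   # prints 00000001000000000000000A
wal_segment_name 1 0/B8550F8   # prints 00000001000000000000000B
```

So the missing file is the one holding the record just before the divergence segment, which matches the error message above.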
On Wed, 2025-05-07 at 12:51 +0200, Luca Ferrari wrote:
> Do I have to ensure the old primary pg-1 does a WAL switch before
> promoting the other one and trying to rewind?
I don't think it is connected to a WAL switch.
I'd say that you should set "wal_keep_size" high enough that all the WAL
needed for pg_rewind is still present.
If you have a WAL archive, you could define a restore_command on the server
you want to rewind.
Yours,
Laurenz Albe
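
As a rough illustration of the two suggestions above, the settings might look like the fragment below. The values are assumptions and not taken from this thread: the 1GB figure is arbitrary, and the pgBackRest stanza name "main" is hypothetical.

```
# postgresql.conf or postgresql.auto.conf (illustrative values only)

# On the primary: retain extra WAL in pg_wal so that a later pg_rewind can
# still find the segments it needs. 1GB is an arbitrary example; size it
# to your write load.
wal_keep_size = '1GB'

# On the server to be rewound: lets "pg_rewind -c" fetch missing segments
# from the archive. Assumes pgBackRest with a stanza named "main".
restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'
```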
On Wed, May 7, 2025 at 3:55 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I don't think it is connected to a WAL switch.
Thanks.
> I'd say that you should set "wal_keep_size" high enough that all the WAL
> needed for pg_rewind is still present.
>
> If you have a WAL archive, you could define a restore_command on the server
> you want to rewind.
I've pgbackrest making backups, so I have an archive_command. I'm
going to see if putting a restore_command can fix the problem.
Thanks for the suggestion.
Luca
On Thu, May 8, 2025 at 8:54 AM Luca Ferrari <fluca1978@gmail.com> wrote:
> I've pgbackrest making backups, so I have an archive_command. I'm
> going to see if putting a restore_command can fix the problem.
But I'm facing a quite trivial problem: in the Ubuntu installation the
configuration files are kept separate from PGDATA.
Apparently pg_rewind is trying to read postgresql.conf to get the
restore_command, and I don't know how to specify the different
location of postgresql.conf (I cannot specify -c as with postgres):
$ /usr/lib/postgresql/17/bin/pg_rewind -D /var/lib/postgresql/17/main
--source-server="user=replica_fluca host=dev-psqlha3
dbname=replica_fluca" -R -P --debug -c
postgres: could not access the server configuration file
"/var/lib/postgresql/17/main/postgresql.conf": No such file or
directory
no data was returned by command "/usr/lib/postgresql/17/bin/postgres
-D /var/lib/postgresql/17/main -C restore_command"
child process exited with exit code 2
pg_rewind: error: could not read restore_command from target cluster
Any idea?
Clearly, postgresql.auto.conf is within PGDATA, and since my
restore_command is there, one trick could be to touch an empty
PGDATA/postgresql.conf, run pg_rewind, then remove the fake configuration
file. But I'm sure there is a smarter solution.
Thanks,
Luca
> Any idea?
> Clearly, postgresql.auto.conf is within PGDATA, and since my
> restore_command is there, one trick could be to touch an empty
> PGDATA/postgresql.conf, run pg_rewind, then remove the fake configuration
> file. But I'm sure there is a smarter solution.
A symlink from $PGDATA to where the actual file is?
On Thu, May 8, 2025 at 4:04 PM Rob Sargent <robjsargent@gmail.com> wrote:
> A symlink from $PGDATA to where the actual file is?
Could be, I need to experiment with pg_basebackup to ensure it is not
conflicting with the /etc/ configuration file when creating a clone.
Luca
On 5/8/25 04:26, Luca Ferrari wrote:
> Apparently pg_rewind is trying to read postgresql.conf to get the
> restore_command, and I don't know how to specify the different
> location of postgresql.conf (I cannot specify -c as with postgres):
> pg_rewind: error: could not read restore_command from target cluster
>
> Any idea?
/usr/lib/postgresql/17/bin/pg_rewind --help
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.

Usage:
  pg_rewind [OPTION]...

Options:
  -c, --restore-target-wal       use "restore_command" in target configuration to
                                 retrieve WAL files from archives
  -D, --target-pgdata=DIRECTORY  existing data directory to modify
      --source-pgdata=DIRECTORY  source data directory to synchronize with
      --source-server=CONNSTR    source server to synchronize with
  -n, --dry-run                  stop before modifying anything
  -N, --no-sync                  do not wait for changes to be written
                                 safely to disk
  -P, --progress                 write progress messages
  -R, --write-recovery-conf      write configuration for replication
                                 (requires --source-server)
      --config-file=FILENAME     use specified main server configuration
                                 file when running target cluster
      --debug                    write a lot of debug messages
      --no-ensure-shutdown       do not automatically fix unclean shutdown
      --sync-method=METHOD       set method for syncing files to disk
  -V, --version                  output version information, then exit
  -?, --help                     show this help, then exit

So use --config-file=FILENAME?
--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, May 8, 2025 at 5:11 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> /usr/lib/postgresql/17/bin/pg_rewind --help
> pg_rewind resynchronizes a PostgreSQL cluster with another copy of the
> cluster.
> --config-file=FILENAME use specified main server configuration
Shame on me! I was grepping for config_file, as in pg_ctl...
Thanks!
Luca
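
Putting the thread together, the resulting invocation might look like the sketch below. It only uses options listed in the --help output quoted above. The configuration path /etc/postgresql/17/main/postgresql.conf is an assumption based on the usual Debian/Ubuntu layout and was not stated in the thread, and the helper name run_rewind is made up for illustration.

```shell
# Paths for a Debian/Ubuntu style installation; adjust to your system.
PGBIN=/usr/lib/postgresql/17/bin
PGDATA=/var/lib/postgresql/17/main
PGCONF=/etc/postgresql/17/main/postgresql.conf   # assumed location
SOURCE='user=replica_fluca host=pg-3 dbname=replica_fluca'

run_rewind() {
    # "$@" is an optional prefix such as `echo`, used to preview the
    # command line instead of executing it.
    "$@" "$PGBIN/pg_rewind" \
        --target-pgdata="$PGDATA" \
        --source-server="$SOURCE" \
        --config-file="$PGCONF" \
        --restore-target-wal \
        --write-recovery-conf \
        --progress
}

run_rewind echo    # preview only; prints the full command line
# run_rewind       # real run: on the server being rewound, as user postgres
```

With --config-file pointing at the real postgresql.conf, pg_rewind should be able to read restore_command without any placeholder file or symlink inside PGDATA.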