Promoted slave tries to archive previously archived WAL file
I have master and slave running with the following contents of their pg_wal
directories and archivedir:
ls -l /mnt/pgsql/archive/
-rw-rw-rw-. 1 root root 16777216 Feb 15 09:39 000000010000000000000001
-rw-rw-rw-. 1 root root 16777216 Feb 15 09:39 000000010000000000000002
-rw-rw-rw-. 1 root root 302 Feb 15 09:39
000000010000000000000002.00000028.backup
pg-hdp-node1.kitchen.local
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres postgres 16777216 Feb 15 09:39
000000010000000000000002
-rw-------. 1 postgres postgres 302 Feb 15 09:39
000000010000000000000002.00000028.backup
-rw-------. 1 postgres postgres 16777216 Feb 15 09:44
000000010000000000000003
-rw-------. 1 postgres postgres 16777216 Feb 15 09:39
000000010000000000000004
drwx------. 2 postgres postgres 96 Feb 15 09:44 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:
-rw-------. 1 postgres postgres 0 Feb 15 09:39
000000010000000000000002.00000028.backup.done
-rw-------. 1 postgres postgres 0 Feb 15 09:39 000000010000000000000002.done
pg-hdp-node2.kitchen.local
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres root 16777216 Feb 15 09:39
000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Feb 15 09:44
000000010000000000000003
drwx------. 2 postgres root 6 Feb 15 09:39 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:
diff from secondary pg-hdp-node2.kitchen.local on
/var/lib/pgsql/10/data/pg_wal/000000010000000000000002 and
/mnt/pgsql/archive/000000010000000000000002 shows binary differences but as
expected no differences for diff on primary pg-hdp-node1.kitchen.local
Failover is performed and pg-hdp-node2.kitchen.local tries and fails to
archive WAL segment 000000010000000000000002 because it has been previously
archived
2019-02-15 09:54:50.518 PST [780] DETAIL: The failed archive command was:
test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp
pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002
Based on this thread
/messages/by-id/11b405a6-2176-9510-bf5b-ea9c0e860635@pgmasters.net
it is suggested to handle this case by reporting success but in my case
contents are different. I would think that previously archived
000000010000000000000002 is the right WAL segment.
So my questions are as follows:
Why WAL segments differ?
How should this be resolved on the new primary?
--
*Andre Piwoni*
Ok. I think I uncovered a bug.
My slave nodes were created using pg_basebackup with --wal-method=stream.
If I understand right this option streams WAL files generated during backup
and this WAL file was 000000010000000000000002 but its contents were
different from what was on the primary and in WAL archive. When I changed
--wal-method=fetch which collects WAL files generated during the backup at
the end then diff did not detect any changes. When I failover I don't seem
to have the issue with new primary to archive.
It seems streaming WAL segments created during backup produced corrupt file
based on diff.
On Fri, Feb 15, 2019 at 10:23 AM Andre Piwoni <apiwoni@webmd.net> wrote:
I have master and slave running with the following contents of their
pg_wal directories and archivedir:ls -l /mnt/pgsql/archive/
-rw-rw-rw-. 1 root root 16777216 Feb 15 09:39 000000010000000000000001
-rw-rw-rw-. 1 root root 16777216 Feb 15 09:39 000000010000000000000002
-rw-rw-rw-. 1 root root 302 Feb 15 09:39
000000010000000000000002.00000028.backuppg-hdp-node1.kitchen.local
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres postgres 16777216 Feb 15 09:39
000000010000000000000002
-rw-------. 1 postgres postgres 302 Feb 15 09:39
000000010000000000000002.00000028.backup
-rw-------. 1 postgres postgres 16777216 Feb 15 09:44
000000010000000000000003
-rw-------. 1 postgres postgres 16777216 Feb 15 09:39
000000010000000000000004
drwx------. 2 postgres postgres 96 Feb 15 09:44 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:
-rw-------. 1 postgres postgres 0 Feb 15 09:39
000000010000000000000002.00000028.backup.done
-rw-------. 1 postgres postgres 0 Feb 15 09:39
000000010000000000000002.donepg-hdp-node2.kitchen.local
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres root 16777216 Feb 15 09:39
000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Feb 15 09:44
000000010000000000000003
drwx------. 2 postgres root 6 Feb 15 09:39 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:diff from secondary pg-hdp-node2.kitchen.local on /var/lib/pgsql/10/data/pg_wal/000000010000000000000002 and /mnt/pgsql/archive/000000010000000000000002 shows binary differences but as expected no differences for diff on primary pg-hdp-node1.kitchen.localFailover is performed and pg-hdp-node2.kitchen.local tries and fails to
archive WAL segment 000000010000000000000002 because it has been previously
archived
2019-02-15 09:54:50.518 PST [780] DETAIL: The failed archive command was:
test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp
pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002Based on this thread
/messages/by-id/11b405a6-2176-9510-bf5b-ea9c0e860635@pgmasters.net
it is suggested to handle this case by reporting success but in my case
contents are different. I would think that previously archived
000000010000000000000002 is the right WAL segment.So my questions are as follows:
Why WAL segments differ?
How should this be resolved on the new primary?
--*Andre Piwoni*
--
*Andre Piwoni*
Sr. Software Developer, BI/Database
*Web*MD Health Services
Mobile: 801.541.4722
www.webmdhealthservices.com
On Fri, Feb 15, 2019 at 12:03:39PM -0800, Andre Piwoni wrote:
My slave nodes were created using pg_basebackup with --wal-method=stream.
If I understand right this option streams WAL files generated during backup
and this WAL file was 000000010000000000000002 but its contents were
different from what was on the primary and in WAL archive. When I changed
--wal-method=fetch which collects WAL files generated during the backup at
the end then diff did not detect any changes. When I failover I don't seem
to have the issue with new primary to archive.It seems streaming WAL segments created during backup produced corrupt file
based on diff.
Nope, --wal-method=stream and fetch behave as they are designed for.
"fetch" would include all the contents of pg_wal/ as part of a
BASE_BACKUP command using the replication protocol while "stream"
would make the base backup avoid copying all WAL files, copying them
in parallel with a secondary process which uses the same facility as
pg_receiverwal, in which case the set of files in
pg_wal/archive_status/ has little meaning.
I am also curious about the way you use to do failovers. Let me
guess: you stop the standby, delete its recovery.conf and then restart
the former standby? This would prevent a timeline jump at promotion
which would explain the conflicts you are seeing when archiving two
times the same segment.
--
Michael
I call pg_ctl -D /var/lib/pgsql/10/data promote to upgrade slave to master
when failover happens
I repoint slave to the master by stopping it, updating recovery.conf and
restarting it. Let me know if I'm doing it wrong.
Thank you for clarifying that contents of WAL files from stream and fetch
method should be different. The fact remains that WAL file created during
the backup by stream method does not have extension .done and new master
repeatedly fails to archive it because previous master archived it. This
does not happen in fetch mode where file created during backup has .done
extension.
On Sun, Feb 17, 2019 at 5:27 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Feb 15, 2019 at 12:03:39PM -0800, Andre Piwoni wrote:
My slave nodes were created using pg_basebackup with --wal-method=stream.
If I understand right this option streams WAL files generated duringbackup
and this WAL file was 000000010000000000000002 but its contents were
different from what was on the primary and in WAL archive. When I changed
--wal-method=fetch which collects WAL files generated during the backupat
the end then diff did not detect any changes. When I failover I don't
seem
to have the issue with new primary to archive.
It seems streaming WAL segments created during backup produced corrupt
file
based on diff.
Nope, --wal-method=stream and fetch behave as they are designed for.
"fetch" would include all the contents of pg_wal/ as part of a
BASE_BACKUP command using the replication protocol while "stream"
would make the base backup avoid copying all WAL files, copying them
in parallel with a secondary process which uses the same facility as
pg_receiverwal, in which case the set of files in
pg_wal/archive_status/ has little meaning.I am also curious about the way you use to do failovers. Let me
guess: you stop the standby, delete its recovery.conf and then restart
the former standby? This would prevent a timeline jump at promotion
which would explain the conflicts you are seeing when archiving two
times the same segment.
--
Michael
--
*Andre Piwoni*
Sr. Software Developer, BI/Database
*Web*MD Health Services
Mobile: 801.541.4722
www.webmdhealthservices.com