pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

Started by hubert depesz lubaczewski · over 3 years ago · 4 messages · bugs

Hi,
we have the following situation:

1. primary on 14.5 that is *not* archiving (this is a temporary situation
related to the ongoing upgrade-from-pg-12 process) - all on ubuntu focal.
2. on a new replica we run (via a wrapper, but this doesn't seem to be
related):
pg_basebackup -D /var/lib/postgresql/14/main -c fast -v -P -U some-user -h sourcedb.hostname
3. after it is done, if the datadir was large enough, pg on the replica
doesn't replicate/catch up, because, from the logs:
2022-09-29 14:59:26.587 UTC,,,2355588,,6335b2ce.23f184,1,,2022-09-29 14:59:26 UTC,,0,LOG,00000,"started streaming WAL from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-29 14:59:26.587 UTC,,,2355588,,6335b2ce.23f184,2,,2022-09-29 14:59:26 UTC,,0,FATAL,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0
4. if there is a restore_command configured (usual shape sketched after this
list), it tries to read data from the archive too, but the archive is
non-existent.
5. the "missing" file is there, in pg_wal (I would assume that
pg_basebackup copied it there):
root@host# /bin/ls -c1 0* | wc -l
1068
root@host# /bin/ls -c1 0* | sort -V | head -n 1
00000001000007E4000000A0
root@host# /bin/ls -c1 0* | sort -V | tail -n 1
00000001000007E800000092
root@host# /bin/ls -c1 0* | sort -V | grep -n 00000001000007E800000067
1043:00000001000007E800000067
root@host# /bin/ls -c1 0* | sort -V | grep -n -C5 00000001000007E800000067
1038-00000001000007E800000062
1039-00000001000007E800000063
1040-00000001000007E800000064
1041-00000001000007E800000065
1042-00000001000007E800000066
1043:00000001000007E800000067
1044-00000001000007E800000068
1045-00000001000007E800000069
1046-00000001000007E800000070
1047-00000001000007E800000071
1048-00000001000007E800000072
6. What's more - I straced the startup process (invocation sketched after
this list), and it does:
a. opens the wal file (the problematic one)
b. reads 8k from it
c. closes it
d. checks for the existence of the finish.recovery trigger file (it doesn't exist)
e. starts the restore program (which fails).
f. rinse and repeat
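
For reference - restore_command here is the usual copy-from-archive shape,
and the strace loop above can be reproduced roughly like this (the archive
path and the process match are illustrative, not our exact setup):

# postgresql.conf on the replica, roughly (archive path is illustrative):
restore_command = 'cp /mnt/wal_archive/%f "%p"'

# attach to the startup process and watch the open/read/close loop:
strace -f -e trace=openat,read,close,stat -p "$(pgrep -f 'postgres.*startup')"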

What am I missing? What is wrong? How can I fix it? The problem is not fixing
*this server*, because we are in the process of upgrading LOTS and LOTS of
servers, and I need to know what is broken/how to work around it.

Currently our go-to fix is:
1. increase wal_keep_size to ~ 200GB
2. stand up the replica
3. once it catches up, decrease wal_keep_size to our standard 16GB

but it is not really a nice solution.
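
For completeness, steps 1 and 3 boil down to something like this on the
primary (wal_keep_size is reloadable, so no restart is needed):

psql -c "ALTER SYSTEM SET wal_keep_size = '200GB'"
psql -c "SELECT pg_reload_conf()"
# ... rebuild the replica, wait for it to catch up ...
psql -c "ALTER SYSTEM SET wal_keep_size = '16GB'"
psql -c "SELECT pg_reload_conf()"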

Best regards,

depesz

#2Michael Paquier
michael@paquier.xyz
In reply to: hubert depesz lubaczewski (#1)
Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

On Thu, Sep 29, 2022 at 05:51:02PM +0200, hubert depesz lubaczewski wrote:

What am I missing? What is wrong? How can I fix it? The problem is not fixing
*this server*, because we are in the process of upgrading LOTS and LOTS of
servers, and I need to know what is broken/how to work around it.

So, your backup_label file points to 00000001000007E4000000A0 as being
the first segment to begin recovery from. This means that the startup
process is able to correctly kick off recovery using the contents of the
local pg_wal/ from 00000001000007E4000000A0 to 00000001000007E800000066.
Once 00000001000007E800000067 cannot be read, the startup process
decides to switch the WAL source to streaming. The order should be
local pg_wal/, then the archive, and finally streaming, if all are
configured, as far as I recall.

What's the error you are seeing when reading 00000001000007E800000067?
Is pg_waldump able to understand its internals? This could point
to an issue with xlogreader.c, for example.
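For example, something like this, run from the data directory (a sketch):
pg_waldump pg_wal/00000001000007E800000067 2>&1 | head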

Another thing that has changed in this area is related to continuation
records. What are the WAL contents at the end of
00000001000007E800000066? Do you spot an OVERWRITE_CONTRECORD when
using pg_waldump on it?
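A quick check, assuming the file is reachable from the current directory:
pg_waldump 00000001000007E800000066 | grep -c OVERWRITE_CONTRECORD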
--
Michael

#3hubert depesz lubaczewski
In reply to: Michael Paquier (#2)
Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

On Fri, Sep 30, 2022 at 09:49:28AM +0900, Michael Paquier wrote:

On Thu, Sep 29, 2022 at 05:51:02PM +0200, hubert depesz lubaczewski wrote:

What am I missing? What is wrong? How can I fix it? The problem is not fixing
*this server*, because we are in the process of upgrading LOTS and LOTS of
servers, and I need to know what is broken/how to work around it.

So, your backup_label file points to 00000001000007E4000000A0 as being
the first segment to begin recovery from. This means that the startup

Well, *now* I don't have backup_label:

root@c273-pg2258:/var/lib/postgresql/14/main# ls -l
total 68520
-rw------- 1 postgres postgres 3 Sep 28 01:38 PG_VERSION
-rw------- 1 postgres postgres 61264030 Sep 30 04:42 autoprewarm.blocks
-rw------- 1 postgres postgres 231 Sep 28 01:38 backup_label.old
-rw------- 1 postgres postgres 8783060 Sep 28 01:38 backup_manifest
drwx------ 6 postgres postgres 74 Sep 28 01:38 base
-rw------- 1 postgres postgres 61 Sep 30 04:00 current_logfiles
drwx------ 2 postgres postgres 4096 Sep 29 14:25 global
drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_commit_ts
drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_dynshmem
drwx------ 4 postgres postgres 84 Sep 28 03:38 pg_logical
drwx------ 4 postgres postgres 48 Sep 28 01:38 pg_multixact
drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_notify
drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_replslot
drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_serial
drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_snapshots
drwx------ 2 postgres postgres 6 Sep 29 14:25 pg_stat
drwx------ 2 postgres postgres 43 Sep 29 14:25 pg_stat_tmp
drwx------ 2 postgres postgres 122 Sep 29 14:25 pg_subtrans
drwx------ 2 postgres postgres 27 Sep 28 01:38 pg_tblspc
drwx------ 2 postgres postgres 6 Sep 28 01:38 pg_twophase
drwx------ 3 postgres postgres 204800 Sep 28 03:38 pg_wal
drwx------ 2 postgres postgres 4096 Sep 28 01:38 pg_xact
-rw------- 1 postgres postgres 118 Sep 28 01:38 postgresql.auto.conf
-rw------- 1 postgres postgres 130 Sep 29 14:25 postmaster.opts
-rw------- 1 postgres postgres 103 Sep 29 14:25 postmaster.pid
-rw-r--r-- 1 postgres postgres 0 Sep 28 01:38 standby.signal

backup_label.old is:
root@c273-pg2258:/var/lib/postgresql/14/main# cat backup_label.old
START WAL LOCATION: 7D5/3B068338 (file 00000001000007D50000003B)
CHECKPOINT LOCATION: 7D5/3B076F78
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2022-09-27 21:06:00 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1

process is able to correctly kick off recovery using the contents of the
local pg_wal/ from 00000001000007E4000000A0 to 00000001000007E800000066.
Once 00000001000007E800000067 cannot be read, the startup process
decides to switch the WAL source to streaming. The order should be
local pg_wal/, then the archive, and finally streaming, if all are
configured, as far as I recall.
What's the error you are seeing when reading 00000001000007E800000067?

I don't see any errors. Let me show you some data from the log:
2022-09-30 04:44:43.833 UTC,"postgres","postgres",2716081,"[local]",6336743b.2971b1,3,"authentication",2022-09-30 04:44:43 UTC,4/14269,0,LOG,00000,"connection authorized: user=postgres database=postgres application_name=psql",,,,,,,,,"","client backend",,1147616880456321454
2022-09-30 04:44:43.917 UTC,,,2716082,,6336743b.2971b2,1,,2022-09-30 04:44:43 UTC,,0,LOG,00000,"started streaming WAL from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:43.918 UTC,,,2716082,,6336743b.2971b2,2,,2022-09-30 04:44:43 UTC,,0,FATAL,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:48.922 UTC,,,2716187,,63367440.29721b,1,,2022-09-30 04:44:48 UTC,,0,LOG,00000,"started streaming WAL from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:48.922 UTC,,,2716187,,63367440.29721b,2,,2022-09-30 04:44:48 UTC,,0,FATAL,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:51.867 UTC,,,2716207,"[local]",63367443.29722f,1,"",2022-09-30 04:44:51 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized",,0
2022-09-30 04:44:51.868 UTC,"postgres","db",2716207,"[local]",63367443.29722f,2,"authentication",2022-09-30 04:44:51 UTC,4/14271,0,LOG,00000,"connection authenticated: identity=""postgres"" method=peer (/etc/postgresql/14/main/pg_hba.conf:11)",,,,,,,,,"","client backend",,-659920678270393094
2022-09-30 04:44:51.868 UTC,"postgres","db",2716207,"[local]",63367443.29722f,3,"authentication",2022-09-30 04:44:51 UTC,4/14271,0,LOG,00000,"connection authorized: user=postgres database=db application_name=psql",,,,,,,,,"","client backend",,-659920678270393094
2022-09-30 04:44:53.751 UTC,,,2716218,"127.0.0.1:39078",63367445.29723a,1,"",2022-09-30 04:44:53 UTC,,0,LOG,00000,"connection received: host=127.0.0.1 port=39078",,,,,,,,,"","not initialized",,0
2022-09-30 04:44:53.927 UTC,,,2716219,,63367445.29723b,1,,2022-09-30 04:44:53 UTC,,0,LOG,00000,"started streaming WAL from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:53.927 UTC,,,2716219,,63367445.29723b,2,,2022-09-30 04:44:53 UTC,,0,FATAL,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:58.931 UTC,,,2716255,,6336744a.29725f,1,,2022-09-30 04:44:58 UTC,,0,LOG,00000,"started streaming WAL from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:58.931 UTC,,,2716255,,6336744a.29725f,2,,2022-09-30 04:44:58 UTC,,0,FATAL,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:03.936 UTC,,,2716366,,6336744f.2972ce,1,,2022-09-30 04:45:03 UTC,,0,LOG,00000,"started streaming WAL from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:03.936 UTC,,,2716366,,6336744f.2972ce,2,,2022-09-30 04:45:03 UTC,,0,FATAL,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:08.751 UTC,,,2716401,"127.0.0.1:45926",63367454.2972f1,1,"",2022-09-30 04:45:08 UTC,,0,LOG,00000,"connection received: host=127.0.0.1 port=45926",,,,,,,,,"","not initialized",,0
2022-09-30 04:45:08.940 UTC,,,2716402,,63367454.2972f2,1,,2022-09-30 04:45:08 UTC,,0,LOG,00000,"started streaming WAL from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:08.940 UTC,,,2716402,,63367454.2972f2,2,,2022-09-30 04:45:08 UTC,,0,FATAL,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:12.960 UTC,,,2716425,"[local]",63367458.297309,1,"",2022-09-30 04:45:12 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized",,0
2022-09-30 04:45:12.961 UTC,"postgres","postgres",2716425,"[local]",63367458.297309,2,"authentication",2022-09-30 04:45:12 UTC,4/14277,0,LOG,00000,"connection authenticated: identity=""postgres"" method=peer (/etc/postgresql/14/main/pg_hba.conf:11)",,,,,,,,,"","client backend",,1147616880456321454
2022-09-30 04:45:12.961 UTC,"postgres","postgres",2716425,"[local]",63367458.297309,3,"authentication",2022-09-30 04:45:12 UTC,4/14277,0,LOG,00000,"connection authorized: user=postgres database=postgres application_name=psql",,,,,,,,,"","client backend",,1147616880456321454
2022-09-30 04:45:13.946 UTC,,,2716431,,63367459.29730f,1,,2022-09-30 04:45:13 UTC,,0,LOG,00000,"started streaming WAL from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:13.946 UTC,,,2716431,,63367459.29730f,2,,2022-09-30 04:45:13 UTC,,0,FATAL,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 00000001000007E800000067 has already been removed",,,,,,,,,"","walreceiver",,0

Is pg_waldump able to understand its internals? This could point
to an issue with xlogreader.c, for example.

So... it can open the files, but finds no records:

postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000067
pg_waldump: fatal: could not find a valid record after 7E8/67000000
postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000068
pg_waldump: fatal: could not find a valid record after 7E8/68000000

but the previous one works:
postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000066 | tail -n 20
rmgr: Transaction len (rec/tot): 65/ 65, tx: 140005495, lsn: 7E8/66A0B010, prev 7E8/66A0AFC8, desc: COMMIT 2022-09-28 01:38:20.782352 UTC; origin: node 1, lsn CE64/A5086920, at 2022-09-28 01:38:20.780890 UTC
rmgr: Heap len (rec/tot): 54/ 54, tx: 140005496, lsn: 7E8/66A0B058, prev 7E8/66A0B010, desc: LOCK off 55: xid 140005496: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195315099 blk 83995
rmgr: Heap len (rec/tot): 165/ 165, tx: 140005496, lsn: 7E8/66A0B090, prev 7E8/66A0B058, desc: HOT_UPDATE off 55 xmax 140005496 flags 0x10 KEYS_UPDATED ; new off 74 xmax 140005496, blkref #0: rel 17963/16644/195315099 blk 83995
rmgr: Transaction len (rec/tot): 65/ 65, tx: 140005496, lsn: 7E8/66A0B138, prev 7E8/66A0B090, desc: COMMIT 2022-09-28 01:38:20.783036 UTC; origin: node 3, lsn CE64/A5087E00, at 2022-09-28 01:38:20.780992 UTC
rmgr: XLOG len (rec/tot): 51/ 980, tx: 0, lsn: 7E8/66A0B180, prev 7E8/66A0B138, desc: FPI_FOR_HINT , blkref #0: rel 17963/16644/195326137 blk 271085 FPW
rmgr: Heap len (rec/tot): 54/ 54, tx: 140005497, lsn: 7E8/66A0B558, prev 7E8/66A0B180, desc: LOCK off 13: xid 140005497: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195326137 blk 271085
rmgr: Heap len (rec/tot): 261/ 261, tx: 140005497, lsn: 7E8/66A0B590, prev 7E8/66A0B558, desc: HOT_UPDATE off 13 xmax 140005497 flags 0x10 KEYS_UPDATED ; new off 15 xmax 140005497, blkref #0: rel 17963/16644/195326137 blk 271085
rmgr: Transaction len (rec/tot): 65/ 65, tx: 140005497, lsn: 7E8/66A0B698, prev 7E8/66A0B590, desc: COMMIT 2022-09-28 01:38:20.784762 UTC; origin: node 1, lsn CE64/A508C7F0, at 2022-09-28 01:38:20.782858 UTC
rmgr: Heap len (rec/tot): 54/ 54, tx: 140005498, lsn: 7E8/66A0B6E0, prev 7E8/66A0B698, desc: LOCK off 8: xid 140005498: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195328502 blk 3038500
rmgr: Heap len (rec/tot): 54/ 54, tx: 140005498, lsn: 7E8/66A0B718, prev 7E8/66A0B6E0, desc: LOCK off 8: xid 140005498: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195328502 blk 3038500
rmgr: Heap len (rec/tot): 1821/ 1821, tx: 140005498, lsn: 7E8/66A0B750, prev 7E8/66A0B718, desc: UPDATE off 8 xmax 140005498 flags 0x10 KEYS_UPDATED ; new off 5 xmax 140005498, blkref #0: rel 17963/16644/195328502 blk 3038837, blkref #1: rel 17963/16644/195328502 blk 3038500
rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BE70, prev 7E8/66A0B750, desc: INSERT_LEAF off 158, blkref #0: rel 17963/16644/195349680 blk 78396
rmgr: Btree len (rec/tot): 72/ 72, tx: 140005498, lsn: 7E8/66A0BEB0, prev 7E8/66A0BE70, desc: INSERT_LEAF off 72, blkref #0: rel 17963/16644/419942198 blk 82241
rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BEF8, prev 7E8/66A0BEB0, desc: INSERT_LEAF off 268, blkref #0: rel 17963/16644/419946394 blk 18486
rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BF38, prev 7E8/66A0BEF8, desc: INSERT_LEAF off 69, blkref #0: rel 17963/16644/419946856 blk 56634
rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BF78, prev 7E8/66A0BF38, desc: INSERT_LEAF off 98, blkref #0: rel 17963/16644/419947709 blk 19044
rmgr: Btree len (rec/tot): 64/ 64, tx: 140005498, lsn: 7E8/66A0BFB8, prev 7E8/66A0BF78, desc: INSERT_LEAF off 53, blkref #0: rel 17963/16644/419948176 blk 10715
rmgr: Transaction len (rec/tot): 65/ 65, tx: 140005498, lsn: 7E8/66A0BFF8, prev 7E8/66A0BFB8, desc: COMMIT 2022-09-28 01:38:20.785822 UTC; origin: node 1, lsn CE64/A5091128, at 2022-09-28 01:38:20.784288 UTC
rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn: 7E8/66A0C058, prev 7E8/66A0BFF8, desc: BACKUP_END 7D5/3B068338
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 7E8/66A0C080, prev 7E8/66A0C058, desc: SWITCH

Another thing that has changed in this area is related to continuation
records. What are the WAL contents at the end of
00000001000007E800000066? Do you spot an OVERWRITE_CONTRECORD when
using pg_waldump on it?

No. There is no OVERWRITE_CONTRECORD in the pg_waldump output for the *66 file.

If I understand correctly - this would suggest that the *66 file is the end of
the WAL taken by pg_basebackup. What are the *67 and later files, then?

Best regards,

depesz

#4Jeff Janes
jeff.janes@gmail.com
In reply to: hubert depesz lubaczewski (#1)
Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

Currently our go-to fix is:
1. increase wal_keep_size to ~ 200GB
2. stand up the replica
3. once it catches up, decrease wal_keep_size to our standard 16GB

but it is not really a nice solution.

If you don't have an archive and aren't using a permanent slot, then you
always have the risk of blowing past your available WAL.

Increasing wal_keep_size is an option to reduce this risk. Yes, it is not
so nice, but you have already declined to use the nicer options.
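
For the slot route, pg_basebackup can create a permanent slot for you as part
of the base backup (the slot name here is illustrative):

pg_basebackup -D /var/lib/postgresql/14/main -c fast -v -P \
    -U some-user -h sourcedb.hostname \
    --create-slot --slot=clone_slot
# then, on the replica: primary_slot_name = 'clone_slot'

The slot pins WAL on the primary until the standby connects and catches up,
at the price of unbounded WAL retention if the standby never shows up.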

Where those extra unusable files came from is a curiosity, but nothing more
than that. I'm guessing they were created by the wrapper you said you use,
as I can't reproduce their existence with just pg_basebackup. They are
probably "recycled ahead" files from the master that somehow got copied
needlessly, and with a file name which does not match the file contents
(since the contents are from a previous name the file had). pg_waldump
isn't as helpful as it could be in those cases.
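
One way to test that guess, assuming the standard page layout on a
little-endian machine: xlp_pageaddr is the 8-byte field at offset 8 of the
first page header, and it should decode to the segment's own start LSN
(here 7E8/67000000) if name and contents agree:

od -A x -t x8 -j 8 -N 8 00000001000007E800000067
# expect 000007e867000000; anything else means the contents came from a
# differently-named (recycled or zero-filled) segment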

Cheers,

Jeff