"missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup
Hi,
When I use pg_basebackup to backup and restore db(Let's call it A) to a
standalone instance(Let's call it B), "missing chunk number 0 for toast
value xxx in pg_toast_xxx" errors output.
PG version: 10.3
pg_basebackup command:
/usr/pgsql-10/bin/pg_basebackup -h p-rdb-c01 -D /var/lib/pgsql/10/data
-Xs -P -n --waldir=/tmp/pg_wal
I have mounted a disk to /tmp/pg_wal before, then I will mount the disk
to /var/lib/pgsql/10/data/pg_wal, so as to ensure completeness of wal
records during backup.
Since I don't want B to be a standy server, I just want it to be a
standalone server.
I removed recovery.conf, then simply start postgresql-10.service. It turned
out that postgresql-10.service
can be started successfully. But when I use this postgresql(reindex, vacumm
and so on), "missing chunk number 0 for toast value xxx in pg_toast_xxx"
errors output.
When pg_basebackup, it will store wal under pg_wal, can't postgresql work
with wal records locally?
I think primary_conninfo in recovery.conf is just used to get newer wal
records from A. Right?
I have also tested:
If I start postgresql-10.service with recovery.conf firstly, then split it
from postgresql cluster, everything works fine.
Above test seems proved that it is wal records's problem. I am really
confused.
Regards
Ma Xinjian
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
On Tue, 2021-04-13 at 02:38 -0700, Ma Xinjian wrote:
When I use pg_basebackup to backup and restore db(Let's call it A) to a
standalone instance(Let's call it B), "missing chunk number 0 for toast
value xxx in pg_toast_xxx" errors output.PG version: 10.3
pg_basebackup command:
/usr/pgsql-10/bin/pg_basebackup -h p-rdb-c01 -D /var/lib/pgsql/10/data
-Xs -P -n --waldir=/tmp/pg_wal
I have mounted a disk to /tmp/pg_wal before, then I will mount the disk
to /var/lib/pgsql/10/data/pg_wal, so as to ensure completeness of wal
records during backup.Since I don't want B to be a standy server, I just want it to be a
standalone server.
I removed recovery.conf, then simply start postgresql-10.service. It turned
out that postgresql-10.service
can be started successfully. But when I use this postgresql(reindex, vacumm
and so on), "missing chunk number 0 for toast value xxx in pg_toast_xxx"
errors output.When pg_basebackup, it will store wal under pg_wal, can't postgresql work
with wal records locally?
I think primary_conninfo in recovery.conf is just used to get newer wal
records from A. Right?I have also tested:
If I start postgresql-10.service with recovery.conf firstly, then split it
from postgresql cluster, everything works fine.Above test seems proved that it is wal records's problem. I am really
confused.
Your mail got me confused...
Why do you write the WAL to /tmp/pg_wal, only to later mount that at the
default location?
I see nothing wrong with what you are doing, but I may have got lost
in your complicated procedure.
You don't happen to remove "backup_label", do you?
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Why do you write the WAL to /tmp/pg_wal, only to later mount that at the
default location?
pg_wal dir has size limitation, if wal files are too large, they will be
overwrited, right?
I see nothing wrong with what you are doing, but I may have got lost in
your complicated procedure.
You don't happen to remove "backup_label", do you?
em, I do remove backup_label...
1. It means recovery.conf is not necessary, backup_label is necessary?
2. Which key in backup_label is necessary?
3. I searched the log, it do has recoveried.
Then, if there is no backup_label, what's the default START WAL LOCATION and
CHECKPOINT LOCATION?
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
Ma Xinjian <maxj.fnst@fujitsu.com> writes:
When I use pg_basebackup to backup and restore db(Let's call it A) to a
standalone instance(Let's call it B), "missing chunk number 0 for toast
value xxx in pg_toast_xxx" errors output.
PG version: 10.3
10.3 is quite a few bug fixes ago. Maybe you'd have better results
with the current release (10.16).
regards, tom lane
On Tue, 2021-04-13 at 06:36 -0700, MaXinjian wrote:
Why do you write the WAL to /tmp/pg_wal, only to later mount that at the
default location?pg_wal dir has size limitation, if wal files are too large, they will be
overwrited, right?
No, they won't.
You could run out of space on the file system though.
I see nothing wrong with what you are doing, but I may have got lost in
your complicated procedure.
You don't happen to remove "backup_label", do you?em, I do remove backup_label...
Then that's your problem.
That will corrupt your data, because recovery starts from the wrong
checkpoint.
1. It means recovery.conf is not necessary, backup_label is necessary?
Yes, exactly.
2. Which key in backup_label is necessary?
The whole file needs to be preserved unchanged, just as it is.
Don't mess with that file.
3. I searched the log, it do has recoveried.
Then, if there is no backup_label, what's the default START WAL LOCATION and
CHECKPOINT LOCATION?
That's the catch.
"backup_label" is the *only way* to tell a backup from a crashed
PostgreSQL cluster.
If there is no "backup_label", PostgreSQL will get the latest checkpoint
from the control file (global/pg_control), which may well be later than
the checkpoint that started the backup, so you will miss to recover some
transactions.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com