"missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup

Started by Ma Xinjianabout 5 years ago5 messagesgeneral

maxj.fnst@fujitsu.com

about 5 years ago

Hi,

When I use pg_basebackup to backup and restore db(Let's call it A) to a
standalone instance(Let's call it B), "missing chunk number 0 for toast
value xxx in pg_toast_xxx" errors output.

PG version: 10.3
pg_basebackup command:
/usr/pgsql-10/bin/pg_basebackup -h p-rdb-c01 -D /var/lib/pgsql/10/data
-Xs -P -n --waldir=/tmp/pg_wal
I have mounted a disk to /tmp/pg_wal before, then I will mount the disk
to /var/lib/pgsql/10/data/pg_wal, so as to ensure completeness of wal
records during backup.

Since I don't want B to be a standy server, I just want it to be a
standalone server.
I removed recovery.conf, then simply start postgresql-10.service. It turned
out that postgresql-10.service
can be started successfully. But when I use this postgresql(reindex, vacumm
and so on), "missing chunk number 0 for toast value xxx in pg_toast_xxx"
errors output.

When pg_basebackup, it will store wal under pg_wal, can't postgresql work
with wal records locally?
I think primary_conninfo in recovery.conf is just used to get newer wal
records from A. Right?

I have also tested:
If I start postgresql-10.service with recovery.conf firstly, then split it
from postgresql cluster, everything works fine.

Above test seems proved that it is wal records's problem. I am really
confused.

Regards
Ma Xinjian

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

Laurenz Albe

laurenz.albe@cybertec.at

about 5 years ago

In reply to: Ma Xinjian (#1)

Re: "missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup

On Tue, 2021-04-13 at 02:38 -0700, Ma Xinjian wrote:

When I use pg_basebackup to backup and restore db(Let's call it A) to a
standalone instance(Let's call it B), "missing chunk number 0 for toast
value xxx in pg_toast_xxx" errors output.

PG version: 10.3
pg_basebackup command:
/usr/pgsql-10/bin/pg_basebackup -h p-rdb-c01 -D /var/lib/pgsql/10/data
-Xs -P -n --waldir=/tmp/pg_wal
I have mounted a disk to /tmp/pg_wal before, then I will mount the disk
to /var/lib/pgsql/10/data/pg_wal, so as to ensure completeness of wal
records during backup.

Since I don't want B to be a standy server, I just want it to be a
standalone server.
I removed recovery.conf, then simply start postgresql-10.service. It turned
out that postgresql-10.service
can be started successfully. But when I use this postgresql(reindex, vacumm
and so on), "missing chunk number 0 for toast value xxx in pg_toast_xxx"
errors output.

When pg_basebackup, it will store wal under pg_wal, can't postgresql work
with wal records locally?
I think primary_conninfo in recovery.conf is just used to get newer wal
records from A. Right?

I have also tested:
If I start postgresql-10.service with recovery.conf firstly, then split it
from postgresql cluster, everything works fine.

Above test seems proved that it is wal records's problem. I am really
confused.

Your mail got me confused...

Why do you write the WAL to /tmp/pg_wal, only to later mount that at the
default location?

I see nothing wrong with what you are doing, but I may have got lost
in your complicated procedure.

You don't happen to remove "backup_label", do you?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

MaXinjian

machine_ok@163.com

about 5 years ago

In reply to: Laurenz Albe (#2)

Re: "missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup

Why do you write the WAL to /tmp/pg_wal, only to later mount that at the
default location?

pg_wal dir has size limitation, if wal files are too large, they will be
overwrited, right?

I see nothing wrong with what you are doing, but I may have got lost in
your complicated procedure.

You don't happen to remove "backup_label", do you?

em, I do remove backup_label...

1. It means recovery.conf is not necessary, backup_label is necessary?
2. Which key in backup_label is necessary?
3. I searched the log, it do has recoveried.
Then, if there is no backup_label, what's the default START WAL LOCATION and
CHECKPOINT LOCATION？

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Ma Xinjian (#1)

Re: "missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup

Ma Xinjian <maxj.fnst@fujitsu.com> writes:

When I use pg_basebackup to backup and restore db(Let's call it A) to a
standalone instance(Let's call it B), "missing chunk number 0 for toast
value xxx in pg_toast_xxx" errors output.
PG version: 10.3

10.3 is quite a few bug fixes ago. Maybe you'd have better results
with the current release (10.16).

regards, tom lane

Laurenz Albe

laurenz.albe@cybertec.at

about 5 years ago

In reply to: MaXinjian (#3)

Re: "missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup

On Tue, 2021-04-13 at 06:36 -0700, MaXinjian wrote:

Why do you write the WAL to /tmp/pg_wal, only to later mount that at the
default location?

pg_wal dir has size limitation, if wal files are too large, they will be
overwrited, right?

No, they won't.
You could run out of space on the file system though.

I see nothing wrong with what you are doing, but I may have got lost in
your complicated procedure.
You don't happen to remove "backup_label", do you?

em, I do remove backup_label...

Then that's your problem.
That will corrupt your data, because recovery starts from the wrong
checkpoint.

1. It means recovery.conf is not necessary, backup_label is necessary?

Yes, exactly.

2. Which key in backup_label is necessary?

The whole file needs to be preserved unchanged, just as it is.
Don't mess with that file.

3. I searched the log, it do has recoveried.
Then, if there is no backup_label, what's the default START WAL LOCATION and
CHECKPOINT LOCATION？

That's the catch.

"backup_label" is the *only way* to tell a backup from a crashed
PostgreSQL cluster.

If there is no "backup_label", PostgreSQL will get the latest checkpoint
from the control file (global/pg_control), which may well be later than
the checkpoint that started the backup, so you will miss to recover some
transactions.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com