WAL shipping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208
Hello,
we ran out of disk space on our production primary database, on the file
system dedicated to WALs (one of our logical replicas died and thus WALs
accumulated).
As expected, the primary instance shut down:
Primary instance:
PostgreSQL 11.3 (Ubuntu 11.3-1.pgdg18.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0, 64-bit
2019-09-29 19:24:28 UTC 2241 5d36488f.8c1 1 0 [53100]:PANIC: could
not write to file "pg_wal/xlogtemp.2241": No space left on device
2019-09-29 19:24:30 UTC 1706 5d364871.6aa 3387 0 [00000]:LOG: WAL
writer process (PID 2241) was terminated by signal 6: Aborted
2019-09-29 19:24:30 UTC 1706 5d364871.6aa 3388 0 [00000]:LOG:
terminating any other active server processes
2019-09-29 19:24:30 UTC 1706 5d364871.6aa 3390 0 [00000]:LOG: all
server processes terminated; reinitializing
2019-09-29 19:24:30 UTC 21499 5d9104ee.53fb 1 0 [00000]:LOG:
database system was interrupted; last known up at 2019-09-29 19:23:47 UTC
2019-09-29 19:24:33 UTC 21499 5d9104ee.53fb 2 0 [00000]:LOG:
recovered replication state of node 1 to CF3/442D0758
2019-09-29 19:24:33 UTC 21499 5d9104ee.53fb 3 0 [00000]:LOG:
database system was not properly shut down; automatic recovery in progress
2019-09-29 19:24:33 UTC 21499 5d9104ee.53fb 4 0 [00000]:LOG: redo
starts at FE9/A49830D8
2019-09-29 19:25:55 UTC 21499 5d9104ee.53fb 5 0 [00000]:LOG: redo
done at FED/38FFC540
2019-09-29 19:25:55 UTC 21499 5d9104ee.53fb 6 0 [00000]:LOG: last
completed transaction was at log time 2019-09-29 19:22:06.597333+00
2019-09-29 19:25:55 UTC 21499 5d9104ee.53fb 7 0 [00000]:LOG:
checkpoint starting: end-of-recovery immediate
2019-09-29 19:26:59 UTC 21499 5d9104ee.53fb 8 0 [00000]:LOG:
checkpoint complete: wrote 1046940 buffers (99.8%); 0 WAL file(s) added, 0
removed, 0 recycled; write=64.166 s, sync=0.008 s, total=64.212 s; sync
files=184, longest=0.001 s, average=0.000 s; distance=15014380 kB,
estimate=15014380 kB
2019-09-29 19:26:59 UTC 21499 5d9104ee.53fb 9 0 [53100]:FATAL: could
not write to file "pg_wal/xlogtemp.21499": No space left on device
2019-09-29 19:26:59 UTC 1706 5d364871.6aa 3391 0 [00000]:LOG:
startup process (PID 21499) exited with exit code 1
2019-09-29 19:26:59 UTC 1706 5d364871.6aa 3392 0 [00000]:LOG:
aborting startup due to startup process failure
2019-09-29 19:26:59 UTC 1706 5d364871.6aa 3393 0 [00000]:LOG:
database system is shut down
Expected behavior: free space on the file system for WAL was 14 MB, so not
enough for the next WAL file to be saved.
Once the failed logical replica was back up and running, we extended the
file system for WALs and started the primary instance:
2019-09-30 08:21:44 UTC 13635 5d91bb18.3543 1 0 [00000]:LOG: Auto
detecting pg_stat_kcache.linux_hz parameter...
2019-09-30 08:21:44 UTC 13635 5d91bb18.3543 2 0 [00000]:LOG:
pg_stat_kcache.linux_hz is set to 1000000
2019-09-30 08:21:44 UTC 13635 5d91bb18.3543 3 0 [00000]:LOG:
listening on IPv4 address "0.0.0.0", port 5432
2019-09-30 08:21:44 UTC 13635 5d91bb18.3543 4 0 [00000]:LOG:
listening on IPv6 address "::", port 5432
2019-09-30 08:21:44 UTC 13635 5d91bb18.3543 5 0 [00000]:LOG:
listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-09-30 08:21:44 UTC 13637 5d91bb18.3545 1 0 [00000]:LOG:
database system was shut down at 2019-09-29 19:26:59 UTC
2019-09-30 08:21:44 UTC 13637 5d91bb18.3545 2 0 [00000]:LOG:
recovered replication state of node 1 to CF3/442D0758
2019-09-30 08:21:44 UTC 13646 5d91bb18.354e 1 0 [00000]:LOG:
pg_partman master background worker master process initialized with role
postgres
2019-09-30 08:21:44 UTC 13635 5d91bb18.3543 6 0 [00000]:LOG:
database system is ready to accept connections
2019-09-30 08:21:44 UTC 13645 5d91bb18.354d 1 0 2/0 [00000]:LOG:
pg_cron scheduler started
2019-09-30 08:21:44 UTC 13647 5d91bb18.354f 1 0 3/0 [00000]:LOG: POWA
connected to database powa
So the primary instance started, all logical replication streams connected,
and as soon as logical replication for the failed subscriber processed the
WALs, the unnecessary WALs were deleted as expected. Up to this point,
everything worked like a charm from the primary instance's point of view.
From this primary instance we are running a WAL-shipping replica (pgBackRest
is used), and this is where the issues started.
WAL recovery replica:
PostgreSQL 11.3 (Ubuntu 11.3-1.pgdg18.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0, 64-bit
WAL files were restored from the backup up to the last one archived before
the primary instance shut down.
2019-09-29 19:24:31.390 P00 INFO: found 0000000100000FED00000037 in the
archive
2019-09-29 19:24:31.391 P00 INFO: archive-get command end: completed
successfully (43ms)
2019-09-29 19:24:31 UTC 3062 5d76145f.bf6 181012 0 1/0 [00000]:LOG:
restored log file "0000000100000FED00000037" from archive
2019-09-29 19:24:31.919 P00 INFO: archive-get command begin 2.16:
[0000000100000FED00000038, pg_wal/RECOVERYXLOG] --archive-async
--buffer-size=16777216 --compress-level=3 --log-level-console=info
--pg1-path=/pgsql/anl_master_rep/pgcluster/11/pgdata --process-max=2
--repo1-cipher-type=none --repo1-path=/pgbackrest
--repo1-s3-bucket=finmason-db-backups --repo1-s3-endpoint=
s3.us-east-1.amazonaws.com --repo1-s3-key=<redacted>
--repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-1
--repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=anl_master_prod
2019-09-29 19:24:31.958 P00 INFO: found 0000000100000FED00000038 in the
archive
2019-09-29 19:24:31.959 P00 INFO: archive-get command end: completed
successfully (41ms)
2019-09-29 19:24:31 UTC 3062 5d76145f.bf6 181013 0 1/0 [00000]:LOG:
restored log file "0000000100000FED00000038" from archive
2019-09-29 19:24:32.505 P00 INFO: archive-get command begin 2.16:
[0000000100000FED00000039, pg_wal/RECOVERYXLOG] --archive-async
--buffer-size=16777216 --compress-level=3 --log-level-console=info
--pg1-path=/pgsql/anl_master_rep/pgcluster/11/pgdata --process-max=2
--repo1-cipher-type=none --repo1-path=/pgbackrest
--repo1-s3-bucket=finmason-db-backups --repo1-s3-endpoint=
s3.us-east-1.amazonaws.com --repo1-s3-key=<redacted>
--repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-1
--repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=anl_master_prod
2019-09-29 19:24:32.505 P00 INFO: unable to find 0000000100000FED00000039
in the archive
2019-09-29 19:24:32.505 P00 INFO: archive-get command end: completed
successfully (2ms)
2019-09-29 19:24:34.247 P00 INFO: archive-get command begin 2.16:
[0000000100000FED00000039, pg_wal/RECOVERYXLOG] --archive-async
--buffer-size=16777216 --compress-level=3 --log-level-console=info
--pg1-path=/pgsql/anl_master_rep/pgcluster/11/pgdata --process-max=2
--repo1-cipher-type=none --repo1-path=/pgbackrest
--repo1-s3-bucket=finmason-db-backups --repo1-s3-endpoint=
s3.us-east-1.amazonaws.com --repo1-s3-key=<redacted>
--repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-1
--repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=anl_master_prod
2019-09-29 19:24:34.448 P00 INFO: unable to find 0000000100000FED00000039
in the archive
2019-09-29 19:24:34.448 P00 INFO: archive-get command end: completed
successfully (204ms)
So the replica instance was painfully waiting, and waiting, and waiting for
the next WAL: 0000000100000FED00000039.
And yes, after the long outage, pgBackRest found and restored WAL
0000000100000FED00000039:
2019-09-30 08:22:03.673 P00 INFO: archive-get command begin 2.16:
[0000000100000FED00000039, pg_wal/RECOVERYXLOG] --archive-async
--buffer-size=16777216 --compress-level=3 --log-level-console=info
--pg1-path=/
pgsql/anl_master_rep/pgcluster/11/pgdata --process-max=2
--repo1-cipher-type=none --repo1-path=/pgbackrest
--repo1-s3-bucket=finmason-db-backups --repo1-s3-endpoint=
s3.us-east-1.amazonaws.com --repo1-s3-key=<redacted>
--repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-1
--repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=anl_master_prod
2019-09-30 08:22:03.874 P00 INFO: unable to find 0000000100000FED00000039
in the archive
2019-09-30 08:22:03.875 P00 INFO: archive-get command end: completed
successfully (203ms)
2019-09-30 08:22:08.678 P00 INFO: archive-get command begin 2.16:
[0000000100000FED00000039, pg_wal/RECOVERYXLOG] --archive-async
--buffer-size=16777216 --compress-level=3 --log-level-console=info
--pg1-path=/
pgsql/anl_master_rep/pgcluster/11/pgdata --process-max=2
--repo1-cipher-type=none --repo1-path=/pgbackrest
--repo1-s3-bucket=finmason-db-backups --repo1-s3-endpoint=
s3.us-east-1.amazonaws.com --repo1-s3-key=<redacted>
--repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-1
--repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=anl_master_prod
2019-09-30 08:22:09.121 P00 INFO: found 0000000100000FED00000039 in the
archive
2019-09-30 08:22:09.123 P00 INFO: archive-get command end: completed
successfully (446ms)
2019-09-30 08:22:09 UTC 3062 5d76145f.bf6 181014 0 1/0 [00000]:LOG:
restored log file "0000000100000FED00000039" from archive
2019-09-30 08:22:09 UTC 3062 5d76145f.bf6 181015 0 1/0 [00000]:LOG:
invalid contrecord length 1956 at FED/38FFE208
2019-09-30 08:22:13.683 P00 INFO: archive-get command begin 2.16:
[0000000100000FED00000038, pg_wal/RECOVERYXLOG] --archive-async
--buffer-size=16777216 --compress-level=3 --log-level-console=info
--pg1-path=/pgsql/anl_master_rep/pgcluster/11/pgdata --process-max=2
--repo1-cipher-type=none --repo1-path=/pgbackrest
--repo1-s3-bucket=finmason-db-backups --repo1-s3-endpoint=
s3.us-east-1.amazonaws.com --repo1-s3-key=<redacted>
--repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-1
--repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=anl_master_prod
2019-09-30 08:22:14.132 P00 INFO: found 0000000100000FED00000038 in the
archive
2019-09-30 08:22:14.135 P00 INFO: archive-get command end: completed
successfully (453ms)
2019-09-30 08:22:14 UTC 3062 5d76145f.bf6 181016 0 1/0 [00000]:LOG:
restored log file "0000000100000FED00000038" from archive
2019-09-30 08:22:14.195 P00 INFO: archive-get command begin 2.16:
[0000000100000FED00000039, pg_wal/RECOVERYXLOG] --archive-async
--buffer-size=16777216 --compress-level=3 --log-level-console=info
--pg1-path=/pgsql/anl_master_rep/pgcluster/11/pgdata --process-max=2
--repo1-cipher-type=none --repo1-path=/pgbackrest
--repo1-s3-bucket=finmason-db-backups --repo1-s3-endpoint=
s3.us-east-1.amazonaws.com --repo1-s3-key=<redacted>
--repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-1
--repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=anl_master_prod
2019-09-30 08:22:14.236 P00 INFO: found 0000000100000FED00000039 in the
archive
2019-09-30 08:22:14.238 P00 INFO: archive-get command end: completed
successfully (44ms)
2019-09-30 08:22:14 UTC 3062 5d76145f.bf6 181017 0 1/0 [00000]:LOG:
restored log file "0000000100000FED00000039" from archive
2019-09-30 08:22:14 UTC 3062 5d76145f.bf6 181018 0 1/0 [00000]:LOG:
invalid contrecord length 1956 at FED/38FFE208
But recovery on the replica failed to process WAL file
0000000100000FED00000039, with the log message: "invalid contrecord length
1956 at FED/38FFE208".
Since WAL 0000000100000FED00000039 still existed on the primary instance,
I copied it to the WAL replica, but it was identical to the file restored by
pgBackRest, so I assume from this that WAL file backup/restore was not the
root cause of this issue.
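For reference, the comparison can be done along these lines (a sketch only:
the paths are placeholders, and the snippet builds two dummy 16 MiB segments
just to illustrate the check, rather than using the real WAL files):

```shell
# Illustrative check: are two copies of a WAL segment byte-identical?
# The files below are dummy stand-ins, not the real segments.
wal_primary=$(mktemp)   # stands in for the segment copied from the primary
wal_restored=$(mktemp)  # stands in for the segment restored by pgBackRest
head -c 16777216 /dev/zero > "$wal_primary"
cp "$wal_primary" "$wal_restored"
if cmp -s "$wal_primary" "$wal_restored"; then
    echo "segments are byte-identical"
else
    echo "segments differ"
fi
rm -f "$wal_primary" "$wal_restored"
```

`cmp -s` exits non-zero at the first differing byte, which is enough to rule
out archive corruption as the cause.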
The recovery.conf on the replica instance is simple:
standby_mode = 'on'
restore_command = 'pgbackrest --log-level-console=info --process-max=6
--stanza=anl_master_prod archive-get %f "%p"'
During a planned, properly executed shutdown of the primary instance,
recovery on the WAL replica resumes smoothly.
And for sure, after restoring the database from a backup taken after the
primary instance issue, the WAL replica works perfectly, as usual.
Apart from the obvious need to re-configure monitoring to warn about WAL
free space sooner, to avoid such cases, my expectation was that the WAL
replica would be able to resume recovery; it was not.
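For what it's worth, a minimal free-space check of the kind mentioned above
could be sketched like this (the mount point and threshold are placeholders;
real monitoring would of course go through your alerting stack):

```shell
# Hypothetical WAL-filesystem usage check; /tmp stands in for the mount
# point holding pg_wal, and 90 is an arbitrary alert threshold.
wal_fs=/tmp
threshold=90
# df -P gives POSIX output; field 5 of line 2 is the use percentage.
used=$(df -P "$wal_fs" | awk 'NR==2 { sub("%", "", $5); print $5 }')
if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: WAL filesystem ${used}% full"
else
    echo "OK: WAL filesystem ${used}% full"
fi
```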
We have stored copies of the 0000000100000FED0000003[89] WAL files for
analysis. Is there anything that can help us find the root cause and
mitigate this issue?
I've found BUG #15412, which seems to be the same issue, but it remains open.
Thanks for any hints on how to recover from this issue, or how to analyze
it more deeply to find the root cause.
Kind regards, Ales Zeleny
Greetings,
* Aleš Zelený (zeleny.ales@gmail.com) wrote:
But recovery on replica failed to proceed WAL file
0000000100000FED00000039 with log message: " invalid contrecord length
1956 at FED/38FFE208".
Err- you've drawn the wrong conclusion from that message (and you're
certainly not alone- it's a terrible message and we should really have a
HINT there or something). That's an INFO-level message, not an error,
and basically just means "oh, look, there's an invalid WAL record, guess
we got to the end of the WAL available from this source." If you had
had primary_conninfo configured in your recovery.conf, PG would likely
have connected to the primary and started replication. One other point
is that if you actually did a promotion in this process somewhere, then
you might want to set recovery_target_timeline=latest, to make sure the
replica follows along on the timeline switch that happens when a
promotion happens.
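Following Stephen's suggestion, the recovery.conf from the report could be
extended roughly like this (the connection values are placeholders, not
taken from the original setup):

```
standby_mode = 'on'
restore_command = 'pgbackrest --log-level-console=info --process-max=6 --stanza=anl_master_prod archive-get %f "%p"'
# placeholders: host and user must match your environment
primary_conninfo = 'host=primary.example.com user=replicator'
recovery_target_timeline = 'latest'
```

With primary_conninfo set, the standby falls back to streaming once the
archive runs dry instead of stalling on the last restored segment.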
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Aleš Zelený (zeleny.ales@gmail.com) wrote:
But recovery on replica failed to proceed WAL file
0000000100000FED00000039 with log message: " invalid contrecord length
1956 at FED/38FFE208".
Err- you've drawn the wrong conclusion from that message (and you're
certainly not alone- it's a terrible message and we should really have a
HINT there or something).
Yeah, those messages are all pretty ancient, from when WAL was new and not
to be trusted much. Perhaps the thing to do is move the existing info
into DETAIL and make the primary message be something like "reached
apparent end of WAL stream".
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
* Aleš Zelený (zeleny.ales@gmail.com) wrote:
But recovery on replica failed to proceed WAL file
0000000100000FED00000039 with log message: " invalid contrecord length
1956 at FED/38FFE208".

Err- you've drawn the wrong conclusion from that message (and you're
certainly not alone- it's a terrible message and we should really have a
HINT there or something).

Yeah, those messages are all pretty ancient, from when WAL was new and not
to be trusted much. Perhaps the thing to do is move the existing info
into DETAIL and make the primary message be something like "reached
apparent end of WAL stream".
Yes, +1 on that.
Thanks,
Stephen
Hello,
čt 3. 10. 2019 v 0:09 odesílatel Stephen Frost <sfrost@snowman.net> napsal:
Greetings,
* Aleš Zelený (zeleny.ales@gmail.com) wrote:
But recovery on replica failed to proceed WAL file
0000000100000FED00000039 with log message: " invalid contrecord length
1956 at FED/38FFE208".

Err- you've drawn the wrong conclusion from that message (and you're
certainly not alone- it's a terrible message and we should really have a
HINT there or something). That's an INFO-level message, not an error,
and basically just means "oh, look, there's an invalid WAL record, guess
we got to the end of the WAL available from this source." If you had
had primary_conninfo configured in your recovery.conf, PG would likely
have connected to the primary and started replication. One other point
is that if you actually did a promotion in this process somewhere, then
you might want to set recovery_target_timeline=latest, to make sure the
replica follows along on the timeline switch that happens when a
promotion happens.

Thanks (for all comments from others as well) for this explanation. I
failed to describe our case properly. We are recovering the replica
instance from WALs only, since the replica is in a separate network (used
as a source for ZFS cloning for development), so there is no
primary_conninfo (the replica can't influence the primary instance in any
way, it just consumes the primary instance's WALs), and we did not perform
a replica promotion.
I'd guess that after the out-of-disk-space issue the last WAL might be
incomplete, but its size on the primary instance's disk was the expected
16777216 bytes, and it was binary-identical to the file restored on the
replica from the backup. The issue was that the replica emitted this INFO
message but was not able to move to the next WAL file, and started falling
behind the primary instance.
If the WAL was incomplete during the out-of-space condition, it might have
been appended to during instance start (though I doubt archive_command
would be invoked on an incomplete WAL); that is why I compared the file on
the primary (after it was back up and running) with the one restored on the
replica instance.
In other words, if this log message were emitted only once and recovery
continued restoring subsequent WALs, I'd be OK with that, but since
recovery is stuck at this WAL, I'm in doubt whether I did something wrong
(e.g. an improper recovery.conf ...) or what a possible workaround would be
to let the replica (if possible) process this WAL and continue with
recovery. The database size is almost 2 TB, which is why I'd like to avoid
full restores to create DEV environments, using ZFS clones instead.
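The ZFS cloning mentioned above can be sketched as a dry run (the dataset
names are made up, and the `run` helper only prints the commands rather
than executing them):

```shell
# Dry-run sketch of refreshing a DEV clone from the replica's dataset.
# "tank/pgdata" is a hypothetical dataset name; run() only echoes.
run() { echo "+ $*"; }
run zfs snapshot tank/pgdata@dev-refresh
run zfs clone tank/pgdata@dev-refresh tank/pgdata-dev
```

A clone is a writable, copy-on-write view of the snapshot, so creating a
DEV environment this way avoids a full 2 TB restore.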
Thanks for any hints on how to let the replica continue applying WAL files.
Kind regards, Ales Zeleny
Hello.
At Wed, 2 Oct 2019 19:24:02 -0400, Stephen Frost <sfrost@snowman.net> wrote in
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
* Aleš Zelený (zeleny.ales@gmail.com) wrote:
But recovery on replica failed to proceed WAL file
0000000100000FED00000039 with log message: " invalid contrecord length
1956 at FED/38FFE208".

Err- you've drawn the wrong conclusion from that message (and you're
certainly not alone- it's a terrible message and we should really have a
HINT there or something).

Yeah, those messages are all pretty ancient, from when WAL was new and not
to be trusted much. Perhaps the thing to do is move the existing info
into DETAIL and make the primary message be something like "reached
apparent end of WAL stream".

Yes, +1 on that.
The attached is something like that. AFAICS we can assume that an
invalid record means the end of WAL as long as we are fetching
successive records and XLogPageRead() has not logged anything.
As the current comment in ReadRecord says, we don't have a message if
standby mode has been triggered, but that's not always the case. Still, we
may have a message if a new segment is available after triggering.
I used a kind-of-tricky way to handle the optional errdetail, but it may be
better to have some additional feature as an ereport
subfunction. (Maybe named "errif" or something?)
I think it works fine as expected, but I find one arguable behavior.
In the first patch, the "end of WAL" message is accompanied by source,
LSN and timeline.
LOG: reached end of WAL in archive at 0/A3000060 on timeline 8
DETAIL: invalid record length at 0/A3000060: wanted 24, got 0
The last two just seem good, but the first one shows "archive" even on
promotion. It is right that we scan the archive after promotion, but it
seems confusing. So I attached another patch that shows the operation
instead of the WAL source.
LOG: reached end of WAL during streaming replication at 0/A5000060 on timeline 8
DETAIL: invalid record length at 0/A5000060: wanted 24, got 0
What do you think about this?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
At Wed, 2 Oct 2019 19:24:02 -0400, Stephen Frost <sfrost@snowman.net> wrote in
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Yeah, those messages are all pretty ancient, from when WAL was new and not
to be trusted much. Perhaps the thing to do is move the existing info
into DETAIL and make the primary message be something like "reached
apparent end of WAL stream".
Yes, +1 on that.
What do you think about this?
It seems overcomplicated. Why do you need to reach into
emode_for_corrupt_record's private state? I think we could just
change the message text without that, and leave the emode
machinations as-is.
I don't understand the restriction to "if (RecPtr == InvalidXLogRecPtr)"
either? Maybe that's fine but the comment fails to explain it.
Another point is that, as the comment for emode_for_corrupt_record
notes, we don't really want to consider that we've hit end-of-WAL
when reading any source except XLOG_FROM_PG_WAL. So I think the
change of message ought to include a test of the source, but this
doesn't.
Also, the business with an "operation" string violates the message
translatability guideline about "don't assemble a message out of
phrases". If we want to have that extra detail, it's better just
to make three separate ereport() calls with separately translatable
messages.
Also, it seems wrong that the page TLI check, just below, is where
it is and isn't part of the main set of page header sanity checks.
That's sort of unrelated to this patch, except not really, because
shouldn't a failure of that test also be treated as an "end of WAL"
condition?
regards, tom lane
Thank you, and sorry for overlooking your comment.
At Thu, 14 Nov 2019 12:28:13 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
At Wed, 2 Oct 2019 19:24:02 -0400, Stephen Frost <sfrost@snowman.net> wrote in
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Yeah, those messages are all pretty ancient, from when WAL was new and not
to be trusted much. Perhaps the thing to do is move the existing info
into DETAIL and make the primary message be something like "reached
apparent end of WAL stream".Yes, +1 on that.
What do you think about this?
It seems overcomplicated. Why do you need to reach into
emode_for_corrupt_record's private state? I think we could just
change the message text without that, and leave the emode
machinations as-is.
I tried to avoid repeated messages at the same LSN, but that is already
done by emode_for_corrupt_record by reducing the error level. I reverted
that part.
I don't understand the restriction to "if (RecPtr == InvalidXLogRecPtr)"
either? Maybe that's fine but the comment fails to explain it.
"When we end up here while reading successive records" meant that, but
it is not explicit. If RecPtr was not invalid, it means that the caller
is requesting a specific record that should exist. That is not end-of-WAL
at all. I rewrote the comment.
Another point is that, as the comment for emode_for_corrupt_record
notes, we don't really want to consider that we've hit end-of-WAL
when reading any source except XLOG_FROM_PG_WAL. So I think the
change of message ought to include a test of the source, but this
doesn't.
Maybe no. The function just mutes messages for repeated errors on
XLOG_FROM_WAL. We do consider end-of-WAL while on XLOG_FROM_ARCHIVE. In
this version, the "reached end of WAL" message is not emitted when
emode_for_corrupt_record decides not to show the message.
Also, the business with an "operation" string violates the message
translatability guideline about "don't assemble a message out of
phrases". If we want to have that extra detail, it's better just
to make three separate ereport() calls with separately translatable
messages.
Though I thought that the operation is just a noun and could safely be
translated separately, I followed your comment. And this version takes
the more verbose of the previous two.
Also, it seems wrong that the page TLI check, just below, is where
it is and isn't part of the main set of page header sanity checks.
That's sort of unrelated to this patch, except not really, because
shouldn't a failure of that test also be treated as an "end of WAL"
condition?
It seems to be checking whether the xlogreader did something wrong, since
it is pluggable. I'm not sure there is any concrete reason for that,
though. As for recovery, it is actually redundant because
XLogFileReadAnyTLI already checked that for the record, but it doesn't
matter.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center