Make message at end-of-recovery less scary.
Hello, this is a followup thread of [1].
# I didn't notice that the thread didn't cover -hackers..
When recovery of any type ends, we see several kinds of error messages
that say "WAL is broken".
LOG: invalid record length at 0/7CB6BC8: wanted 24, got 0
LOG: redo is not required
LOG: database system is ready to accept connections
This patch reduces the scariness of such messages as follows.
LOG: rached end of WAL at 0/1551048 on timeline 1 in pg_wal during crash recovery
DETAIL: invalid record length at 0/1551048: wanted 24, got 0
LOG: redo is not required
LOG: database system is ready to accept connections
[1]: /messages/by-id/20200117.172655.1384889922565817808.horikyota.ntt@gmail.com
I'll register this to the coming CF.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Make-End-Of-Recovery-error-less-scary.patch (text/x-patch, +38 −8)
On Fri, Feb 28, 2020 at 04:01:00PM +0900, Kyotaro Horiguchi wrote:
Hello, this is a followup thread of [1].
# I didn't noticed that the thread didn't cover -hackers..
When recovery of any type ends, we see several kinds of error messages
that says "WAL is broken".
Have you considered an error context here? Your patch leads to a bit
of duplication with the message a bit down of what you are changing
where the end of local pg_wal is reached.
+ * reached the end of WAL. Otherwise something's really wrong and
+ * we report just only the errormsg if any. If we don't receive
This sentence sounds strange to me. Or you meant "Something is wrong,
so use errormsg as report if it is set"?
+ * Note: errormsg is alreay translated.
Typo here.
+ if (StandbyMode)
+ ereport(actual_emode,
+ (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during streaming replication",
StandbyMode happens also with only WAL archiving, depending on if
primary_conninfo is set or not.
+ (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash recovery",
FWIW, you are introducing three times the same typo, in the same
word, in three different messages.
--
Michael
Thank you for the comments.
At Fri, 28 Feb 2020 16:33:18 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Fri, Feb 28, 2020 at 04:01:00PM +0900, Kyotaro Horiguchi wrote:
Hello, this is a followup thread of [1].
# I didn't noticed that the thread didn't cover -hackers..
When recovery of any type ends, we see several kinds of error messages
that says "WAL is broken".
Have you considered an error context here? Your patch leads to a bit
of duplication with the message a bit down of what you are changing
where the end of local pg_wal is reached.
It is a DEBUG message and it is for the time moving from crash
recovery to archive recovery. I could remove that but decided to leave
it for traceability.
+ * reached the end of WAL. Otherwise something's really wrong and
+ * we report just only the errormsg if any. If we don't receive
This sentence sounds strange to me. Or you meant "Something is wrong,
so use errormsg as report if it is set"?
The whole comment there follows.
| recovery. If we get here during recovery, we can assume that we
| reached the end of WAL. Otherwise something's really wrong and
| we report just only the errormsg if any. If we don't receive
| errormsg here, we already logged something. We don't emit
| "reached end of WAL" in muted messages.
"Otherwise" means "other than the case of recovery". "Just only the
errormsg" means "show the message by itself, not as part of the
"reached end of WAL" message".
+ * Note: errormsg is alreay translated.
Typo here.
Thanks. Will fix along with "rached".
+ if (StandbyMode)
+ ereport(actual_emode,
+ (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during streaming replication",
StandbyMode happens also with only WAL archiving, depending on if
primary_conninfo is set or not.
Right. I'll fix it. Maybe to "during standby mode".
+ (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash recovery",
FWIW, you are introducing three times the same typo, in the same
word, in three different messages.
They're a copy-pasto. I refrained from constructing an error message
from multiple non-independent parts. Are you suggesting to do so?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello.
I changed the condition from randAccess to fetching_ckpt considering
the discussion in another thread [1]. Then I moved the block that
shows the new messages to more appropriate place.
At Fri, 28 Feb 2020 17:28:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
Have you considered an error context here? Your patch leads to a bit
of duplication with the message a bit down of what you are changing
where the end of local pg_wal is reached.
It is a DEBUG message and it is for the time moving from crash
recovery to archive recovery. I could remove that but decided to leave
it for tracability.
I modified the message so that it has the same look as the new
messages, but I left it at DEBUG1, since it is just an intermediate
state. We should finally see one of the three new messages.
After the messages changed, another message from wal sender came to
look redundant.
| [20866] LOG: replication terminated by primary server
| [20866] DETAIL: End of WAL reached on timeline 1 at 0/30001C8.
| [20866] FATAL: could not send end-of-streaming message to primary: no COPY in progress
| [20851] LOG: reached end of WAL at 0/30001C8 on timeline 1 in archive during standby mode
| [20851] DETAIL: invalid record length at 0/30001C8: wanted 24, got 0
I changed the above to the below, which looks more adequate.
| [24271] LOG: replication terminated by primary server on timeline 1 at 0/3000240.
| [24271] FATAL: could not send end-of-streaming message to primary: no COPY in progress
| [24267] LOG: reached end of WAL at 0/3000240 on timeline 1 in archive during standby mode
| [24267] DETAIL: invalid record length at 0/3000240: wanted 24, got 0
+ * reached the end of WAL. Otherwise something's really wrong and
+ * we report just only the errormsg if any. If we don't receive
This sentence sounds strange to me. Or you meant "Something is wrong,
so use errormsg as report if it is set"?
The message no longer exists.
+ (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash recovery",
FWIW, you are introducing three times the same typo, in the same
word, in three different messages.
They're a copy-pasto. I refrained from constructing an error message
from multiple non-independent parts. Are you suggesting to do so?
Repeating almost the same phrase three times is very unreadable. I
rewrote it in a simpler shape.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v2-0001-Make-End-Of-Recovery-error-less-scary.patch (text/x-patch, +55 −21)
On 2020-03-05 08:06, Kyotaro Horiguchi wrote:
| [20866] LOG: replication terminated by primary server
| [20866] DETAIL: End of WAL reached on timeline 1 at 0/30001C8.
| [20866] FATAL: could not send end-of-streaming message to primary: no COPY in progress
| [20851] LOG: reached end of WAL at 0/30001C8 on timeline 1 in archive during standby mode
| [20851] DETAIL: invalid record length at 0/30001C8: wanted 24, got 0
I changed the above to the below, which looks more adequate.
| [24271] LOG: replication terminated by primary server on timeline 1 at 0/3000240.
| [24271] FATAL: could not send end-of-streaming message to primary: no COPY in progress
| [24267] LOG: reached end of WAL at 0/3000240 on timeline 1 in archive during standby mode
| [24267] DETAIL: invalid record length at 0/3000240: wanted 24, got 0
Is this the before and after? That doesn't seem like a substantial
improvement to me. You still get the "scary" message at the end.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 23, 2020 at 2:37 AM Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:
On 2020-03-05 08:06, Kyotaro Horiguchi wrote:
| [20866] LOG: replication terminated by primary server
| [20866] DETAIL: End of WAL reached on timeline 1 at 0/30001C8.
| [20866] FATAL: could not send end-of-streaming message to primary: no COPY in progress
| [20851] LOG: reached end of WAL at 0/30001C8 on timeline 1 in archive during standby mode
| [20851] DETAIL: invalid record length at 0/30001C8: wanted 24, got 0
I changed the above to the below, which looks more adequate.
| [24271] LOG: replication terminated by primary server on timeline 1 at 0/3000240.
| [24271] FATAL: could not send end-of-streaming message to primary: no COPY in progress
| [24267] LOG: reached end of WAL at 0/3000240 on timeline 1 in archive during standby mode
| [24267] DETAIL: invalid record length at 0/3000240: wanted 24, got 0
Is this the before and after? That doesn't seem like a substantial
improvement to me. You still get the "scary" message at the end.
+1. I agree it still reads scary and doesn't seem like an improvement.
Plus, I am hoping the message will improve for pg_waldump as well.
It reads confusing, and every time we have to explain to a new developer
that it's expected behavior, which is annoying.
pg_waldump: fatal: error in WAL record at 0/1553F70: invalid record length at 0/1553FA8: wanted 24, got 0
Hi,
On 2020-03-23 10:37:16 +0100, Peter Eisentraut wrote:
On 2020-03-05 08:06, Kyotaro Horiguchi wrote:
| [20866] LOG: replication terminated by primary server
| [20866] DETAIL: End of WAL reached on timeline 1 at 0/30001C8.
| [20866] FATAL: could not send end-of-streaming message to primary: no COPY in progress
IMO it's a bug that we see this FATAL. I seem to recall that we didn't
use to get that?
| [20851] LOG: reached end of WAL at 0/30001C8 on timeline 1 in archive during standby mode
| [20851] DETAIL: invalid record length at 0/30001C8: wanted 24, got 0
I changed the above to the below, which looks more adequate.
| [24271] LOG: replication terminated by primary server on timeline 1 at 0/3000240.
| [24271] FATAL: could not send end-of-streaming message to primary: no COPY in progress
| [24267] LOG: reached end of WAL at 0/3000240 on timeline 1 in archive during standby mode
| [24267] DETAIL: invalid record length at 0/3000240: wanted 24, got 0
Is this the before and after? That doesn't seem like a substantial
improvement to me. You still get the "scary" message at the end.
It seems like a minor improvement - folding the DETAIL into the message
makes sense to me here. But it indeed doesn't really address the main
issue.
I think we don't want to elide the information about how the end of the
WAL was detected - there are some issues where I found that quite
helpful. But we could reformulate it to be clearer that it's informative
output, not a bug. E.g. something roughly like
LOG: reached end of WAL at 0/3000240 on timeline 1 in archive during standby mode
DETAIL: End detected due to invalid record length at 0/3000240: expected 24, got 0
(I first elided the position in the DETAIL, but it could differ from the
one in LOG)
I don't find that very satisfying, but I can't come up with something
that provides the current information, while being less scary than my
suggestion?
Greetings,
Andres Freund
Hi,
On 2020-03-23 10:43:09 -0700, Ashwin Agrawal wrote:
Plus, I am hoping message will improve for pg_waldump as well?
Since it reads confusing and every-time have to explain new developer it's
expected behavior which is annoying.
pg_waldump: fatal: error in WAL record at 0/1553F70: invalid record length at 0/1553FA8: wanted 24, got 0
What would you like to see here? There's inherently a lot less
information about the context in waldump. We can't know whether it's to
be expected that the WAL ends at that point, or whether there was
corruption.
Greetings,
Andres Freund
At Mon, 23 Mar 2020 12:47:36 -0700, Andres Freund <andres@anarazel.de> wrote in
Hi,
On 2020-03-23 10:37:16 +0100, Peter Eisentraut wrote:
On 2020-03-05 08:06, Kyotaro Horiguchi wrote:
| [20866] LOG: replication terminated by primary server
| [20866] DETAIL: End of WAL reached on timeline 1 at 0/30001C8.
| [20866] FATAL: could not send end-of-streaming message to primary: no COPY in progress
IMO it's a bug that we see this FATAL. I seem to recall that we didn't
use to get that?
I thought that it is a convention that an auxiliary process uses ERROR
(which is turned into FATAL in ereport) to exit, which I didn't like
so much, but it was out of scope of this patch.
As for the message above, the FATAL is preceded by the "LOG:
replication terminated by" message, that means walreceiver tries to
send new data after disconnection just to fail, which is
unreasonable. I think we should exit immediately after detecting
disconnection. The FATAL is gone by the attached.
| [24267] LOG: reached end of WAL at 0/3000240 on timeline 1 in archive during standby mode
| [24267] DETAIL: invalid record length at 0/3000240: wanted 24, got 0
Is this the before and after? That doesn't seem like a substantial
improvement to me. You still get the "scary" message at the end.
It seems like a minor improvement - folding the DETAIL into the message
makes sense to me here. But it indeed doesn't really address the main
issue.
I think we don't want to elide the information about how the end of the
WAL was detected - there are some issues where I found that quite
helpful. But we could reformulate it to be clearer that it's informative
output, not a bug. E.g. something roughly like
LOG: reached end of WAL at 0/3000240 on timeline 1 in archive during standby mode
DETAIL: End detected due to invalid record length at 0/3000240: expected 24, got 0
(I first elided the position in the DETAIL, but it could differ from the
one in LOG)
I don't find that very satisfying, but I can't come up with something
that provides the current information, while being less scary than my
suggestion?
The 0-length record is not an "invalid" state during recovery, so we
can add the message for the state as "record length is 0 at %X/%X". I
think that if any other state is found there, it implies something is wrong.
LSN is redundantly shown but I'm not sure if it is better to remove it
from either of the two lines.
| LOG: reached end of WAL at 0/3000850 on timeline 1 in pg_wal during crash recovery
| DETAIL: record length is 0 at 0/3000850
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v3-0001-Make-End-Of-Recovery-error-less-scary.patch (text/x-patch, +67 −23)
On 2020-03-24 02:52, Kyotaro Horiguchi wrote:
I don't find that very satisfying, but I can't come up with something
that provides the current information, while being less scary than my
suggestion?
The 0-length record is not an "invalid" state during recovery, so we
can add the message for the state as "record length is 0 at %X/%X". I
think if other states found there, it implies something wrong.
LSN is redundantly shown but I'm not sure if it is better to remove it
from either of the two lines.
| LOG: reached end of WAL at 0/3000850 on timeline 1 in pg_wal during crash recovery
| DETAIL: record length is 0 at 0/3000850
I'm not up to date on all these details, but my high-level idea would be
some kind of hint associated with the existing error messages, like:
HINT: This is to be expected if this is the end of the WAL. Otherwise,
it could indicate corruption.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 25, 2020 at 8:53 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
HINT: This is to be expected if this is the end of the WAL. Otherwise,
it could indicate corruption.
First, I agree that this general issue is a problem, because it's come
up for me in quite a number of customer situations. Either people get
scared when they shouldn't, because the message is innocuous, or they
don't get scared about other things that actually are scary, because
if some scary-looking messages are actually innocuous, it can lead
people to believe that the same is true in other cases.
Second, I don't really like the particular formulation you have above,
because the user still doesn't know whether or not to be scared. Can
we figure that out? If we're in crash recovery, I think that
we should not be scared, because we have no alternative to assuming
that we've reached the end of WAL, so all crash recoveries will end
like this. If we're in archive recovery, we should definitely be
scared if we haven't yet reached the minimum recovery point, because
more WAL than that should certainly exist. After that, it depends on
how we got the WAL. If it's being streamed, the question is whether
we've reached the end of what got streamed. If it's being copied from
the archive, we ought to have the whole segment, but maybe not more.
Can we get the right context to the point where the error is being
reported to know whether we hit the error at the end of the WAL that
was streamed? If not, can we somehow rejigger things so that we only
make it sound scary if we keep getting stuck at the same point when we
would've expected to make progress meanwhile?
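The conditions sketched in the paragraph above could be expressed as a small severity-selection helper. This is purely illustrative: the enum and the boolean inputs (`min_recovery_reached`, `at_streamed_end`) are hypothetical names standing in for state the real code would have to track, not anything in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs; the real code would have to track this state. */
typedef enum
{
    RECOVERY_CRASH,     /* crash recovery from local pg_wal */
    RECOVERY_ARCHIVE,   /* archive recovery */
    RECOVERY_STREAM     /* standby consuming streamed WAL */
} RecoveryKind;

#define ELEVEL_LOG      15  /* numeric stand-ins for elog levels */
#define ELEVEL_WARNING  19

/*
 * Sketch of the severity choice: crash recovery always ends somewhere,
 * so that is never scary; stopping before the minimum recovery point,
 * or short of what was already streamed, is.
 */
static int
end_of_wal_severity(RecoveryKind kind, bool min_recovery_reached,
                    bool at_streamed_end)
{
    if (kind == RECOVERY_CRASH)
        return ELEVEL_LOG;          /* no alternative to assuming EOW */
    if (!min_recovery_reached)
        return ELEVEL_WARNING;      /* more WAL than this must exist */
    if (kind == RECOVERY_STREAM && !at_streamed_end)
        return ELEVEL_WARNING;      /* stopped short of the streamed end */
    return ELEVEL_LOG;
}
```

The open question in the paragraph remains whether the information needed for the last two branches can be plumbed to the point where the error is reported.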
I'm just spitballing here, but it would be really good if there's a
way to know definitely whether or not you should be scared. Corrupted
WAL segments are definitely a thing that happens, but retries are a
lot more common.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 26, 2020 at 12:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 25, 2020 at 8:53 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
HINT: This is to be expected if this is the end of the WAL. Otherwise,
it could indicate corruption.
First, I agree that this general issue is a problem, because it's come
up for me in quite a number of customer situations. Either people get
scared when they shouldn't, because the message is innocuous, or they
don't get scared about other things that actually are scary, because
if some scary-looking messages are actually innocuous, it can lead
people to believe that the same is true in other cases.
Second, I don't really like the particular formulation you have above,
because the user still doesn't know whether or not to be scared. Can
we figure that out? I think if we're in crash recovery, I think that
we should not be scared, because we have no alternative to assuming
that we've reached the end of WAL, so all crash recoveries will end
like this. If we're in archive recovery, we should definitely be
scared if we haven't yet reached the minimum recovery point, because
more WAL than that should certainly exist. After that, it depends on
how we got the WAL. If it's being streamed, the question is whether
we've reached the end of what got streamed. If it's being copied from
the archive, we ought to have the whole segment, but maybe not more.
Can we get the right context to the point where the error is being
reported to know whether we hit the error at the end of the WAL that
was streamed? If not, can we somehow rejigger things so that we only
make it sound scary if we keep getting stuck at the same point when we
would've expected to make progress meanwhile?
I'm just spitballing here, but it would be really good if there's a
way to know definitely whether or not you should be scared. Corrupted
WAL segments are definitely a thing that happens, but retries are a
lot more common.
First, I agree that getting enough context to say precisely is by far the ideal.
That being said, as an end user who's found this surprising -- and
momentarily scary every time I initially scan it even though I *know
intellectually it's not* -- I would find Peter's suggestion a
significant improvement over what we have now. I'm fairly certain my
co-workers on our database team would also. Knowing that something is
at least not always scary is good. Though I'll grant that this does
have the negative in reverse: if it actually is a scary
situation...this mutes your concern level. On the other hand,
monitoring would tell us if there's a real problem (namely replication
lag), so I think the trade-off is clearly worth it.
How about this minor tweak:
HINT: This is expected if this is the end of currently available WAL.
Otherwise, it could indicate corruption.
Thanks,
James
Hi Kyotaro,
On 3/27/20 10:25 PM, James Coleman wrote:
On Thu, Mar 26, 2020 at 12:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
I'm just spitballing here, but it would be really good if there's a
way to know definitely whether or not you should be scared. Corrupted
WAL segments are definitely a thing that happens, but retries are a
lot more common.
First, I agree that getting enough context to say precisely is by far the ideal.
That being said, as an end user who's found this surprising -- and
momentarily scary every time I initially scan it even though I *know
intellectually it's not* -- I would find Peter's suggestion a
significant improvement over what we have now. I'm fairly certainly my
co-workers on our database team would also. Knowing that something is
at least not always scary is good. Though I'll grant that this does
have the negative in reverse: if it actually is a scary
situation...this mutes your concern level. On the other hand,
monitoring would tell us if there's a real problem (namely replication
lag), so I think the trade-off is clearly worth it.
How about this minor tweak:
HINT: This is expected if this is the end of currently available WAL.
Otherwise, it could indicate corruption.
Any thoughts on the suggestions for making the messaging clearer?
Also, the patch no longer applies:
http://cfbot.cputube.org/patch_32_2490.log.
Marking this Waiting on Author.
Regards,
--
-David
david@pgmasters.net
At Wed, 3 Mar 2021 11:14:20 -0500, David Steele <david@pgmasters.net> wrote in
Hi Kyotaro,
On 3/27/20 10:25 PM, James Coleman wrote:
On Thu, Mar 26, 2020 at 12:41 PM Robert Haas <robertmhaas@gmail.com>
wrote:
I'm just spitballing here, but it would be really good if there's a
way to know definitely whether or not you should be scared. Corrupted
WAL segments are definitely a thing that happens, but retries are a
lot more common.
First, I agree that getting enough context to say precisely is by far
the ideal.
That being said, as an end user who's found this surprising -- and
momentarily scary every time I initially scan it even though I *know
intellectually it's not* -- I would find Peter's suggestion a
significant improvement over what we have now. I'm fairly certainly my
co-workers on our database team would also. Knowing that something is
at least not always scary is good. Though I'll grant that this does
have the negative in reverse: if it actually is a scary
situation...this mutes your concern level. On the other hand,
monitoring would tell us if there's a real problem (namely replication
lag), so I think the trade-off is clearly worth it.
How about this minor tweak:
HINT: This is expected if this is the end of currently available WAL.
Otherwise, it could indicate corruption.
Any thoughts on the suggestions for making the messaging clearer?
Also, the patch no longer applies:
http://cfbot.cputube.org/patch_32_2490.log.
Sorry for missing the last discussions. I agree with the point about
the really-scary situation.
ValidXLogRecordHeader deliberately marks End-Of-WAL only in the case
of zero-length record so that the callers can identify that case,
instead of inferring the EOW state without it. All other invalid data
is treated as a potentially dangerous situation. I think this is a
reasonable classification. And the error level for the "danger" cases
is changed to WARNING (from LOG).
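As a rough illustration of that classification (this is not the patch itself; the struct, the `SizeOfXLogRecordSketch` constant, and the function name are trimmed-down stand-ins for the real xlogreader.h definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SizeOfXLogRecordSketch 24   /* stand-in for SizeOfXLogRecord */

/* Trimmed-down stand-ins for the real xlogreader structs. */
typedef struct XLogRecord
{
    uint32_t xl_tot_len;    /* total record length; 0 past the written WAL */
} XLogRecord;

typedef struct XLogReaderState
{
    bool EndOfWAL;          /* the flag the patch adds */
    char errmsg[128];       /* stand-in for report_invalid_record() */
} XLogReaderState;

/*
 * Sketch of the classification described above: only a zero-length
 * header marks EndOfWAL; any other invalid header leaves the flag
 * unset, so the caller reports it as potential corruption.
 */
static bool
sketch_ValidXLogRecordHeader(XLogReaderState *state, const XLogRecord *record)
{
    if (record->xl_tot_len == 0)
    {
        snprintf(state->errmsg, sizeof(state->errmsg),
                 "record length is 0");
        state->EndOfWAL = true;     /* clean end of WAL */
        return false;
    }
    if (record->xl_tot_len < SizeOfXLogRecordSketch)
    {
        snprintf(state->errmsg, sizeof(state->errmsg),
                 "invalid record length: wanted %d, got %u",
                 SizeOfXLogRecordSketch, record->xl_tot_len);
        return false;               /* EndOfWAL stays false: suspicious */
    }
    return true;
}
```

Callers can then pick LOG for the EndOfWAL case and WARNING otherwise, which is the split the messages below show.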
As a result, the following messages are emitted with the attached.
- found zero-length record during recovery (the DETAIL might not be needed.)
LOG: redo starts at 0/14000118
LOG: reached end of WAL at 0/14C5D070 on timeline 1 in pg_wal during crash recovery
DETAIL: record length is 0 at 0/14C5D070
LOG: redo done at 0/14C5CF48 system usage: ...
- found another kind of invalid data
LOG: redo starts at 0/150000A0
WARNING: invalid record length at 0/1500CA60: wanted 24, got 54
LOG: redo done at 0/1500CA28 system usage: ...
On the way checking the patch, I found that it emits the following log
lines in the case where the redo loop meets an invalid record at the
start:
LOG: invalid record length at 0/10000118: wanted 24, got 42
LOG: redo is not required
which doesn't look proper. That case is identifiable using the
End-of-WAL flag this patch adds. Thus we get the following error
messages.
- found end-of-wal at the beginning of recovery
LOG: reached end of WAL at 0/130000A0 on timeline 1 in pg_wal during crash recovery
DETAIL: record length is 0 at 0/130000A0
LOG: redo is not required
- found invalid data
WARNING: invalid record length at 0/120000A0: wanted 24, got 42
WARNING: redo is skipped
HINT: This suggests WAL file corruption. You might need to check the database.
The logic of ErrRecPtr in ReadRecord may be wrong. I remember having
such a discussion before...
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v4-0001-Make-End-Of-Recovery-error-less-scary.patch (text/x-patch, +79 −28)
On 3/4/21, 10:50 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
As the result, the following messages are emitted with the attached.
I'd like to voice my support for this effort, and I intend to help
review the patch. It looks like the latest patch no longer applies,
so I've marked the commitfest entry [0] as waiting-on-author.
[0]: https://commitfest.postgresql.org/35/2490/
Nathan
At Fri, 22 Oct 2021 17:54:40 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
On 3/4/21, 10:50 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
As the result, the following messages are emitted with the attached.
I'd like to voice my support for this effort, and I intend to help
review the patch. It looks like the latest patch no longer applies,
so I've marked the commitfest entry [0] as waiting-on-author.
Nathan
Sorry for being late to reply. I rebased this to the current master.
- rebased
- use LSN_FORMAT_ARGS instead of bare shift and mask.
- v4 immediately exited walreceiver on disconnection. Maybe I didn't
want to see a FATAL message on the standby after the primary dies.
However, that would be another issue and that change was plain wrong.
v5 just removes the "end-of-WAL" part from the message, which
duplicates what startup emits.
- add a new error message "missing contrecord at %X/%X". Maybe this
should be regarded as a leftover of the contrecord patch. In the
attached patch the "%X/%X" is the LSN of the current record. The log
messages look like this (026_overwrite_contrecord).
LOG: redo starts at 0/1486CB8
WARNING: missing contrecord at 0/1FFC2E0
LOG: consistent recovery state reached at 0/1FFC2E0
LOG: started streaming WAL from primary at 0/2000000 on timeline 1
LOG: successfully skipped missing contrecord at 0/1FFC2E0, overwritten at 2021-11-08 14:50:11.969952+09
CONTEXT: WAL redo at 0/2000028 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFC2E0; time 2021-11-08 14:50:11.969952+09
While checking the behavior for the case of missing-contrecord, I
noticed that emode_for_corrupt_record() doesn't work as expected since
readSource is reset to XLOG_FROM_ANY after a read failure. We could
remember the last failed source, but pg_wal should have been visited if a
page read error happened, so I changed the function so that it treats
XLOG_FROM_ANY the same way as XLOG_FROM_PG_WAL.
(Otherwise we see "LOG: reached end-of-WAL at .." message after
"WARNING: missing contrecord at.." message.)
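A minimal sketch of that emode_for_corrupt_record() adjustment follows. The elevel constants and the explicit `source`/`lastComplaint` parameters are hypothetical stand-ins; the real function consults the xlog.c globals and static state directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for xlog.c's XLogSource and elog levels. */
typedef enum
{
    XLOG_FROM_ANY,
    XLOG_FROM_ARCHIVE,
    XLOG_FROM_PG_WAL,
    XLOG_FROM_STREAM
} XLogSource;

#define ELEVEL_DEBUG1 10
#define ELEVEL_LOG    15

/*
 * Sketch of the adjustment: complaints about records read from pg_wal
 * are throttled to one per LSN, and since readSource is reset to
 * XLOG_FROM_ANY after a read failure (by which time pg_wal has already
 * been visited), XLOG_FROM_ANY is treated the same way.
 */
static int
sketch_emode_for_corrupt_record(int emode, uint64_t RecPtr,
                                XLogSource source, uint64_t *lastComplaint)
{
    if ((source == XLOG_FROM_PG_WAL || source == XLOG_FROM_ANY) &&
        emode == ELEVEL_LOG)
    {
        if (RecPtr == *lastComplaint)
            emode = ELEVEL_DEBUG1;  /* already complained about this LSN */
        else
            *lastComplaint = RecPtr;
    }
    return emode;
}
```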
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v5-0001-Make-End-Of-Recovery-error-less-scary.patch (text/x-patch, +87 −26)
On Mon, Nov 08, 2021 at 02:59:46PM +0900, Kyotaro Horiguchi wrote:
While checking the behavior for the case of missing-contrecord, I
noticed that emode_for_corrupt_record() doesn't work as expected since
readSource is reset to XLOG_FROM_ANY after a read failure. We could
remember the last failed source but pg_wal should have been visited if
page read error happened so I changed the function so that it treats
XLOG_FROM_ANY the same way with XLOG_FROM_PG_WAL.
FWIW, I am not much a fan of assuming that it is fine to use
XLOG_FROM_ANY as a condition here. The comments on top of
emode_for_corrupt_record() make it rather clear what the expectations
are, and this is the default readSource.
(Otherwise we see "LOG: reached end-of-WAL at .." message after
"WARNING: missing contrecord at.." message.)
+ /* broken record found */
+ ereport(WARNING,
+ (errmsg("redo is skipped"),
+ errhint("This suggests WAL file corruption. You might need to check the database.")));
This looks rather scary to me, FWIW, and this could easily be reached
if one forgets about EndOfWAL while hacking on xlogreader.c.
Unlikely so, still.
+ report_invalid_record(state,
+ "missing contrecord at %X/%X",
+ LSN_FORMAT_ARGS(RecPtr));
Isn't there a risk here to break applications checking after error
messages stored in the WAL reader after seeing a contrecord?
+ if (record->xl_tot_len == 0)
+ {
+ /* This is strictly not an invalid state, so phrase it as so. */
+ report_invalid_record(state,
+ "record length is 0 at %X/%X",
+ LSN_FORMAT_ARGS(RecPtr));
+ state->EndOfWAL = true;
+ return false;
+ }
This assumes that a value of 0 for xl_tot_len is a synonym of the end
of WAL, but cannot we have also a corrupted record in this case in the
shape of xl_tot_len being 0? We validate the full record after
reading the header, so it seems to me that we should not assume that
things are just ending as proposed in this patch.
--
Michael
Thank you for the comments!
At Tue, 9 Nov 2021 09:53:15 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Mon, Nov 08, 2021 at 02:59:46PM +0900, Kyotaro Horiguchi wrote:
While checking the behavior for the case of missing-contrecord, I
noticed that emode_for_corrupt_record() doesn't work as expected since
readSource is reset to XLOG_FROM_ANY after a read failure. We could
remember the last failed source but pg_wal should have been visited if
page read error happened so I changed the function so that it treats
XLOG_FROM_ANY the same way with XLOG_FROM_PG_WAL.
FWIW, I am not much a fan of assuming that it is fine to use
XLOG_FROM_ANY as a condition here. The comments on top of
emode_for_corrupt_record() make it rather clear what the expectations
are, and this is the default readSource.
The readSource is expected by the function to be the failed source but
it goes back to XLOG_FROM_ANY on page read failure. So the function
*is* standing on the wrong assumption. I noticed that currentSource
holds the last accessed source (but I forgot about that). So it is
exactly what we need here. By using it, there is no longer any need to
introduce the unclear assumption.
(Otherwise we see "LOG: reached end-of-WAL at .." message after
"WARNING: missing contrecord at.." message.)
+ /* broken record found */
+ ereport(WARNING,
+ (errmsg("redo is skipped"),
+ errhint("This suggests WAL file corruption. You might need to check the database.")));
This looks rather scary to me, FWIW, and this could easily be reached
Yes, the message is intentionally scary, since we don't come here in
the case of clean WAL:)
if one forgets about EndOfWAL while hacking on xlogreader.c.
Unlikely so, still.
I don't understand. Isn't that the case for almost every feature?
The patch compels hackers to maintain the condition for recovery
being considered cleanly ended. If the last record doesn't meet the
condition, the WAL file should be considered to have a
problem. However, I don't see the condition being expanded with another
term in the future.
Even if someone, including myself, broke that condition, we would at
worst see an unwanted scary message. And I believe almost all
hackers can easily identify it as a bug from the DETAIL message shown
alongside. I'm not sure such bugs would be found in the development
phase, though..
+ report_invalid_record(state,
+                       "missing contrecord at %X/%X",
+                       LSN_FORMAT_ARGS(RecPtr));

Isn't there a risk here of breaking applications that check the error
messages stored in the WAL reader after seeing a contrecord?
I'm not sure whether you mean the case where no message is stored
previously, or the case where a message is already stored. The former
is fine, as the record is actually broken, but I was missing the
latter case. In this version I avoid overwriting the existing error
message.
+ if (record->xl_tot_len == 0)
+ {
+     /* This is strictly not an invalid state, so phrase it as so. */
+     report_invalid_record(state,
+                           "record length is 0 at %X/%X",
+                           LSN_FORMAT_ARGS(RecPtr));
+     state->EndOfWAL = true;
+     return false;
+ }

This assumes that a value of 0 for xl_tot_len is a synonym for the end
of WAL, but could we not also have a corrupted record whose xl_tot_len
is 0? We validate the full record after reading the header, so it
seems to me that we should not assume that things are just ending as
proposed in this patch.
Yeah, that is the most serious concern to me, which is why I didn't
hide the detailed message inside the "end-of-WAL reached" message.
LOG: reached end of WAL at 0/512F198 on timeline 1 in pg_wal during crash recovery
DETAIL: record length is 0 at 0/512F210
I believe everyone regards a zero record length as fine unless
something wrong is seen afterwards. However, we can extend the check
to the whole record header, which I think gets far closer to perfect
for almost all cases. The attached emits the following message for the
good (true end-of-WAL) case.
LOG: reached end of WAL at 0/512F4A0 on timeline 1 in pg_wal during crash recovery
DETAIL: empty record header found at 0/512F518
If garbage bytes are found in the header area, the following log
entries are left instead. I think we can produce a better message here.
WARNING: garbage record header at 0/2095458
LOG: redo done at 0/2095430 system usage: CPU: user: 0.03 s, system: 0.01 s, elapsed: 0.04 s
This is the updated version.
- emode_for_corrupt_record() now uses currentSource instead of
  readSource.
- If a zero record length is encountered, make sure the whole header
  is zeroed before deciding it is the end-of-WAL.
- Do not overwrite the existing message when a missing contrecord is
  detected. The message added here is seen in the TAP test log
  026_overwrite_contrecord_standby.log.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v6-0001-Make-End-Of-Recovery-error-less-scary.patch (text/x-patch; charset=us-ascii, +111 -25)
At Tue, 09 Nov 2021 16:27:51 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
This is the updated version.
- emode_for_corrupt_record() now uses currentSource instead of
  readSource.
- If a zero record length is encountered, make sure the whole header
  is zeroed before deciding it is the end-of-WAL.
- Do not overwrite the existing message when a missing contrecord is
  detected. The message added here is seen in the TAP test log
  026_overwrite_contrecord_standby.log.
d2ddfa681db27a138acb63c8defa8cc6fa588922 removed the global variables
ReadRecPtr and EndRecPtr. This is a rebased version that reads the
LSNs directly from the xlogreader instead of the removed global
variables.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v7-0001-Make-End-Of-Recovery-error-less-scary.patch (text/x-patch; charset=us-ascii, +112 -24)
d2ddfa681db27a138acb63c8defa8cc6fa588922 removed the global variables
ReadRecPtr and EndRecPtr. This is a rebased version that reads the
LSNs directly from the xlogreader instead of the removed global
variables.
Hi, hackers!
I've checked the latest version of the patch. It applies cleanly,
check-world passes, and CI is also green.
The proposed messages seem good to me, but it would probably be better
to have a test of the conditions under which "reached end of WAL..."
is emitted.
Then I believe it can be set as 'ready for committer'.
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com