Automatically force zero_damaged_pages while InRecovery?
We have seen a couple instances recently of WAL recovery failing due to
the recently added code that validates a page header as soon as the page
is read in, for example Olivier Prenant's crash report here:
http://archives.postgresql.org/pgsql-hackers/2003-10/msg01505.php
This failure is actually entirely pointless, because (AFAIK) any page
that is brought in during WAL recovery is going to be overwritten in
toto from the WAL log. So it would be safe to run WAL recovery with
zero_damaged_pages enabled. Rather than expecting DBAs to think of that
under the stress of a crashed-database situation, I propose that we do
it for them:
*** src/backend/storage/buffer/bufmgr.c.orig Fri Nov 21 12:41:31 2003
--- src/backend/storage/buffer/bufmgr.c Sat Nov 29 13:35:14 2003
***************
*** 231,237 ****
if (status == SM_SUCCESS &&
!PageHeaderIsValid((PageHeader) MAKE_PTR(bufHdr->data)))
{
! if (zero_damaged_pages)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
--- 231,237 ----
if (status == SM_SUCCESS &&
!PageHeaderIsValid((PageHeader) MAKE_PTR(bufHdr->data)))
{
! if (zero_damaged_pages || InRecovery)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
Comments?
regards, tom lane
Tom Lane wrote:
This failure is actually entirely pointless, because (AFAIK) any page
that is brought in during WAL recovery is going to be overwritten in
toto from the WAL log. So it would be safe to run WAL recovery with
zero_damaged_pages enabled. Rather than expecting DBAs to think of that
under the stress of a crashed-database situation, I propose that we do
it for them:
Sounds like a good idea to me.
Joe
On Sat, Nov 29, 2003 at 11:31:03AM -0800, Joe Conway wrote:
Tom Lane wrote:
This failure is actually entirely pointless, because (AFAIK) any page
that is brought in during WAL recovery is going to be overwritten in
toto from the WAL log. So it would be safe to run WAL recovery with
zero_damaged_pages enabled. Rather than expecting DBAs to think of that
under the stress of a crashed-database situation, I propose that we do
it for them:Sounds like a good idea to me.
Yes, but is the error message going to stay verbatim? Maybe the
category could be decreased ...
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Everybody understands Mickey Mouse. Few understand Hermann Hesse.
Hardly anybody understands Einstein. And nobody understands Emperor Norton."
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
Yes, but is the error message going to stay verbatim? Maybe the
category could be decreased ...
The patch I proposed would still cause a WARNING to come out in the
postmaster log. We could perhaps reduce it to a NOTICE if InRecovery
is true, or have InRecovery add a DETAIL pointing out that this should
be no problem (though the latter behavior couldn't be back-ported to
7.3 very readily). We'd not want to change the message wording in the
7.4 branch either.
I think this is all a bit academic though, compared to the basic point
of not failing ;-). In practice, any recovery from WAL is going to
produce a bunch of scary-looking messages in the postmaster log.
We're not likely to remove those messages, since they provide
potentially critical debugging info if it fails to work. I'm not that
concerned about making them less scary-looking ...
regards, tom lane