Re: Database corruption?

Started by Mikheev, Vadim about 24 years ago, 1 message
#1 Mikheev, Vadim
vmikheev@SECTORBASE.COM

> > > Um, Vadim? Still of the opinion that elog(STOP) is a good
> > > idea here? That's two people now for whom that decision has
> > > turned localized corruption into complete database failure.
> > > I don't think it's a good tradeoff.

> > One is able to use pg_resetxlog, so I don't see the point in
> > removing elog(STOP) there. What do you think?

> Well, pg_resetxlog would get around the symptom, but at the cost of
> possibly losing updates that are further along in the xlog than the
> update for the corrupted page. (I'm assuming that the problem here
> is a page with a corrupt LSN.) I think it's better to treat flush
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
On restart, the entire content of every page modified after the last
checkpoint should be restored from WAL. In Denis's case it looks like
the page newly allocated for the update was somehow corrupted before
heapam.c:2235 (7.1.2 source), so the WAL record had no
XLOG_HEAP_INIT_PAGE flag and the page content was not initialized on
restart. Denis reported a system crash, very likely due to a memory
problem.

> request past end of log as a DEBUG or NOTICE condition and keep going.
> Sure, it indicates badness somewhere, but we should try to have some
> robustness in the face of that badness. I do not see any reason why
> XLOG has to declare defeat and go home because of this condition.
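Vadim's inline explanation turns on how replay treats a newly allocated page. Below is a minimal C sketch built on simplifying assumptions. A page is a byte array with an LSN, a WAL record writes a single byte, and a boolean stands in for the XLOG_HEAP_INIT_PAGE flag. All names (SketchPage, SketchRecord, sketch_redo) are hypothetical and are not PostgreSQL's heapam.c code. When the init flag is set, the sketch rebuilds the page from scratch. Without the flag, the change lands on top of whatever is already on disk, so garbage survives replay. The sketch also includes an assumed "skip if the page LSN is already at or past the record" check. That check is where a corrupt, oversized page LSN would make replay skip real changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified model of a page and a WAL record.
 * None of these names are PostgreSQL's real types or functions. */
#define SKETCH_PAGE_SIZE 64

typedef uint64_t SketchLSN;           /* WAL position, simplified to one integer */

typedef struct {
    SketchLSN lsn;                    /* LSN of the last record applied to the page */
    unsigned char data[SKETCH_PAGE_SIZE];
} SketchPage;

typedef struct {
    SketchLSN lsn;                    /* position of this record in WAL */
    bool init_page;                   /* stands in for XLOG_HEAP_INIT_PAGE */
    unsigned char byte;               /* stands in for the tuple being written */
    int offset;                       /* where in the page it is written */
} SketchRecord;

typedef enum { REDO_SKIPPED, REDO_APPLIED } RedoResult;

/* Replay one record against a page during restart (hypothetical model). */
RedoResult sketch_redo(SketchPage *page, const SketchRecord *rec)
{
    assert(rec->offset >= 0 && rec->offset < SKETCH_PAGE_SIZE);

    if (rec->init_page) {
        /* Record says the page was freshly initialized: rebuild it from
         * scratch, discarding whatever bytes were on disk. */
        memset(page->data, 0, sizeof page->data);
        page->lsn = 0;
    } else if (rec->lsn <= page->lsn) {
        /* Page claims to already contain this change, so skip it.
         * A corrupt, too-large page LSN makes replay skip real work. */
        return REDO_SKIPPED;
    }

    /* Without init_page, the change is applied on top of the existing,
     * possibly corrupt, page contents. */
    page->data[rec->offset] = rec->byte;
    page->lsn = rec->lsn;
    return REDO_APPLIED;
}
```

With the flag set, `sketch_redo` throws away the on-disk bytes. With the flag clear, any bytes the record does not touch keep their corrupt values after restart.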

OK, what about setting a flag there during restart and aborting the
restart after all records from WAL have been applied? Then the DBA will
have the choice either to run pg_resetxlog afterwards and try to dump
the data, or to restore from an old backup. I still object to just a
NOTICE there: it's easy to miss. And in normal processing mode I'd
leave elog(STOP) there.
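The WAL-before-data rule links Tom's corrupt-LSN assumption to the error. Before a dirty page is written out, WAL must be flushed at least up to that page's LSN. A garbage LSN beyond the real end of WAL therefore produces a flush request past the end of the log. The C sketch below models Vadim's counter-proposal under that assumption. During restart, it records the bad request in a flag, keeps replaying, and aborts once all records are applied. In normal processing, it stops immediately, as elog(STOP) does. The types and functions (SketchWalState, sketch_flush_for_page, sketch_finish_recovery) are hypothetical stand-ins, not PostgreSQL's XLogFlush() or elog() interfaces, and the sketch does not model actual flushing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the "flush request past end of log" check.
 * Names are illustrative, not PostgreSQL's XLogFlush() or elog(). */
typedef uint64_t SketchLSN;

typedef enum {
    FLUSH_OK,      /* request is within the valid WAL */
    FLUSH_NOTED,   /* bad request remembered; replay continues */
    FLUSH_STOP     /* give up immediately (the elog(STOP) behaviour) */
} FlushOutcome;

typedef struct {
    SketchLSN end_of_wal;   /* last valid WAL position */
    bool in_recovery;       /* true while replaying WAL at restart */
    bool bad_flush_seen;    /* the flag proposed for the restart case */
} SketchWalState;

/* WAL-before-data: WAL must be durable up to page_lsn before the page is
 * written. A corrupt page LSN can point beyond the end of WAL. */
FlushOutcome sketch_flush_for_page(SketchWalState *st, SketchLSN page_lsn)
{
    if (page_lsn <= st->end_of_wal)
        return FLUSH_OK;             /* WAL would be flushed to page_lsn (not modeled) */

    if (st->in_recovery) {
        st->bad_flush_seen = true;   /* remember it and keep replaying */
        return FLUSH_NOTED;
    }
    return FLUSH_STOP;               /* normal processing: stop as before */
}

/* Called once every WAL record has been applied. Returns true if restart
 * should be aborted so the DBA can choose pg_resetxlog plus a dump, or a
 * restore from backup. */
bool sketch_finish_recovery(SketchWalState *st)
{
    st->in_recovery = false;
    return st->bad_flush_seen;
}
```

Tom's NOTICE-and-continue alternative would amount to returning FLUSH_NOTED in both modes and never aborting. In this model, the flag checked at the end of replay is what keeps the condition from being easy to miss.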

Vadim
P.S. Further discussion will be on hackers-list, sorry.