Missing pg_control crashes postmaster
Hey Hackers,
If a postmaster is running and the pg_control file is removed, postgres
will PANIC.
Steps to recreate:
1.) start a new cluster
2.) rm $DATADIR/pg_control
3.) psql => CHECKPOINT;
PANIC: could not open control file "global/pg_control": No such file
or directory
After the PANIC there is no pg_control. Recovery would be difficult
without a replica or a backup. Instead of crashing, we could just write a
new pg_control file, since all the data it contains is in memory at the time.
There does not really seem to be a need for this behavior, as all the
information postgres needs is in memory at this point. I propose a patch
to just recreate pg_control on updates if it does not exist.
--
Brian Faherty
Attachments:
0001-Create-pg_control-on-update-if-not-exists.patch (text/x-patch; charset=US-ASCII)
On Mon, Jul 23, 2018 at 12:31 PM, Brian Faherty <anothergenericuser@gmail.com> wrote:
There does not really seem to be a need for this behavior as all the
information postgres needs is in memory at this point. I propose with
a patch to just recreate pg_control on updates if it does not exist.
Or, at minimum, create said file under a different name in PGDATA so an admin
can rename it should they wish to accept the in-memory version as a
valid replacement for whatever happened to the original.
Even if it can be safely rebuilt, having pg_control removed out from under a
running server seems like something that shouldn't happen, and the server is
within its rights to panic if it does.
David J.
On July 23, 2018 12:31:13 PM PDT, Brian Faherty <anothergenericuser@gmail.com> wrote:
Hey Hackers,
If a postmaster is running and the pg_control file is removed postgres
will PANIC.
Steps to recreate:
1.) start a new cluster
2.) rm $DATADIR/pg_control
3.) psql => CHECKPOINT;
PANIC: could not open control file "global/pg_control": No such file
or directory
After the PANIC there is no pg_control. Recovery would be difficult
without a replica or a backup. Instead of crashing we can just write a
new pg_control file since all the data is in memory at the time.
There does not really seem to be a need for this behavior as all the
information postgres needs is in memory at this point. I propose with
a patch to just recreate pg_control on updates if it does not exist.
What's the issue this would solve? Given that there are moments, until the control file is rewritten, where you would be toast either way, I don't buy that this gives much added safety. Nor have you explained which realistic scenarios lead to the file going missing without much broader problems being present.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Brian Faherty <anothergenericuser@gmail.com> writes:
If a postmaster is running and the pg_control file is removed postgres
will PANIC.
That's very intentional. Don't do it.
There does not really seem to be a need for this behavior as all the
information postgres needs is in memory at this point. I propose with
a patch to just recreate pg_control on updates if it does not exist.
I would vote to reject any such patch; it's too likely to cause more
problems than it solves. Generally, if critical files like that one
have disappeared, trying to write new data isn't going to be enough
to fix it and could well result in more corruption.
As an example, imagine that you do "rm -rf $PGDATA; initdb" without
remembering to shut down the old postmaster first. Currently, the
old postmaster will panic/quit fairly promptly and no harm done.
The more aggressive it is at trying to "recover" from the situation,
the more likely it is to corrupt the new installation.
(Note that you would have to break a few other things in order to
make this particular scenario actually hazardous. My point is just
that there *are* reasons not to try to recover automatically.)
regards, tom lane
On Mon, Jul 23, 2018 at 07:00:30PM -0400, Tom Lane wrote:
I would vote to reject any such patch; it's too likely to cause more
problems than it solves. Generally, if critical files like that one
have disappeared, trying to write new data isn't going to be enough
to fix it and could well result in more corruption.
+1.
--
Michael
On 24 July 2018 at 03:31, Brian Faherty <anothergenericuser@gmail.com>
wrote:
Hey Hackers,
If a postmaster is running and the pg_control file is removed postgres
will PANIC.
How does that happen?
"Don't go deleting stuff in pgdata" is pretty fundamental.
On 7/23/18 7:00 PM, Tom Lane wrote:
Brian Faherty <anothergenericuser@gmail.com> writes:
There does not really seem to be a need for this behavior as all the
information postgres needs is in memory at this point. I propose with
a patch to just recreate pg_control on updates if it does not exist.
I would vote to reject any such patch; it's too likely to cause more
problems than it solves. Generally, if critical files like that one
have disappeared, trying to write new data isn't going to be enough
to fix it and could well result in more corruption.
As an example, imagine that you do "rm -rf $PGDATA; initdb" without
remembering to shut down the old postmaster first. Currently, the
old postmaster will panic/quit fairly promptly and no harm done.
The more aggressive it is at trying to "recover" from the situation,
the more likely it is to corrupt the new installation.
It seems much more likely that a missing/modified postmaster.pid will
cause postgres to panic than it is for a missing pg_control to do so.
Older versions of postgres don't panic until the next checkpoint and
newer versions won't panic at all on an idle system since we fixed
redundant checkpoints in 9.6 (6ef2eba3). An idle postgres 11 cluster
seems happy enough to run without a pg_control file indefinitely (or at
least 10 minutes, which is past the default checkpoint time). As soon
as I write data or perform a checkpoint it does panic, of course.
Conversely, removing/modifying postmaster.pid causes postgres to panic
very quickly on the versions I tested, 9.4 and 11.
It seems to me that doing the postmaster.pid test at checkpoint time (if
we don't already) would be enough to protect pg_control against
unintentionally replaced clusters.
Or perhaps writing to an alternate file as David J suggests would do the
trick.
It seems like an easy win if we can find a safe way to do it, though I
admit that this is only a benefit in corner cases.
Regards,
--
-David
david@pgmasters.net
On July 25, 2018 7:18:30 AM PDT, David Steele <david@pgmasters.net> wrote:
On 7/23/18 7:00 PM, Tom Lane wrote:
Brian Faherty <anothergenericuser@gmail.com> writes:
There does not really seem to be a need for this behavior as all the
information postgres needs is in memory at this point. I propose with
a patch to just recreate pg_control on updates if it does not exist.
I would vote to reject any such patch; it's too likely to cause more
problems than it solves. Generally, if critical files like that one
have disappeared, trying to write new data isn't going to be enough
to fix it and could well result in more corruption.
As an example, imagine that you do "rm -rf $PGDATA; initdb" without
remembering to shut down the old postmaster first. Currently, the
old postmaster will panic/quit fairly promptly and no harm done.
The more aggressive it is at trying to "recover" from the situation,
the more likely it is to corrupt the new installation.
It seems much more likely that a missing/modified postmaster.pid will
cause postgres to panic than it is for a missing pg_control to do so.
Older versions of postgres don't panic until the next checkpoint and
newer versions won't panic at all on an idle system since we fixed
redundant checkpoints in 9.6 (6ef2eba3). An idle postgres 11 cluster
seems happy enough to run without a pg_control file indefinitely (or at
least 10 minutes, which is past the default checkpoint time). As soon
as I write data or perform a checkpoint it does panic, of course.
Conversely, removing/modifying postmaster.pid causes postgres to panic
very quickly on the versions I tested, 9.4 and 11.
It seems to me that doing the postmaster.pid test at checkpoint time (if
we don't already) would be enough to protect pg_control against
unintentionally replaced clusters.
Or perhaps writing to an alternate file as David J suggests would do the
trick.
It seems like an easy win if we can find a safe way to do it, though I
admit that this is only a benefit in corner cases.
What would we win here? Which scenario that's not contrived would be less bad due to the proposed change? This seems like complexity for its own sake.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 7/25/18 10:37 AM, Andres Freund wrote:
On July 25, 2018 7:18:30 AM PDT, David Steele <david@pgmasters.net> wrote:
It seems like an easy win if we can find a safe way to do it, though I
admit that this is only a benefit in corner cases.
What would we win here? Which scenario that's not contrived would be less bad due to the proposed change? This seems like complexity for its own sake.
I think it's worth preserving pg_control even in the case where there is
other damage to the cluster. The alternative in this case (if no backup
exists) is to run pg_resetwal which means data since the last checkpoint
will not be written out causing even more data loss. I have run
clusters with checkpoint_timeout = 60m so data loss in this case is a
real concern.
I favor the contrived scenario that helps preserve the current cluster
instead of a hypothetical newly init'd one. I also don't think that
users deleting files out of a cluster is all that contrived.
Adding O_CREAT to open() doesn't seem too complex to me. I'm not
really in favor of the renaming idea, but I'm not against it either if
it gets me a copy of the pg_control file.
Regards,
--
-David
david@pgmasters.net
Hi,
On 2018-07-25 10:52:08 -0400, David Steele wrote:
On 7/25/18 10:37 AM, Andres Freund wrote:
On July 25, 2018 7:18:30 AM PDT, David Steele <david@pgmasters.net> wrote:
It seems like an easy win if we can find a safe way to do it, though I
admit that this is only a benefit in corner cases.
What would we win here? Which scenario that's not contrived would be less bad due to the proposed change? This seems like complexity for its own sake.
I think it's worth preserving pg_control even in the case where there is
other damage to the cluster. The alternative in this case (if no backup
exists) is to run pg_resetwal which means data since the last checkpoint
will not be written out causing even more data loss. I have run clusters
with checkpoint_timeout = 60m so data loss in this case is a real concern.
Wait, what? How is "data loss in this case" a real concern? Not
even a remotely realistic scenario has been described where this matters
so far.
I favor the contrived scenario that helps preserve the current cluster
instead of a hypothetical newly init'd one. I also don't think that users
deleting files out of a cluster is all that contrived.
But trying to limp on in that case, and that being helpful, is.
Adding O_CREAT to open() doesn't seem too complex to me. I'm not really in
favor of the renaming idea, but I'm not against it either if it gets me a
copy of the pg_control file.
The problem is that that'll just hide the issue for a bit longer, while
continuing to run (due to the O_CREAT we won't PANIC anymore). Which can lead
to a lot of followup issues, like checkpoints removing old WAL that'd
have been useful for data recovery.
Greetings,
Andres Freund
On 7/25/18 11:09 AM, Andres Freund wrote:
On 2018-07-25 10:52:08 -0400, David Steele wrote:
I favor the contrived scenario that helps preserve the current cluster
instead of a hypothetical newly init'd one. I also don't think that users
deleting files out of a cluster is all that contrived.
But trying to limp on in that case, and that being helpful, is.
OK, I can't argue with that. It would be wrong to continue operating
without knowing what the damage is.
Adding O_CREAT to open() doesn't seem too complex to me. I'm not really in
favor of the renaming idea, but I'm not against it either if it gets me a
copy of the pg_control file.
The problem is that that'll just hide the issue for a bit longer, while
continuing (due to the O_CREAT we'll not PANIC anymore). Which can lead
to a lot of followup issues, like checkpoints removing old WAL that'd
have been useful for data recovery.
So if a panic is the best thing to do, it might still be good to write
out a copy of pg_control to another file and let the user know that it's
there. More information seems better than less to me.
Regards,
--
-David
david@pgmasters.net
David Steele <david@pgmasters.net> writes:
On 7/25/18 10:37 AM, Andres Freund wrote:
What would we win here? Which scenario that's not contrived would be less bad due to the proposed change? This seems like complexity for its own sake.
I favor the contrived scenario that helps preserve the current cluster
instead of a hypothetical newly init'd one. I also don't think that
users deleting files out of a cluster is all that contrived.
What's not contrived about it? Particularly the case of *only* deleting
pg_control and not any other critical file? I don't recall having heard
any such stories in the last twenty years.
Also, even if we added the complexity needed to write data into say
pg_control.bak, what then? You think that that would be any less
prone to indiscriminate rm'ing? Where and how would we document this,
and what are the odds that someone dumb enough to remove pg_control would
ever have read that part of the documentation?
I'm with Andres: this is a solution in search of a problem.
regards, tom lane
Hi,
On 2018-07-25 11:19:49 -0400, David Steele wrote:
On 7/25/18 11:09 AM, Andres Freund wrote:
On 2018-07-25 10:52:08 -0400, David Steele wrote:
The problem is that that'll just hide the issue for a bit longer, while
continuing (due to the O_CREAT we'll not PANIC anymore). Which can lead
to a lot of followup issues, like checkpoints removing old WAL that'd
have been useful for data recovery.
So if a panic is the best thing to do, it might still be good to write out a
copy of pg_control to another file and let the user know that it's there.
More information seems better than less to me.
Sure, if that were the only concern. But code complexity and testability
also are. This means we need to maintain code for an edge case that
nobody has come up with a reason for. And at least during development,
we need to come up with test cases. Or we continually run the test case
as part of the regression tests for something without any sort of
practical relevance.
If one wanted to improve recoverability in scenarios like this, there'd
be actually useful things like adding the option to extract control
files, FPIs, clog contents from the WAL with pg_waldump.
Greetings,
Andres Freund
David Steele <david@pgmasters.net> writes:
On 7/25/18 11:09 AM, Andres Freund wrote:
The problem is that that'll just hide the issue for a bit longer, while
continuing (due to the O_CREAT we'll not PANIC anymore). Which can lead
to a lot of followup issues, like checkpoints removing old WAL that'd
have been useful for data recovery.
So if a panic is the best thing to do, it might still be good to write
out a copy of pg_control to another file and let the user know that it's
there. More information seems better than less to me.
I'm still dubious that this is fixing any real-world problem that is
more pressing than the problems it would create. If you're asked to
resuscitate a dead cluster, do you trust pg_control.bak if you find
it? Maybe it's horribly out of date (consider likelihood that someone
removed pg_control more than once, having got away with that the first
time). If there's both that and pg_control, which do you trust?
regards, tom lane
On 7/25/18 11:26 AM, Andres Freund wrote:
On 2018-07-25 11:19:49 -0400, David Steele wrote:
If one wanted to improve recoverability in scenarios like this, there'd
be actually useful things like adding the option to extract control
files, FPIs, clog contents from the WAL with pg_waldump.
I think you'd need to be a bit careful about what to trust in the WAL
but presumably the last checkpoint LSN would be safe enough.
At the end of the day good backups are the best protection against an
issue like this.
Regards,
--
-David
david@pgmasters.net
On 7/25/18 11:33 AM, Tom Lane wrote:
David Steele <david@pgmasters.net> writes:
On 7/25/18 11:09 AM, Andres Freund wrote:
The problem is that that'll just hide the issue for a bit longer, while
continuing (due to the O_CREAT we'll not PANIC anymore). Which can lead
to a lot of followup issues, like checkpoints removing old WAL that'd
have been useful for data recovery.
So if a panic is the best thing to do, it might still be good to write
out a copy of pg_control to another file and let the user know that it's
there. More information seems better than less to me.
I'm still dubious that this is fixing any real-world problem that is
more pressing than the problems it would create. If you're asked to
resuscitate a dead cluster, do you trust pg_control.bak if you find
it? Maybe it's horribly out of date (consider likelihood that someone
removed pg_control more than once, having got away with that the first
time). If there's both that and pg_control, which do you trust?
It would need to be a manual operation. I don't think automating it
would be a good idea for the reasons that Andres has enumerated.
Perhaps making pg_resetwal a bit smarter in these scenarios would be the
way to go. It's already the tool of last resort so this kind of
manipulation might be a better fit there.
Regards,
--
-David
david@pgmasters.net