when the startup process doesn't
Hi,
I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (a) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.
It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, perhaps adding messages something like this:
LOG: still syncing data directory, elapsed time %ld.%03d ms, current path %s
LOG: data directory sync complete after %ld.%03d ms
LOG: still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG: unlogged relations reset after %ld.%03d ms
LOG: still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X
We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer, when the timer expires the
signal handler sets a flag, at a convenient point we notice the flag
is set and respond by printing a message and clearing the flag.
Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kind of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.
If I had to pick one of these two ideas, I'd pick the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.
Thoughts?
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2021-Apr-19, Robert Haas wrote:
Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant).
Hmm. We already have pg_isready, which is pretty simplistic -- it tries
to connect to the server and derives a status in a crude way.
Can we perhaps improve on that? I think your idea of using the
non-database-connected replication mode would let the server return a
tuple with some status information with a new command. And then
pg_isready could interpret that, or just print it.
--
Álvaro Herrera 39°49'30"S 73°17'W
Subversion to GIT: the shortest path to happiness I've ever heard of
(Alexey Klyukin)
On Mon, Apr 19, 2021 at 01:55:13PM -0400, Robert Haas wrote:
I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (a) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.
It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, perhaps adding messages something like this:
Yes, this certainly needs improvement.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Tue, Apr 20, 2021 at 5:55 AM Robert Haas <robertmhaas@gmail.com> wrote:
If I had to pick one of these two ideas, I'd pick the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.
+1 for both ideas. I've heard multiple requests for something like
that. A couple of users with update_process_title=off told me they
regretted that choice when they found themselves running a long crash
recovery with the only indicator of progress disabled.
Thomas Munro <thomas.munro@gmail.com> writes:
On Tue, Apr 20, 2021 at 5:55 AM Robert Haas <robertmhaas@gmail.com> wrote:
If I had to pick one of these two ideas, I'd pick the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.
+1 for both ideas. I've heard multiple requests for something like
that. A couple of users with update_process_title=off told me they
regretted that choice when they found themselves running a long crash
recovery with the only indicator of progress disabled.
Hmm ... +1 for progress messages in the log, but I'm suspicious about
the complexity-and-fragility-versus-value tradeoff for the other thing.
regards, tom lane
On Mon, Apr 19, 2021 at 7:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
Hi,
I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (a) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.
It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, perhaps adding messages something like this:
LOG: still syncing data directory, elapsed time %ld.%03d ms, current path %s
LOG: data directory sync complete after %ld.%03d ms
LOG: still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG: unlogged relations reset after %ld.%03d ms
LOG: still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X
We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer, when the timer expires the
signal handler sets a flag, at a convenient point we notice the flag
is set and respond by printing a message and clearing the flag.
Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kind of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.
If I had to pick one of these two ideas, I'd pick the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.
Thoughts?
(Ugh. Did reply instead of reply-all. Surely I should know that by
now... Here's a re-send!)
+1 for the log based one.
In general I'm usually against log-based solutions, but something over
the replication protocol is really not going to help a lot of people
who are in this situation. They may not even have permissions to log
in, and any kind of monitoring system would fail to work as well. And
can we even log users in at this point? We can't get the list of
roles... If we could, I would say it's probably better to allow the
login in a regular connection, but then immediately throw an error and
give this error a more detailed message if the user has monitoring
permissions.
But compared to either of those, the log-based method is certainly a lot
easier to build :)
And FWIW, I believe most -- probably all -- cloud environments do give
an interface to view the log at least, so the log based solution would
work there as well. Maybe not as convenient, but it would work.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Mon, Apr 19, 2021 at 8:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm ... +1 for progress messages in the log, but I'm suspicious about
the complexity-and-fragility-versus-value tradeoff for the other thing.
All right, so it's nice to hear that nobody so far is opposed to the
log-based solution, and I think it's sensible to think about building
that one first and doing anything else later.
But, if we did want to invent something to allow monitoring via libpq
even at this early stage, how would we make it work? Magnus pointed
out that we can hardly read pg_authid during crash recovery, which
means that accepting logins in the usual sense at that stage is not
feasible. But, what if we picked a fixed, hard-coded role name for
this? I would suggest pg_monitor, but that's already taken for
something else, so maybe pg_console or some better thing someone else
can suggest. Without a pg_authid row, you couldn't use password, md5,
or scram authentication, unless we provided some other place to store
the verifier, like a flatfile. I'm not sure we want to go there, but
that still leaves a lot of workable authentication methods.
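To make this concrete, hypothetical pg_hba.conf rules for such a hard-coded role might look like the following; the pg_console role and any special recovery-time handling are assumptions, not an existing feature:

```
# Hypothetical: allow a hard-coded pg_console role to connect during
# crash recovery, without any pg_authid lookup.  Peer auth locally,
# certificate auth over TCP -- neither needs a stored verifier.
local   replication     pg_console                              peer
hostssl replication     pg_console      127.0.0.1/32            cert
```

The appeal of these methods is precisely that they authenticate without consulting the catalogs, which are unreadable at this stage.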
I think Álvaro is right to see this kind of work as an extension of
pg_isready, but the problem with pg_isready is that we don't want to
expose a lot of information to the whole Internet, or however much of
it can reach the postgres port. But with this approach, you can lock
down access via pg_hba.conf, which means that it's OK to expose
information that we don't want to make available to everyone. I think
we're still limited to exposing what can be observed from shared
memory here, because the whole idea is to have something that can be
used even before consistency is reached, so we shouldn't really be
doing anything that would look at the contents of data files. But that
still leaves a bunch of things that we could show here, the progress
of the startup process being one of them.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Apr 20, 2021 at 2:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 19, 2021 at 8:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm ... +1 for progress messages in the log, but I'm suspicious about
the complexity-and-fragility-versus-value tradeoff for the other thing.
All right, so it's nice to hear that nobody so far is opposed to the
log-based solution, and I think it's sensible to think about building
that one first and doing anything else later.
But, if we did want to invent something to allow monitoring via libpq
even at this early stage, how would we make it work? Magnus pointed
out that we can hardly read pg_authid during crash recovery, which
means that accepting logins in the usual sense at that stage is not
feasible. But, what if we picked a fixed, hard-coded role name for
this? I would suggest pg_monitor, but that's already taken for
something else, so maybe pg_console or some better thing someone else
can suggest. Without a pg_authid row, you couldn't use password, md5,
or scram authentication, unless we provided some other place to store
the verifier, like a flatfile. I'm not sure we want to go there, but
that still leaves a lot of workable authentication methods.
Another option would be to keep this check entirely outside the scope
of normal roles, and just listen on a port (or unix socket) during
startup which basically just replies with the current status if you
connect to it. On Unix this could also make use of peer authentication
requiring you to be the same user as postgres for example.
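A bare-bones sketch of such a status socket; nothing here is existing PostgreSQL code, and the socket path, reply format, and function names are all invented:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Sketch: a side-channel Unix socket that answers every connection with
 * one status line and hangs up. */

/* Build the reply line; factored out so it can be tested in isolation. */
static int
format_status_line(char *buf, size_t len, const char *phase, long elapsed_ms)
{
    return snprintf(buf, len, "startup: %s, elapsed %ld ms\n",
                    phase, elapsed_ms);
}

/* Answer one connection on an already-listening Unix socket.  On Linux,
 * getsockopt(conn, SOL_SOCKET, SO_PEERCRED, ...) would supply the peer's
 * uid for a peer-authentication check before replying. */
static void
answer_status_once(int listen_fd, const char *phase, long elapsed_ms)
{
    char buf[128];
    int  conn;
    int  n;

    conn = accept(listen_fd, NULL, NULL);
    if (conn < 0)
        return;
    n = format_status_line(buf, sizeof(buf), phase, elapsed_ms);
    if (n > 0)
        (void) write(conn, buf, n);
    close(conn);
}
```

Putting the socket file inside the data directory would additionally restrict access to users who can already read the cluster's files.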
I think Álvaro is right to see this kind of work as an extension of
pg_isready, but the problem with pg_isready is that we don't want to
expose a lot of information to the whole Internet, or however much of
it can reach the postgres port. But with this approach, you can lock
down access via pg_hba.conf, which means that it's OK to expose
information that we don't want to make available to everyone. I think
we're still limited to exposing what can be observed from shared
memory here, because the whole idea is to have something that can be
used even before consistency is reached, so we shouldn't really be
doing anything that would look at the contents of data files. But that
still leaves a bunch of things that we could show here, the progress
of the startup process being one of them.
Yeah, I think we should definitely limit this to local access, one way
or another. Realistically using pg_hba is going to require catalog
access, isn't it? And we can't just go ignore those rows in pg_hba
that for example reference role membership (as well as all those auth
methods you can't use).
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Tue, 20 Apr 2021 15:04:28 +0200
Magnus Hagander <magnus@hagander.net> wrote:
[...]
Yeah, I think we should definitely limit this to local access, one way
or another. Realistically using pg_hba is going to require catalog
access, isn't it? And we can't just go ignore those rows in pg_hba
that for example reference role membership (as well as all those auth
methods you can't use).
Two other options:
1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allow watching it without tail-grep'ing the logs, using e.g. pg_controldata.
2. maybe the startup process could ignore update_process_title? As far
as I understand the doc correctly, this GUC is mostly useful for backends on
Windows.
Regards,
On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
On Tue, 20 Apr 2021 15:04:28 +0200
Magnus Hagander <magnus@hagander.net> wrote:
[...]
Yeah, I think we should definitely limit this to local access, one way
or another. Realistically using pg_hba is going to require catalog
access, isn't it? And we can't just go ignore those rows in pg_hba
that for example reference role membership (as well as all those auth
methods you can't use).
Two other options:
1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allow watching it without tail-grep'ing the logs, using e.g. pg_controldata.
I think doing so in controldata would definitely make things
complicated for no real reason. Plus controldata has a fixed size (and
has to have), whereas something like this would probably want more
variation than that makes easy.
There could be a "startup.status" file I guess which would basically
contain the last line of what would otherwise be in the log. But if it
remains a textfile, I'm not sure what the gain is -- you'll just have
to have the dba look in more places than one to find it? It's not like
there's likely to be much other data written to the log during these
times?
2. maybe the startup process could ignore update_process_title? As far
as I understand the doc correctly, this GUC is mostly useful for backends on
Windows.
You mention Windows -- that would be one excellent reason not to go
for this particular method. Viewing the process title is much harder
on Windows, as there is actually no such thing and we fake it.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
Two other options:
1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allow watching it without tail-grep'ing the logs, using e.g. pg_controldata.
I think doing so in controldata would definitely make things
complicated for no real reason. Plus controldata has a fixed size (and
has to have), whereas something like this would probably want more
variation than that makes easy.
Also, given that pg_control is as critical a bit of data as we have,
we really don't want to be writing it more often than we absolutely
have to.
There could be a "startup.status" file I guess which would basically
contain the last line of what would otherwise be in the log. But if it
remains a textfile, I'm not sure what the gain is -- you'll just have
to have the dba look in more places than one to find it? It's not like
there's likely to be much other data written to the log during these
times?
Yeah, once you are talking about dumping stuff in a file, it's not
clear how that's better than progress-messages-in-the-log. People
already have a lot of tooling for looking at the postmaster log.
I think the point of Robert's other proposal is to allow remote
checks of the restart's progress, so local files aren't much of
a substitute anyway.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
Two other options:
1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allow watching it without tail-grep'ing the logs, using e.g. pg_controldata.
I think doing so in controldata would definitely make things
complicated for no real reason. Plus controldata has a fixed size (and
has to have), whereas something like this would probably want more
variation than that makes easy.
Also, given that pg_control is as critical a bit of data as we have,
we really don't want to be writing it more often than we absolutely
have to.
Yeah, don't think pg_control fiddling is what we want. I do agree with
improving the logging situation around here, certainly.
There could be a "startup.status" file I guess which would basically
contain the last line of what would otherwise be in the log. But if it
remains a textfile, I'm not sure what the gain is -- you'll just have
to have the dba look in more places than one to find it? It's not like
there's likely to be much other data written to the log during these
times?
Yeah, once you are talking about dumping stuff in a file, it's not
clear how that's better than progress-messages-in-the-log. People
already have a lot of tooling for looking at the postmaster log.
Agreed.
I think the point of Robert's other proposal is to allow remote
checks of the restart's progress, so local files aren't much of
a substitute anyway.
Yeah, being able to pick up on this remotely seems like it'd be quite
nice. I'm not really thrilled with the idea, but the best I've got
offhand for this would be a new role that's "pg_recovery_login" where an
admin can GRANT that role to the roles they'd like to be able to use to
login during the recovery process and then, for those roles, we write
out flat files to allow authentication without access to pg_authid,
whenever their password or such changes. It's certainly a bit grotty
but I do think it'd work. I definitely don't want to go back to having
all of pg_authid written as a flat file and I'd rather that existing
tools and libraries work with this (meaning using the same port and
speaking the PG protocol and such) rather than inventing some new thing
that listens on some other port, etc.
On the fence about tying this to 'pg_monitor' instead of using a new
predefined role. Either way, I would definitely prefer to see the admin
have to create a role and then GRANT the predefined role to that role.
I really dislike the idea of having predefined roles that can be used to
directly log into the database.
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
Yeah, being able to pick up on this remotely seems like it'd be quite
nice. I'm not really thrilled with the idea, but the best I've got
offhand for this would be a new role that's "pg_recovery_login" where an
admin can GRANT that role to the roles they'd like to be able to use to
login during the recovery process and then, for those roles, we write
out flat files to allow authentication without access to pg_authid,
We got rid of those flat files for good and sufficient reasons. I really
really don't want to go back to having such.
I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
Yeah, being able to pick up on this remotely seems like it'd be quite
nice. I'm not really thrilled with the idea, but the best I've got
offhand for this would be a new role that's "pg_recovery_login" where an
admin can GRANT that role to the roles they'd like to be able to use to
login during the recovery process and then, for those roles, we write
out flat files to allow authentication without access to pg_authid,
We got rid of those flat files for good and sufficient reasons. I really
really don't want to go back to having such.
Yeah, certainly is part of the reason that I didn't really like that
idea either.
I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?
Ultimately it seems like it would depend on exactly what we are thinking
of returning there. A simple percentage of recovery which has been
completed doesn't seem like it'd really be revealing too much
information though.
Thanks,
Stephen
Hi,
On 2021-04-19 13:55:13 -0400, Robert Haas wrote:
Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kind of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.
The hard part about this seems to be how to perform authentication -
obviously we can't do catalog lookups for users at that time.
If that weren't the issue, we could easily do much better than now, by
just providing an errdetail() with recovery progress information. But we
presumably don't want to spray such information to unauthenticated
connection attempts.
I've vaguely wondered before whether it'd be worth having something like
an "admin" socket somewhere in the data directory. Which explicitly
wouldn't require authentication, have the cluster owner as the user,
etc. That'd not just be useful for monitoring during recovery, but also
make some interactions with the server easier for admin tools I think.
If I had to pick one of these two ideas, I'd pick the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.
However, leaving aside the implementation effort, the crazy idea
above would not easily address the issue of only being accessible with
local access...
Greetings,
Andres Freund
Hi,
On 2021-04-20 14:56:58 -0400, Tom Lane wrote:
I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?
Unfortunately I think something like a percentage is hard to calculate
right now. Even just looking at crash recovery (vs replication or
PITR), we don't currently know where the WAL ends without reading all
the WAL. The easiest thing to return would be something in LSNs or
bytes and I suspect that we don't want to expose either unauthenticated?
I wonder if we ought to occasionally update something like
ControlFileData->minRecoveryPoint on primaries, similar to what we do on
standbys? Then we could actually calculate a percentage, and it'd have
the added advantage of allowing to detect more cases where the end of
the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
adding a lot of fsyncs, but that seems doable?
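For illustration, if the startup process knew the redo start LSN and an approximate end-of-WAL LSN (say, from an occasionally updated minRecoveryPoint as suggested above), the percentage itself is simple byte arithmetic on 64-bit WAL positions. A hypothetical sketch, not PostgreSQL code:

```c
#include <stdint.h>

/* XLogRecPtr is a 64-bit byte position in the WAL stream. */
typedef uint64_t XLogRecPtr;

/* Combine the "%08X/%08X" halves used in log output into one position. */
static XLogRecPtr
lsn_from_parts(uint32_t hi, uint32_t lo)
{
    return ((XLogRecPtr) hi << 32) | lo;
}

/* Percentage of WAL replayed, clamped to 0..100. */
static int
recovery_percent(XLogRecPtr start, XLogRecPtr current, XLogRecPtr end)
{
    if (end <= start)
        return 100;             /* nothing (known) to replay */
    if (current <= start)
        return 0;
    if (current >= end)
        return 100;
    return (int) (((current - start) * 100) / (end - start));
}
```

The hard part, as noted above, is not this arithmetic but obtaining a trustworthy end position without reading all the WAL first.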
Greetings,
Andres Freund
On 2021-Apr-20, Andres Freund wrote:
On 2021-04-19 13:55:13 -0400, Robert Haas wrote:
Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kind of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.
The hard part about this seems to be how to perform authentication -
obviously we can't do catalog lookups for users at that time.
Maybe a way to do this would involve some sort of monitoring cookie
that's obtained ahead of time (maybe at initdb time?) and is supplied to
the frontend by some OOB means. Then the frontend can present that during
startup to the server, which ascertains its legitimacy without having to
access catalogs. Perhaps it even requires a specific pg_hba.conf rule.
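A sketch of how the server side of that cookie check could look; the constant-time comparison is my addition (a natural choice when comparing secrets), and all names here are hypothetical:

```c
#include <stddef.h>

/* Compare a presented cookie against the secret written at initdb time.
 * The comparison runs in time independent of where the bytes differ, so
 * it leaks nothing about the secret's contents. */
static int
cookie_matches(const unsigned char *presented, const unsigned char *secret,
               size_t len)
{
    unsigned char diff = 0;
    size_t i;

    for (i = 0; i < len; i++)
        diff |= presented[i] ^ secret[i];
    return diff == 0;
}
```

Since the secret lives in a file rather than in pg_authid, this check works before the catalogs are readable, which is the whole point.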
--
Álvaro Herrera Valdivia, Chile
"La verdad no siempre es bonita, pero el hambre de ella sí"
+1 for both log messages and allowing connections. I believe these two
complement each other.
In the cloud world, we oftentimes want to monitor the progress of the
recovery without connecting to the server as the operators don't
necessarily have the required permissions to connect and query. Secondly,
having this information in the log helps going back in time and understand
where Postgres spent time during recovery.
The ability to query the server provides real-time information and comes
in handy.
Thanks,
Satya
On Mon, Apr 19, 2021 at 10:55 AM Robert Haas <robertmhaas@gmail.com> wrote:
Hi,
I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (a) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.
It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, perhaps adding messages something like this:
LOG: still syncing data directory, elapsed time %ld.%03d ms, current path %s
LOG: data directory sync complete after %ld.%03d ms
LOG: still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG: unlogged relations reset after %ld.%03d ms
LOG: still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X
We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer, when the timer expires the
signal handler sets a flag, at a convenient point we notice the flag
is set and respond by printing a message and clearing the flag.
Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kind of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.
If I had to pick one of these two ideas, I'd pick the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.
Thoughts?
--
Robert Haas
EDB: http://www.enterprisedb.com
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
On 2021-04-20 14:56:58 -0400, Tom Lane wrote:
I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?
Unfortunately I think something like a percentage is hard to calculate
right now. Even just looking at crash recovery (vs replication or
PITR), we don't currently know where the WAL ends without reading all
the WAL. The easiest thing to return would be something in LSNs or
bytes and I suspect that we don't want to expose either unauthenticated?
While it obviously wouldn't be exactly accurate, I wonder if we couldn't
just look at the WAL files we have to replay and then guess that we'll go
through about half of them before we reach the end..? I mean, it wouldn't
exactly be the first time that a percentage progress report wasn't
completely accurate. :)
I wonder if we ought to occasionally update something like
ControlFileData->minRecoveryPoint on primaries, similar to what we do on
standbys? Then we could actually calculate a percentage, and it'd have
the added advantage of allowing to detect more cases where the end of
the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
adding a lot of fsyncs, but that seems doable?
This seems to go against Tom's concerns wrt rewriting pg_control.
Perhaps we could work through a solution to that, which would be nice,
but I'm not sure that we need the percentage to be super accurate
anyway, though, ideally, we'd work it out so that it's always increasing
and doesn't look "stuck" as long as we're actually moving forward
through the WAL.
Maybe a heuristic of "look at the end of the WAL files, assume we'll go
through 50% of it, but only consider that to be 90%, with the last 10%
going from half-way through the WAL to the actual end of the WAL
available."
Yes, such heuristics are terrible, but they're also relatively simple
and wouldn't require tracking anything additional and would, maybe,
avoid the concern about needing to authenticate the user..
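That heuristic can be written down as a small pure function; the 50% guess and the 90/10 split come from the description above, while the names and the use of raw byte counts are assumptions:

```c
/* Guess that replay will end half-way through the WAL currently on
 * disk; map progress toward that guess onto 0..90%, and the remainder
 * of the on-disk WAL onto 90..100%.  Monotonic, never stuck, and
 * reveals nothing beyond a rough fraction. */
static int
heuristic_percent(double replayed_bytes, double available_bytes)
{
    double f;

    if (available_bytes <= 0)
        return 100;
    f = replayed_bytes / available_bytes;
    if (f >= 1.0)
        return 100;
    if (f <= 0.5)
        return (int) (f / 0.5 * 90.0);          /* 0..90% toward the guess */
    return (int) (90.0 + (f - 0.5) / 0.5 * 10.0);   /* 90..100% beyond it */
}
```

So a quarter of the on-disk WAL replayed reports 45%, the guessed end reports 90%, and the curve only reaches 100% at the true end of WAL.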
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Andres Freund (andres@anarazel.de) wrote:
On 2021-04-20 14:56:58 -0400, Tom Lane wrote:
I wonder though whether we really need authentication here. pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all. Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?
Unfortunately I think something like a percentage is hard to calculate
right now.
While it obviously wouldn't be exactly accurate, I wonder if we couldn't
just look at the WAL files we have to replay and then guess that we'll go
through about half of them before we reach the end..? I mean, it wouldn't
exactly be the first time that a percentage progress report wasn't
completely accurate. :)
Or we could skip all the guessing and just print something like what
the startup process exposes in ps status, ie "currently processing
WAL file so-and-so".
regards, tom lane